[gem5-users] Re: RENAME: HELP Needed for Running Benchmarks in GPU Full System Simulation

2024-01-02 Thread Matt Sinclair via gem5-users
Just to add to this: to the best of my knowledge online compilation in
OpenCL is not supported in gem5 outside of KVM (which does that compilation
on the real CPU).  I don't think it just increases simulation time -- I
think it just throws an error.
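
To make the online/offline distinction concrete: the online path calls
clCreateProgramWithSource + clBuildProgram at run time inside the simulated
system, while the offline path precompiles the kernels (e.g., on the disk
image) and loads the result with clCreateProgramWithBinary.  Below is a
hedged, generic OpenCL host-side sketch of the offline-load side -- this is
not code from either benchmark suite, and the kernels.bin path and missing
error handling are placeholders:

    /* Hedged sketch: load a precompiled OpenCL kernel binary instead of
     * compiling from source at run time. */
    #include <CL/cl.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        cl_platform_id plat; cl_device_id dev; cl_int err, bin_status;
        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);

        /* Read a kernel binary produced offline (e.g., on the disk image). */
        FILE *f = fopen("kernels.bin", "rb");            /* placeholder path */
        fseek(f, 0, SEEK_END); size_t len = ftell(f); rewind(f);
        unsigned char *buf = malloc(len);
        fread(buf, 1, len, f); fclose(f);

        const unsigned char *bin = buf;
        cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &len, &bin,
                                                    &bin_status, &err);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL); /* finalize only */
        printf("binary status %d, err %d\n", bin_status, err);
        return 0;
    }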

Matt

On Tue, Jan 2, 2024 at 1:37 PM Poremba, Matthew via gem5-users <
gem5-users@gem5.org> wrote:

> [Public]
>
> Hi Sandy,
>
>
>
>
>
> Depending on the benchmark, OpenCL might do an online compile (i.e.,
> compile the kernels right before running them).  If you are using KVM it
> should just work.  Otherwise, the online compilation will take a
> significant amount of simulation time and offline compiling would be
> preferred (i.e., compile the kernels offline on the disk image).
>
>
>
>
>
> -Matt
>
>
>
> *From:* 关富润 via gem5-users 
> *Sent:* Saturday, December 23, 2023 7:44 PM
> *To:* gem5-users 
> *Cc:* 关富润 <448367...@qq.com>
> *Subject:* [gem5-users] RENAME: HELP Needed for Running Benchmarks in GPU
> Full System Simulation
>
>
>
>
>
>
> Dear all,
>
> I am currently engaged in GPU full system simulation using gem5 and am at
> a stage where I seek to run benchmark suites, specifically Rodinia and
> PolyBench. I have observed that both of these benchmark suites support
> compilation for OpenCL architecture, and I have a few questions regarding
> the compilation and execution process within the gem5 environment. Compilation
> Framework: Should I compile these benchmarks under the OpenCL framework?
> Is this the recommended approach for compatibility with the gem5 GPU full
> system simulation?
>
>
>
> Utilizing ROCm in gem5: I noticed that the ROCm installed in the gpu-fs
> docker image includes OpenCL compilation tools. Is it feasible to directly
> compile these benchmarks within this environment? Are there any specific
> considerations or steps that I should be aware of? Guidance and
> Documentation: Would anyone be able to provide guidance or point me
> towards documentation on how to properly set up and execute these
> benchmarks in the gem5 GPU full system simulation context?
>
>
>
> Thank you in advance for your time and assistance. I look forward to any
> suggestions or guidance you can offer.
>
>
>
> Best regards,
>
> Sandy.
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Fail to run gpu-fs

2023-12-19 Thread Matt Sinclair via gem5-users
Hi Sandy,

Can you please give us a bit more information about what you were running?
It looks like you were just trying to run square from the README?  Normally
that works out of the box, so I'm wondering if you made any changes to your
local setup.

(I am not the primary developer for GPUFS, but am trying to help)

Thanks,
Matt

On Tue, Dec 19, 2023 at 5:16 AM 关富润 via gem5-users 
wrote:

> Dear all,
> I've encountered an issue while performing a gpu-fs simulation using the gem5
> simulator. Following the instructions outlined in the
> https://github.com/gem5/gem5-resources/blob/stable/src/gpu-fs/README.md,
> and using the disk image obtained from
> https://www.gem5.org/2023/02/13/moving-to-full-system-gpu.html, I
> executed the following command:
> build/VEGA_X86/gem5.opt configs/example/gpufs/vega10_kvm.py --disk-image
> ../gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42 --kernel
> ../gem5-resources/src/gpu-fs/vmlinux-5.4.0-105-generic --gpu-mmio-trace
> ../gem5-resources/src/gpu-fs/vega_mmio.log --app
> ../gem5-resources/src/gpu/square/bin/square During the execution, I
> encountered multiple warning messages related to unsupported MSR (Model
> Specific Register) accesses, followed by a panic related to the Intel 8254
> timer. The specific warning and error messages were: 
> build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562:
> warn: kvm-x86: MSR (0xc001011f) unsupported by gem5. Skipping.
> build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x1fc)
> unsupported by gem5. Skipping. build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562:
> warn: kvm-x86: MSR (0x8b) unsupported by gem5. Skipping.
> build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR
> (0xc0010015) unsupported by gem5. Skipping.
> build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR
> (0x4b564d05) unsupported by gem5. Skipping.
> build/VEGA_X86/dev/x86/pc.cc:117: warn: Don't know what interrupt to clear
> for console. build/VEGA_X86/dev/intel_8254_timer.cc:215: panic: PIT mode
> 0x4 is not implemented: Memory Usage: 23051064 KBytes Program aborted at
> tick 2058120564000 --- BEGIN LIBC BACKTRACE ---
> build/VEGA_X86/gem5.opt(+0x12471b0)[0x5576147331b0]
> build/VEGA_X86/gem5.opt(+0x126b9be)[0x5576147579be]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f634f500420]
> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f634e6a700b]
> /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f634e686859]
> build/VEGA_X86/gem5.opt(+0x4ec3e5)[0x5576139d83e5]
> build/VEGA_X86/gem5.opt(+0x1b1d38f)[0x55761500938f]
> build/VEGA_X86/gem5.opt(+0x1cca5fa)[0x5576151b65fa]
> build/VEGA_X86/gem5.opt(+0x1b03226)[0x557614fef226]
> build/VEGA_X86/gem5.opt(+0x77b598)[0x557613c67598]
> build/VEGA_X86/gem5.opt(+0x96b627)[0x557613e57627]
> build/VEGA_X86/gem5.opt(+0xfcc34b)[0x5576144b834b]
> build/VEGA_X86/gem5.opt(+0x19d159d)[0x557614ebd59d]
> build/VEGA_X86/gem5.opt(+0xfcccb3)[0x5576144b8cb3]
> build/VEGA_X86/gem5.opt(+0xfcb8b1)[0x5576144b78b1]
> build/VEGA_X86/gem5.opt(+0x125aa22)[0x557614746a22]
> build/VEGA_X86/gem5.opt(+0x1283534)[0x55761476f534]
> build/VEGA_X86/gem5.opt(+0x1283b13)[0x55761476fb13]
> build/VEGA_X86/gem5.opt(+0x665ab2)[0x557613b51ab2]
> build/VEGA_X86/gem5.opt(+0x4ba777)[0x5576139a6777]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748)[0x7f634f7b9748]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f634f58ef48]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f634f6dbe4b]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f634f7b9124]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f634f585d6d]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f634f58def6]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f634f59106b]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f634f585d6d]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f634f58def6]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f634f6dbe4b]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f634f6dc1d2]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f634f6dc5bf]
> --- END LIBC BACKTRACE --- Aborted (core dumped) Additionally, in the
> m5out/system.pc.com_1.device file, I found multiple error entries related
> to unchecked MSR access errors. [ 0.334614] unchecked MSR access error:
> RDMSR from 0x1b0 at rIP: 0x8107688a (native_read_msr+0xa/0x30) [
> 0.337428] Call Trace: [ 0.338158] ? __switch_to_asm+0x34/0x70 [ 0.338535]
> intel_epb_restore+0x1f/0x80 [ 0.339670] intel_epb_online+0x17/0x40 [
> 0.340786] cpuhp_invoke_callback+0x8a/0x580 [ 0.342045] ?
> __schedule+0x29a/0x720 [ 0.342531] cpuhp_thread_fun+0xb8/0x120 [ 0.343683]
> smpboot_thread_fn+0xfc/0x170 [ 0.344851] kthread+0x121/0x140 [ 0.345784] ?
> sort_range+0x30/0x30 [ 0.346531] ? kthread_park+0x90/0x90 [ 0.347606]
> ret_from_fork+0x22/0x40 [ 0.348640] 

[gem5-users] Re: Error in an application running on gem5 GCN3 (with apu_se.py)

2023-10-19 Thread Matt Sinclair via gem5-users
Hi Anoop,

1.  gfx902 warning: this is "intentionally" there on the ROCm compiler
folks' side.  Essentially, they are trying to warn you that ROCm is not 100%
optimized for APUs.  In particular, I believe libraries like MIOpen
do not have APU support.  But as long as your code does not use libraries
like this, I think you should be fine.

2.  The target not being found in gem5 is because you need to pass in
--gfx-version=gfx902 on the command line:
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/configs/example/apu_se.py#366
(or in your board if you are using a board).  (this assumes you have
already updated your Makefile to compile for gfx902)  Essentially the
problem here is that without specifying this, gem5 thinks you are running a
different version of ROCm, and there is a mismatch.
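
For concreteness, a sketch of such a command line -- the -n value and the
binary path are placeholders taken from earlier in this thread, and whether
gfx902 wants the GCN3_X86 or VEGA_X86 build is something I'd double-check in
your checkout, so treat the build path as a placeholder too:

    build/VEGA_X86/gem5.opt configs/example/apu_se.py -n 3 \
        --gfx-version=gfx902 \
        -c gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/bin/hsto.gem5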

Hope this helps,
Matt

On Thu, Oct 19, 2023 at 4:08 AM Anoop Mysore  wrote:

> Thank you both!
> I was able to manually copy over the instruction execution support, and it
> works. But there are more changes in Vega that might be useful for getting
> some of the CHAI benchmarks running -- so I would like to move to gfx902 as
> suggested.
>
> However, when I try to compile for gfx902 with hipcc from ROCm 4.0.1, it
> throws "Warning: The specified HIP target: gfx902 is unknown. Correct
> compilation is not guaranteed."
> And understandably, in simulation there's an error that complains that the
> right kernel for the device is not found. I'm assuming I would need to
> update the ROCm stack (for a newer hipcc) -- but I wasn't able to find an
> architecture-support list to figure out which version to install. The
> latest, v5.1, fails due to bad IOCTLs, so is there perhaps an intermediate
> version that works? Or have I got this all wrong somehow?
>
> On Mon, Sep 11, 2023 at 9:15 PM Matt Sinclair <
> mattdsinclair.w...@gmail.com> wrote:
>
>> Yeah, I haven't tried CHAI but I believe gfx902 would work with it (if
>> you need APUs).
>>
>> Matt S.
>>
>> On Mon, Sep 11, 2023 at 12:56 PM Poremba, Matthew <
>> matthew.pore...@amd.com> wrote:
>>
>>> [Public]
>>>
>>> Hi Anoop,
>>>
>>>
>>>
>>>
>>>
>>> That instruction was recently added to gem5, but for Vega ISA only:
>>> https://gem5-review.googlesource.com/c/public/gem5/+/67072 .  It could
>>> be ported to GCN3 probably by copying the code exactly into the
>>> corresponding GCN3 files.  You’ll notice however in that relation chain
>>> there are many more instructions implemented for Vega only, so there will
>>> be similar issues to this.  Alternately, I think there is a Vega APU
>>> working (gfx902?).  MattS would know more about the status of that.   I am
>>> not sure of your use case but if you can use a dGPU, Vega with gfx900
>>> version or full system mode is another option to use Vega ISA.
>>>
>>>
>>>
>>> For the docker automatically quitting, you will have to do `docker run
>>> *-it* …` to start an interactive session.
>>>
>>>
>>>
>>>
>>>
>>> -Matt
>>>
>>>
>>>
>>> *From:* Anoop Mysore 
>>> *Sent:* Monday, September 11, 2023 10:33 AM
>>> *To:* Poremba, Matthew 
>>> *Cc:* Matt Sinclair ; The gem5 Users
>>> mailing list 
>>> *Subject:* Re: [gem5-users] Re: Error in an application running on gem5
>>> GCN3 (with apu_se.py)
>>>
>>>
>>>
>>>
>>>
>>>
>>> Thanks, Matt. Yes, the printfs in the GPU kernel code were the issue for
>>> s_sendmsg.
>>>
>>> However, the ds_add_u32 instruction is still an issue. I am already
>>> compiling with -O1 like so:
>>>
>>> /opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801,gfx803 \
>>>   main.cpp kernel.cu kernel.cpp \
>>>   -o ./bin/hsto.gem5 \
>>>   -I/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/include \
>>>   -lz -lm -lc -lpthread -O1 \
>>>   -L/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/util/m5/build/x86/out -lm5
>>>
>>>
>>>
>>> The exact error is:
>>> src/gpu-compute/scoreboard_check_stage.cc:158: panic: next instruction:
>>> ds_add_u32 v7, v8 is of unknown type
>>>
>>>
>>>
>>> The corresponding line in the simulator [link], and the decoder section of
>>> it [link].  Because of the involvement of the LDS/GDS, I'm unsure how to
>>> implement this -- any help would be appreciated.
>>>
>>>
>>>
>>> Also, GDB still doesn't seem to be working with my gem5. And without
>>> prints in the kernel, it's cumbersome to get any useful insight on failing
>>> programs.
>>>
>>> I added within the Dockerfile: RUN apt install -y gdb
>>>
>>> I am invoking gdb with:
>>>
>>> docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) gem5:new gdb
>>> --args gem5/build/GCN3_X86/gem5.debug 

[gem5-users] Re: Error in an application running on gem5 GCN3 (with apu_se.py)

2023-09-11 Thread Matt Sinclair via gem5-users
Yeah, I haven't tried CHAI but I believe gfx902 would work with it (if you
need APUs).

Matt S.

On Mon, Sep 11, 2023 at 12:56 PM Poremba, Matthew 
wrote:

> [Public]
>
> Hi Anoop,
>
>
>
>
>
> That instruction was recently added to gem5, but for Vega ISA only:
> https://gem5-review.googlesource.com/c/public/gem5/+/67072 .  It could be
> ported to GCN3 probably by copying the code exactly into the corresponding
> GCN3 files.  You’ll notice however in that relation chain there are many
> more instructions implemented for Vega only, so there will be similar
> issues to this.  Alternately, I think there is a Vega APU working
> (gfx902?).  MattS would know more about the status of that.   I am not sure
> of your use case but if you can use a dGPU, Vega with gfx900 version or
> full system mode is another option to use Vega ISA.
>
>
>
> For the docker automatically quitting, you will have to do `docker run
> *-it* …` to start an interactive session.
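>
> As a concrete sketch (assuming the gem5:new image tag used earlier in this
> thread), something like:
>
>     docker run -it -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) gem5:new \
>         gdb --args gem5/build/GCN3_X86/gem5.debug gem5/configs/example/apu_se.py ...
>
> Without -it there is no TTY/stdin attached, which is most likely also why gdb
> prints "Reading symbols ..." and then quits on its own.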
>
>
>
>
>
> -Matt
>
>
>
> *From:* Anoop Mysore 
> *Sent:* Monday, September 11, 2023 10:33 AM
> *To:* Poremba, Matthew 
> *Cc:* Matt Sinclair ; The gem5 Users
> mailing list 
> *Subject:* Re: [gem5-users] Re: Error in an application running on gem5
> GCN3 (with apu_se.py)
>
>
>
>
>
>
> Thanks, Matt. Yes, the printfs in the GPU kernel code were the issue for
> s_sendmsg.
>
> However, the ds_add_u32 instruction is still an issue. I am already
> compiling with -O1 like so:
>
> /opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801,gfx803 \
>   main.cpp kernel.cu kernel.cpp \
>   -o ./bin/hsto.gem5 \
>   -I/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/include \
>   -lz -lm -lc -lpthread -O1 \
>   -L/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/util/m5/build/x86/out -lm5
>
>
>
> The exact error is:
> src/gpu-compute/scoreboard_check_stage.cc:158: panic: next instruction:
> ds_add_u32 v7, v8 is of unknown type
>
>
>
> The corresponding line in the simulator [link], and the decoder section of
> it [link].  Because of the involvement of the LDS/GDS, I'm unsure how to
> implement this -- any help would be appreciated.
>
>
>
> Also, GDB still doesn't seem to be working with my gem5. And without
> prints in the kernel, it's cumbersome to get any useful insight on failing
> programs.
>
> I added within the Dockerfile: RUN apt install -y gdb
>
> I am invoking gdb with:
>
> docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) gem5:new gdb
> --args gem5/build/GCN3_X86/gem5.debug gem5/configs/example/apu_se.py
> --cpu-type=DerivO3CPU --num-cpus=4 --mem-size=1GB --ruby
> --mem-type=SimpleMemory -c
> gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/bin/hsto.gem5
>
>
>
> Log:
>
> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from gem5/build/GCN3_X86/gem5.debug...
> (gdb) quit
>
>
>
> PS: `quit` was automatically taken in.
>
> Is there anything wrong I'm doing here?
>
>
>
>
>
>
>
> On Fri, Sep 8, 2023 at 4:50 PM Poremba, Matthew 
> wrote:
>
> [Public]
>
>
>
> Hi Anoop,
>
>
>
>
>
> Based on that register count, I am going to guess you built the
> application with -O0 or some other debugging flags?  If you do this, the
> compiler makes some super large number of registers. I assume that is so a
> real GPU will not run any other applications simultaneously.
>
>
>
> Similarly, if you are seeing s_sendmsg I am going to guess there is a
> printf() in your GPU kernel.  These aren’t currently supported in gem5, but
> something that would be very nice to have.
>
>
>
> If these are true you will need to remove any printfs and compile with at
> least -O1 to run in gem5.
>
>
>
>
>
> -Matt
>
>
>
> *From:* Anoop Mysore 
> *Sent:* Friday, September 8, 2023 7:33 AM
> *To:* Matt Sinclair 
> *Cc:* The gem5 Users mailing list ; Poremba, Matthew
> 
> *Subject:* Re: [gem5-users] Re: Error in an application running on gem5
> GCN3 (with apu_se.py)
>
>
>

[gem5-users] Re: Gem5 GCN3_X86

2023-08-23 Thread Matt Sinclair via gem5-users
Hi Kazi,

Trying to answer your questions:

1.  I am not aware of -d not working -- as of yesterday my students and I
were able to use it (with head of develop, or something close to it).  How
are you attempting to use it on the command line?
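
For reference, the way we typically pass it (paths, -n, and the app binary
here are placeholders); note that -d/--outdir is an option to the gem5 binary
itself rather than to apu_se.py, so it goes before the config script:

    build/GCN3_X86/gem5.opt -d /path/to/run1 configs/example/apu_se.py -n 3 -c <your-app>
    build/GCN3_X86/gem5.opt --outdir=/path/to/run2 configs/example/apu_se.py -n 3 -c <your-app>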

2.  I am not sure about the -mem-type flag (maybe Matt P., CC'd, knows
better), but in an APU model like you seem to be looking at, the CPU and
GPU memory are one and the same.  So there isn't a different CPU and GPU
memory if you are using an APU.  Matt P might have better information on
how to do this for a dGPU model, which we support in the VEGA_X86 model.

3.  Right now, the GPU support in the public gem5 and its VIPER Ruby
coherence protocol only allow the replacement policies available in Ruby (LRU,
TreePLRU) to be picked.  I believe there is also not a command-line flag to
choose them, so you'd need to edit the appropriate Python file (e.g.,
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/configs/ruby/GPU_VIPER.py#235)
to pick the policy you want.  However, some of the students working with me
have updated Ruby's and Classic's replacement policy support such that you can
now pick all the other Classic replacement policies in Ruby protocols too
(originally started here:
https://gem5-review.googlesource.com/c/public/gem5/+/20879, and subsequently
added to here:
https://gem5-review.googlesource.com/q/owner:ji...@wisc.edu.test-google-a.com).
We have some patches internally that allow users to specify replacement
policies for the different GPU caching levels (as part of the work
described here:
https://www.gem5.org/assets/files/workshop-isca-2023/slides/analyzing-the-benefits-of-more-complex-cache.pdf),
but we have not pushed them yet to the public code as we're trying to debug
the issues highlighted in slides 19-23 of that presentation.  We can push
the patches publicly now if that would help, but there would be some caveat
emptor there since the source of those bugs is unknown.
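
As a rough sketch of the kind of edit I mean -- the exact object names in
GPU_VIPER.py may differ, so treat tcp_cache as a placeholder for whatever the
RubyCache instance is called there; replacement_policy is the RubyCache
parameter:

    # illustrative only -- in configs/ruby/GPU_VIPER.py, wherever the TCP
    # (GPU L1D) RubyCache is created:
    from m5.objects import TreePLRURP

    tcp_cache.replacement_policy = TreePLRURP()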

4.  I am assuming you are referring to the parameters eventually used in
places like this:
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/configs/ruby/GPU_VIPER.py#142?
If so, tcp_size is the size per instance of the L1D$ (currently per CU in
the GPU implementation) and tcc_size is the size of the shared GPU L2.  I
am not sure if you are seeing some other parameters somewhere?
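
For example (a sketch only -- I believe apu_se.py exposes these as
--tcp-size/--tcc-size, but double-check the exact option names and the app
path, which is a placeholder):

    build/GCN3_X86/gem5.opt configs/example/apu_se.py -n 3 \
        --num-compute-units=4 --tcp-size=16kB --tcc-size=256kB \
        -c <your-app>

With 4 CUs that would give 4 x 16kB of TCP (one per CU) and a single shared
256kB TCC.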

5.  Matt P might have better information on this, but from briefly looking
at the stats output, my guess is the waveLevelParallelism stats are the
ones that might provide this information.
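
For example, after a run they should show up in the default stats output:

    grep waveLevelParallelism m5out/stats.txt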

Hope this helps,
Matt

On Wed, Aug 23, 2023 at 11:19 AM Kazi Asifuzzaman via gem5-users <
gem5-users@gem5.org> wrote:

> Hello,
>
> I am exploring the usage of the Gem5-GPU model GCN3_X86 (version
> 22.1.0.0). I have the following queries/questions if you could kindly
> clarify:
>
> 1. In previous versions of Gem5 we could use --outdir or -d option to
> redirect the output to a specific directory. That appears to be not
> supported in the version specified above (please correct me if I am wrong).
> Is there any other way to redirect simulation outputs to specific
> directories other than m5out? Otherwise, is there any other way to run
> multiple instances of gem5, ensuring that the stats.txt of one application
> is not overwritten by other instances running at the same time and saving
> the output in the same directory (m5out).
>
> 2. For GCN3_X86, I assume -mem-type defines available memory models for
> CPU/Host memory. By the default settings, does this work as a "Unified
> Memory" ? If not, how to define the memory type for the GPU (global
> memory), if it is different from CPU main memory?
>
> 3. --list-rp-types lists the available replacement policies, but what is
> the option to select one of those?
>
> 4. -n defines the number of CPUs or cores, L1d_size and L2_size should be
> the size per core or per CPU?
>
> 5. Are there any output parameters that report % of resources (e.g. CUs)
> used by the application, or quantify memory contention in unified memory?
>
> Thanks,
>
> *K. Zaman*
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Error in an application running on gem5 GCN3 (with apu_se.py)

2023-08-17 Thread Matt Sinclair via gem5-users
Hi Anoop,

I'm glad that increasing -n helped.  It's hard to say what exactly the
problem is without digging in further, but often the ROCm stack will launch
additional processes to do a variety of things (e.g., check which version
of LLVM is being used).  In gem5, each of these require a separate CPU
thread context -- which increasing -n handles in SE mode.  So if I had to
guess, I would say that this is what is happening.

If you added gdb locally to your docker, and you built the docker properly,
then I would expect gdb to work with gem5.

Thanks,
Matt

On Wed, Aug 16, 2023 at 11:41 PM Anoop Mysore  wrote:

> Thank you, Matt, having 10 CPUs (up from previous 3) in the simulated
> system seems to make it work! (At least, I don't see that error at that
> point anymore). Is "resource temporarily unavailable" commonly due to CPU
> count? Curious to know how you made that connection.
>
> Re gdb: I am indeed using a local docker build
> (gem5/util/dockerfiles/gcn-gpu) with an added gdb installation -- is that
> what you meant?
>
> Will send in a PR to the repo soon as I'm done :)
>
> On Wed, Aug 16, 2023, 5:03 PM Matt Sinclair 
> wrote:
>
>> Hi Anoop,
>>
>> A few things here:
>>
>> - Regarding the original failure (at least the !FS part), this is
>> normally happening either because of the GPU Target ISA (e.g., gfx900) you
>> used in your Makefile (e.g., it is not supported) or because you didn't
>> properly specify what GPU ISA you are using when running the program.  So,
>> what is your command line for running this application and what ISA are you
>> specifying in your Makefile?
>> - If the "what()" is the real source of the error, then I think this
>> could be related to the number of CPU thread contexts you are running with
>> gem5.  What did you set "-n" to?
>> - Regarding gdb, @Matt P: did you remove gdb from what is installed in
>> the Docker a while back?  If so, I think Anoop would need to add it back
>> and create a local docker or something like that.
>> - Setting aside the above, it would be wonderful if you contribute the
>> CHAI benchmarks to gem5-resources once you get them working!  Please let us
>> know if we can do anything to help with that.
>>
>> Thanks,
>> Matt
>>
>> On Wed, Aug 16, 2023 at 9:51 AM Anoop Mysore via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> Curiously, running the gem5.debug executable with gdb within docker results
>>> in:
>>> Reading symbols from gem5/build/GCN3_X86/gem5.debug...
>>> (gdb) quit
>>> (the quit wasn't a command I provided, it just quits automatically). Is
>>> gdb working with gem5 GCN3 in Docker?
>>>
>>> I ran gem5.opt with ExecAll and SyscallAll debug flags, the debug tail
>>> and the simerr logs are attached.
>>> I don't see anything peculiar other than a tgkill syscall with a SIGABRT
>>> sent to a thread thereafter halting within a few instructions.
>>>
>>> On Tue, Aug 15, 2023 at 9:00 PM Anoop Mysore  wrote:
>>>
 I am trying to port CHAI benchmarks similarly to
 gem5-resources/src/gpu/pannotia.  I was able to HIPify (through the perl
 script + some manual changes) all the code files, and ran the BFS program.
 I see the following error message at the point of launching the CPU threads
 here [link] (fork of HIPified CHAI).  I do not see any of the prints from
 the CPU threads
 which leads me to believe the error is to do with the threads not being
 launched or a related error.

 (This looks related; incorporated the suggestion of linking against
 -pthread: https://stackoverflow.com/a/6485728)

 The stderr log is below; any help is appreciated.
 _
 
 AM: Launching CPU
 terminate called after throwing an instance of 'std::system_error'
 what():  Resource temporarily unavailable
 build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem
 occurred: fault (General-Protection) detected @ PC
 (0x76afa941=>0x76afa942).(0=>1)
 Memory Usage: 19704072 KBytes

 Program aborted at tick 441590522500
 --- BEGIN LIBC BACKTRACE ---
 gem5/build/GCN3_X86/gem5.opt(+0x550200)[0x55a709b31200]
 gem5/build/GCN3_X86/gem5.opt(+0x57d46e)[0x55a709b5e46e]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f18881a0420]
 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f188734800b]
 /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f1887327859]
 gem5/build/GCN3_X86/gem5.opt(+0x4be295)[0x55a709a9f295]
 gem5/build/GCN3_X86/gem5.opt(+0x5f6169)[0x55a709bd7169]
 gem5/build/GCN3_X86/gem5.opt(+0x9fd9ed)[0x55a709fde9ed]
 gem5/build/GCN3_X86/gem5.opt(+0x15b1d10)[0x55a70ab92d10]
 gem5/build/GCN3_X86/gem5.opt(+0x15b2fd5)[0x55a70ab93fd5]
 

[gem5-users] Re: Error in an application running on gem5 GCN3 (with apu_se.py)

2023-08-16 Thread Matt Sinclair via gem5-users
Hi Anoop,

A few things here:

- Regarding the original failure (at least the !FS part), this is normally
happening either because of the GPU Target ISA (e.g., gfx900) you used in
your Makefile (e.g., it is not supported) or because you didn't properly
specify what GPU ISA you are using when running the program.  So, what is
your command line for running this application and what ISA are you
specifying in your Makefile?
- If the "what()" is the real source of the error, then I think this could
be related to the number of CPU thread contexts you are running with gem5.
What did you set "-n" to?
- Regarding gdb, @Matt P: did you remove gdb from what is installed in the
Docker a while back?  If so, I think Anoop would need to add it back and
create a local docker or something like that (see the sketch after this list).
- Setting aside the above, it would be wonderful if you contribute the CHAI
benchmarks to gem5-resources once you get them working!  Please let us know
if we can do anything to help with that.
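
Regarding the gdb point above, a minimal sketch of what "add it back and
create a local docker" could look like -- the image tag is a placeholder and
the Dockerfile path is the gcn-gpu one under gem5/util/dockerfiles/:

    # appended to gem5/util/dockerfiles/gcn-gpu/Dockerfile
    RUN apt-get update && apt-get install -y gdb

    # rebuild as a local image and run it interactively
    docker build -t gcn-gpu-gdb gem5/util/dockerfiles/gcn-gpu
    docker run -it -u $UID:$GID -v $(pwd):$(pwd) -w $(pwd) gcn-gpu-gdb \
        gdb --args gem5/build/GCN3_X86/gem5.debug gem5/configs/example/apu_se.py ...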

Thanks,
Matt

On Wed, Aug 16, 2023 at 9:51 AM Anoop Mysore via gem5-users <
gem5-users@gem5.org> wrote:

> Curiously, running the gem5.debug executable with gdb within docker results
> in:
> Reading symbols from gem5/build/GCN3_X86/gem5.debug...
> (gdb) quit
> (the quit wasn't a command I provided, it just quits automatically). Is
> gdb working with gem5 GCN3 in Docker?
>
> I ran gem5.opt with ExecAll and SyscallAll debug flags, the debug tail and
> the simerr logs are attached.
> I don't see anything peculiar other than a tgkill syscall with a SIGABRT
> sent to a thread thereafter halting within a few instructions.
>
> On Tue, Aug 15, 2023 at 9:00 PM Anoop Mysore  wrote:
>
>> I am trying to port CHAI benchmarks similarly to
>> gem5-resources/src/gpu/pannotia.  I was able to HIPify (through the perl
>> script + some manual changes) all the code files, and ran the BFS program.
>> I see the following error message at the point of launching the CPU threads
>> here [link] (fork of HIPified CHAI).  I do not see any of the prints from
>> the CPU threads
>> which leads me to believe the error is to do with the threads not being
>> launched or a related error.
>>
>> (This looks related; incorporated the suggestion of linking against
>> -pthread: https://stackoverflow.com/a/6485728)
>>
>> The stderr log is below; any help is appreciated.
>> _
>> 
>> AM: Launching CPU
>> terminate called after throwing an instance of 'std::system_error'
>> what():  Resource temporarily unavailable
>> build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem
>> occurred: fault (General-Protection) detected @ PC
>> (0x76afa941=>0x76afa942).(0=>1)
>> Memory Usage: 19704072 KBytes
>>
>> Program aborted at tick 441590522500
>> --- BEGIN LIBC BACKTRACE ---
>> gem5/build/GCN3_X86/gem5.opt(+0x550200)[0x55a709b31200]
>> gem5/build/GCN3_X86/gem5.opt(+0x57d46e)[0x55a709b5e46e]
>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f18881a0420]
>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f188734800b]
>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f1887327859]
>> gem5/build/GCN3_X86/gem5.opt(+0x4be295)[0x55a709a9f295]
>> gem5/build/GCN3_X86/gem5.opt(+0x5f6169)[0x55a709bd7169]
>> gem5/build/GCN3_X86/gem5.opt(+0x9fd9ed)[0x55a709fde9ed]
>> gem5/build/GCN3_X86/gem5.opt(+0x15b1d10)[0x55a70ab92d10]
>> gem5/build/GCN3_X86/gem5.opt(+0x15b2fd5)[0x55a70ab93fd5]
>> gem5/build/GCN3_X86/gem5.opt(+0x15b5620)[0x55a70ab96620]
>> gem5/build/GCN3_X86/gem5.opt(+0x15b6348)[0x55a70ab97348]
>> gem5/build/GCN3_X86/gem5.opt(+0x15c2954)[0x55a70aba3954]
>> gem5/build/GCN3_X86/gem5.opt(+0x56a082)[0x55a709b4b082]
>> gem5/build/GCN3_X86/gem5.opt(+0x59e2c4)[0x55a709b7f2c4]
>> gem5/build/GCN3_X86/gem5.opt(+0x59e8a3)[0x55a709b7f8a3]
>> gem5/build/GCN3_X86/gem5.opt(+0x4ed462)[0x55a709ace462]
>> gem5/build/GCN3_X86/gem5.opt(+0x4af427)[0x55a709a90427]
>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8738)[0x7f1888459738]
>>
>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f188822ef48]
>>
>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
>>
>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f1888459114]
>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
>>
>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f188822def6]
>>
>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
>>
>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f188837c1c2]
>>
>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f188837c5af]
>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7f1888380bf1]
>> 

[gem5-users] Re: gem5 VEGA_X86 simulation with GPU support

2023-07-23 Thread Matt Sinclair via gem5-users
Hi Lin,

I don't see anything obviously wrong with your command, but this error
seems to imply that something with the setup of the GPU device is wrong.
If you didn't change anything though, then probably there is something
wrong with our GPUFS instructions.  Matt P (CC'd) knows the GPUFS code much
better than me though, so hopefully he can help more here.

Thanks,
Matt S.

On Sun, Jul 23, 2023 at 7:47 AM LinS--- via gem5-users 
wrote:

> Hello,
> When I'm conducting VEGA_X86 simulation in gem5, I encountered the
> following issue. My gem5 version is v23.0.0.1, and I haven't made any
> modifications to the files. The "square" benchmark compiles successfully
> using the command "HCC_AMDGPU_TARGET=gfx900 make".  Am I missing the
> correct GPU simulation support? How should I proceed? Thank you very much.
>
> Here's the execution script:
>
> build/VEGA_X86/gem5.opt configs/example/gpufs/vega10_kvm.py \
> --disk-image image/kernel/x86-gpu-fs-20220512.img \
> --kernel image/vmlinux-5.4.0-105-generic \
> --gpu-mmio-trace benchmark/gem5-resources/src/gpu-fs/vega_mmio.log \
> --app benchmark/gem5-resources/src/gpu/square/bin/square
>
> Below is the output log:
>
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x1fc) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x8b) unsupported by
> gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x480) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x48d) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x48e) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x48f) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x490) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x485) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x486) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x488) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x48a) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x48b) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x48c) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x491) unsupported
> by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0xc0010015)
> unsupported by gem5. Skipping.
> src/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x4b564d05)
> unsupported by gem5. Skipping.
> src/dev/x86/pc.cc:117: warn: Don't know what interrupt to clear for
> console.
> src/dev/amdgpu/amdgpu_vm.hh:254: warn: Accessing unsupported MMIO
> aperture! Assuming NBIO
> src/dev/amdgpu/amdgpu_vm.hh:270: warn: Accessing unsupported frame
> apperture!
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x4a
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x28
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x69
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x69
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x69
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x69
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x69
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x69
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x69
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x69
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x4a
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x12
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0x11
> not supported.
> src/dev/amdgpu/pm4_packet_processor.cc:326: warn: PM4 packet opcode 0xa0
> not supported.
> Exiting @ tick 15482304666500 because m5_exit instruction encountered
> src/cpu/kvm/base.cc:570: hack: Pretending totalOps is equivalent to
> totalInsts()
>
>
> -
> Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
> applicable law.
> 128+0 records in
> 128+0 records out
> 131072 bytes (131 kB, 128 KiB) copied, 0.0080005 s, 16.4 MB/s
> insmod
> /lib/modules/5.4.0-105-generic/kernel/drivers/video/fbdev/core/sysimgblt.ko
> insmod
> /lib/modules/5.4.0-105-generic/kernel/drivers/video/fbdev/core/sysfillrect.ko
>
> insmod
> /lib/modules/5.4.0-105-generic/kernel/drivers/video/fbdev/core/syscopyarea.ko
>
> 

[gem5-users] Re: Exception when running libtorch simulation in SE mode

2023-07-18 Thread Matt Sinclair via gem5-users
For what it's worth, one of the students working with me (Marco, CC'd) is
having the same failure right now for the head of develop (plus this fix:
https://github.com/gem5/gem5/pull/99), except for a tiny GPU microbenchmark
that definitely is not using PyTorch or any higher level library.

We are working on getting a backtrace to understand what's going on for us
(and then push a fix as applicable), and it's possible our problems have
the same symptom but a different root cause.  But just wanted to chime in
that there are multiple cases where this error is happening on develop
right now with SE mode.

Matt


On Tue, Jul 18, 2023 at 7:58 PM Bobby Bruce via gem5-users <
gem5-users@gem5.org> wrote:

> I’m afraid I don’t know exactly what’s causing this error, but just to
> make sure, the binary you built and passed as a `CustomResource` executes on your
> host? This looks like an error coming from PyTorch, not the simulator. That
> being said, I don’t understand why "build/X86/sim/faults.cc:61: panic:
> panic condition !FullSystem occurred: fault (General-Protection) detected @
> PC “ is occurring after either, that could also be the issue. Personally,
> I’m always a bit scared linking to dynamic libraries on the host as well,
>
> If you want to get around this the annoying advice is to use FS mode. It’s
> slower, and requires creation of a disk image, but it isn’t nearly as
> error-prone as SE mode. If your binary works on your host then you should
> be able to get it to work in FS mode. Using checkpoints and (if you have
> the right hardware and are using X86) KVM cores can speed things up for you
> too.
>
> Also, as a sidenote: If you’re wanting to simulate PyTorch, don’t you want
> to simulate a GPU too?
>
> --
> Dr. Bobby R. Bruce
> Room 3050,
> Kemper Hall, UC Davis
> Davis,
> CA, 95616
>
> web: https://www.bobbybruce.net
>
> On Jul 14, 2023, at 3:02 AM, Caio Vieira via gem5-users <
> gem5-users@gem5.org> wrote:
>
>
> Hi everyone,
>
> I'm trying to execute gem5 simulations using libtorch in SE mode. However,
> I get the following error message:
>
> --- Error message ---
> ...
> terminate called after throwing an instance of 'std::runtime_error'
>   what():  expected eof but found 'ident' here:
> aten::quantized_lstm.inpr input, Tensor[] orch.classes.rnn.CellPara[]
> params, bool has_biases, int num_layers, float dropout, bool train, bool
> bidirectional, bool batch_first, *, ScalarType? dtype=None, bool
> use_dynamic=False) dy
> namic=False) -> (Tensor, Tensor, Tenso Tensor, Tensor)
>   ~ <--- HERE
>
> build/X86/sim/syscall_emul.cc:86: warn: ignoring syscall
> rt_sigprocmask(...)
>   (further warnings will be suppressed)
> build/X86/sim/syscall_emul.cc:86: warn: ignoring syscall rt_sigaction(...)
>   (further warnings will be suppressed)
> build/X86/sim/faults.cc:61: panic: panic condition !FullSystem occurred:
> fault (General-Protection) detected @ PC
> (0x7fff7a3d5898=>0x7fff7a3d5899).(0=>1)
> Memory Usage: 11842716 KBytes
> Program aborted at tick 294083905383
> --- BEGIN LIBC BACKTRACE ---
> ...
>
> The simulation fails before the first line of the main function. I believe
> that it is failing to load the libtorch library.
> Unfortunately, it is not possible to build libtorch with "-static" since
> their static builds have been broken for quite a long
> time: https://github.com/pytorch/pytorch/issues/21737
> I've tested with gem5 v22.1.0.0 and also 22.0.0.2. I've also tested using
> different GCC versions to build the simulated binary.
>
> For anyone interested in reproducing the error, I'm sending a "setup.sh"
> script to create a minimal reproducible environment.
> Simply copy and paste the script below and name it as "setup.sh" in a new
> directory, then:
>
> source setup.sh
> cmake -B build -S .
> cmake --build build
> ./ config.py build/main
>
> Best regards,
> Caio Vieira
>
> --- setup.sh ---
>
> #!/bin/bash
>
> # Bash script to create minimal reproducible environment for libtorch
> simulation
> # bug. This script creates necessary files such as a CMakeLists.txt and a
> minimal
> # main.cpp. The CMakeLists.txt file downloads and manages libtorch by
> saving it
> # in a ""_deps"" folder. Steps to reproduce the bug:
> # ./
> # cmake -B build -S .
> # cmake --build build
> # ./ config.py build/main
>
> function create_cmake() {
> cat > CMakeLists.txt <<- \EOF
> cmake_minimum_required(VERSION 3.22 FATAL_ERROR)
>
> # Download and manage libtorch dependency
> set(DEPENDENCY_DIR "${CMAKE_CURRENT_LIST_DIR}/_deps")
>
> file(MAKE_DIRECTORY "${DEPENDENCY_DIR}")
> if(NOT EXISTS "${DEPENDENCY_DIR}/libtorch")
> file(DOWNLOAD
>
> https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.0.0%2Bcpu.zip
> "${DEPENDENCY_DIR}/libtorch.zip")
> file(ARCHIVE_EXTRACT
> INPUT "${DEPENDENCY_DIR}/libtorch.zip"
> DESTINATION "${DEPENDENCY_DIR}")
> file(REMOVE "${DEPENDENCY_DIR}/libtorch.zip")
> endif()
> set(CMAKE_PREFIX_PATH 

[gem5-users] Re: Replacing CPU model in GPU-FS

2023-07-05 Thread Matt Sinclair via gem5-users
Answers:

1.  Yes, I believe so.  However, I have never personally tried using the O3
model with the GPU.  Matt P has, I believe, so he may have better feedback
there.

2.  I have not followed the chain of events all the way through here, but I
*believe* that the builtin you highlighted is used at the compiler level by
HIPCC/LLVM to generate the appropriate assembly for a given AMD GPU.  In
this case (gfx900), I believe there is a 1-1 correlation with this builtin
becoming an s_sleep assembly instruction (maybe with the addition of a
v_mov-type instruction before it to set the register to the appropriate
sleep value).  I am not aware of s_sleep()'s builtin requiring OS calls (or
emulation).  But what you have described is more generally the issue with
SE mode (CPU, GPU, etc.) -- because SE mode does not model OS calls, the
fidelity of anything involving the OS will be less.  Perhaps a trite way to
answer this is: if the fidelity of the OS calls is important for the
applications you are studying, then I strongly recommend using FS mode.
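
For reference, a hedged illustration of the kind of primitive in question --
this is generic HIP, not the actual HeteroSync source; the point is that the
builtin lowers to an s_sleep instruction rather than anything the OS (or SE
mode's syscall emulation) has to service:

    __global__ void sleep_mutex_inc(int *lock, int *data) {
        // acquire: spin on an atomic CAS, backing off with s_sleep
        while (atomicCAS(lock, 0, 1) != 0) {
            __builtin_amdgcn_s_sleep(1);  // becomes an s_sleep instruction on gfx900
        }
        *data += 1;            // critical section
        __threadfence();       // make the update visible before releasing
        atomicExch(lock, 0);   // release
    }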

Hope this helps,
Matt S.

On Tue, Jul 4, 2023 at 6:01 AM Anoop Mysore  wrote:

> Thank you so much for the kind and detailed explanations!
>
> Just to clarify: I can use the APU config (apu_se.py) and switch out to an
> O3 CPU, and I would still have the detailed GPU model, and the disconnected
> Ruby model that synchronizes between CPU and GPU at the system-level
> directory -- is that correct?
>
> Last question: when using the APU config for simulating HeteroSync which,
> for example, has a sleep mutex primitive that invokes a
> __builtin_amdgcn_s_sleep(), is there any OS involvement? If yes, would SE
> mode's emulation of those syscalls inexorably sacrifice any fidelity that
> could be argued leads to inaccurate evaluations of heterogeneous coherence
> implementations? Or are any there other factors of insufficient fidelity
> that might be important in this regard?
>
>
> On Fri, Jun 30, 2023 at 7:40 PM Matt Sinclair <
> mattdsinclair.w...@gmail.com> wrote:
>
>> Just to follow-up on 4 and 5:
>>
>> 4.  The synchronization should happen at the directory-level here, since
>> this is the first level of the memory system where both the CPU and GPU are
>> connected.  However, I have not tested if the programmer sets the GLC bit
>> (which should perform the atomic at the GPU's LLC) if Ruby has the
>> functionality to send invalidations as appropriate to allow this.  I
>> suspect it would work as is, but would have to check ...
>>
>> 5.  Yeah, for the reasons Matt P already stated O3 is not currently
>> supported in GPUFS.  So GPUSE would be a better option here.  Yes, you can
>> use the apu_se.py script as the base script for running GPUSE experiments.
>> There are a number of examples on gem5-resources for how to get started
>> with this (including HeteroSync), but I normally recommend starting with
>> square if you haven't used the GPU model before:
>> https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/square/.
>> In terms of support for synchronization at different levels of the memory
>> hierarchy, but default the GPU VIPER coherence protocol assumes that all
>> synchronization happens at the system-level (at the directory, in the
>> current implementation).  However, one of my students will be pushing
>> updates (hopefully today) that allow non-system level support (e.g., the
>> GPU LLC "GLC" level as mentioned above).  It sounds like you want to change
>> the cache hierarchy and coherence protocol to add another level of cache
>> (the L3) before the directory and after the CPU/GPU LLCs?  If so, you would
>> need to change the current Ruby support to add this additional level and
>> the appropriate transitions to do so.  However, if you instead meant that
>> you are thinking of the directory level as synchronizing between the CPU
>> and GPU, then you could use the support as is without any changes (I think).
>>
>> Hope this helps,
>> Matt S.
>>
>> On Fri, Jun 30, 2023 at 12:05 PM Poremba, Matthew via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> [Public]
>>>
>>> Hi,
>>>
>>>
>>>
>>>
>>>
>>> No worries about the questions! I will try to answer them all, so this
>>> will be a long email :
>>>
>>>
>>>
>>> The disconnected (or disjoint) Ruby network is essentially the same as
>>> the APU Ruby network used in SE mode -  That is, it combines two Ruby
>>> protocols in one protocol (MOESI_AMD_base and GPU_VIPER).  They are
>>> disjointed because there are no paths / network links between the GPU and
>>> CPU side, simulating a discrete GPU. These protocols work together because
>>> they use the same network messages / virtual channels to the directory –
>>> Basically you cannot simply drop in another CPU protocol and have it work.
>>>
>>>
>>>
>>> Atomic CPU is working **very** recently – As in this week.  It is on
>>> review board right now and I believe might be part of the gem5 v23.0
>>> release.  However, the reason Atomic and KVM CPUs are 

[gem5-users] Re: Replacing CPU model in GPU-FS

2023-06-30 Thread Matt Sinclair via gem5-users
Just to follow-up on 4 and 5:

4.  The synchronization should happen at the directory-level here, since
this is the first level of the memory system where both the CPU and GPU are
connected.  However, I have not tested if the programmer sets the GLC bit
(which should perform the atomic at the GPU's LLC) if Ruby has the
functionality to send invalidations as appropriate to allow this.  I
suspect it would work as is, but would have to check ...

5.  Yeah, for the reasons Matt P already stated O3 is not currently
supported in GPUFS.  So GPUSE would be a better option here.  Yes, you can
use the apu_se.py script as the base script for running GPUSE experiments.
There are a number of examples on gem5-resources for how to get started
with this (including HeteroSync), but I normally recommend starting with
square if you haven't used the GPU model before:
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/square/.
In terms of support for synchronization at different levels of the memory
hierarchy, by default the GPU VIPER coherence protocol assumes that all
synchronization happens at the system-level (at the directory, in the
current implementation).  However, one of my students will be pushing
updates (hopefully today) that allow non-system level support (e.g., the
GPU LLC "GLC" level as mentioned above).  It sounds like you want to change
the cache hierarchy and coherence protocol to add another level of cache
(the L3) before the directory and after the CPU/GPU LLCs?  If so, you would
need to change the current Ruby support to add this additional level and
the appropriate transitions to do so.  However, if you instead meant that
you are thinking of the directory level as synchronizing between the CPU
and GPU, then you could use the support as is without any changes (I think).
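
For reference, a rough sketch of the square flow in SE mode -- the docker
image name and make invocation are placeholders (the square README in
gem5-resources has the exact ones):

    cd gem5-resources/src/gpu/square
    docker run --rm -v $(pwd):$(pwd) -w $(pwd) <gcn-gpu image> make
    cd -
    build/GCN3_X86/gem5.opt configs/example/apu_se.py -n 3 \
        -c gem5-resources/src/gpu/square/bin/square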

Hope this helps,
Matt S.

On Fri, Jun 30, 2023 at 12:05 PM Poremba, Matthew via gem5-users <
gem5-users@gem5.org> wrote:

> [Public]
>
> Hi,
>
>
>
>
>
> No worries about the questions! I will try to answer them all, so this
> will be a long email :
>
>
>
> The disconnected (or disjoint) Ruby network is essentially the same as the
> APU Ruby network used in SE mode -  That is, it combines two Ruby protocols
> in one protocol (MOESI_AMD_base and GPU_VIPER).  They are disjointed
> because there are no paths / network links between the GPU and CPU side,
> simulating a discrete GPU. These protocols work together because they use
> the same network messages / virtual channels to the directory – Basically
> you cannot simply drop in another CPU protocol and have it work.
>
>
>
> Atomic CPU is working **very** recently – As in this week.  It is on
> review board right now and I believe might be part of the gem5 v23.0
> release.  However, the reason Atomic and KVM CPUs are required is because
> they use the atomic_noncaching memory mode and basically bypass the CPU
> cache. The timing CPUs (timing and O3) are trying to generate routes to the
> GPU side which is causing deadlocks.  I have not had any time to look into
> this further, but that is the status.
>
>
>
> | are the GPU applications run on KVM?
>
>
>
> The CPU portion of GPU applications runs on KVM.  The GPU is simulated in
> timing mode so the compute units, cache, memory, etc. are all simulated
> with events.  For an application that simply launches GPU kernels, the CPU
> is just waiting for the kernels to finish.
>
>
>
> For your other questions:
>
> 1.  Unfortunately no, it is not this easy. There is an issue with timing
> CPUs that is still an outstanding bug – we focused on atomic CPU recently
> as a way to allow users who aren’t able to use KVM to be able to use the
> GPU model.
>
> 2.  KVM exits whenever there is a memory request outside of its VM range.
> The PCI address range is outside the VM range, so for example when the CPU
> writes to PCI space it will trigger an event for the GPU. The only Ruby
> involvement here is that Ruby will send all requests outside of its memory
> range to the IO bus (KVM or not).
>
> 3.  The MMIO trace is only to load the GPU driver and not used in
> applications. It basically contains some reasonable register values for
> anything that is not modeled in gem5 so that we do not need to model them
> (e.g., graphics, power management, video encode/decode, etc.).  This is not
> required for compute-only GPU variants but that is a different topic.
>
> 4.  I’m not familiar enough with this particular application to answer
> this question.
>
> 5.  I think you will need to use SE mode to do what you are trying to do.
> Full system mode is using the real GPU driver, ROCm stack, etc. which
> currently does not support any APU-like devices. SE mode is able to do this
> by making use of an emulated driver.
>
>
>
>
>
> -Matt
>
>
>
> *From:* Anoop Mysore via gem5-users 
> *Sent:* Friday, June 30, 2023 8:43 AM
> *To:* The gem5 Users mailing list 
> *Cc:* Anoop Mysore 
> *Subject:* [gem5-users] Re: Replacing CPU model in GPU-FS
>
>

[gem5-users] Re: GPU-FS simulation progress

2023-06-23 Thread Matt Sinclair via gem5-users
Maybe I'm missing something, but where in that set of prints is the error?
At the end I see this:

Exiting @ tick 2581705103 because m5_exit instruction encountered

Which is the normal thing to see when gem5 exits.

Matt

On Fri, Jun 23, 2023 at 4:06 AM Anoop Mysore via gem5-users <
gem5-users@gem5.org> wrote:

> Reviving a previous thread:
> https://www.mail-archive.com/gem5-users@gem5.org/msg21015.html
>
> I am facing the same exact error, almost the same processor -- AMD Ryzen
> 5800 HS (laptop).
> However, for the OP, moving to a faster EPYC worked. I was able to move
> only to a (desktop) Intel i7-8700 CPU @ 3.20GHz. Here's a couple lines of
> progress, but it seems to fail too.
> I am on ROCm v4.0.1 -- compiled the square test with that. I tried both a
> locally built disk image and a downloaded image, plus the downloaded kernel.
>
> Here's the concise gem5 log (without debug flags -- the run with debug flags
> is attached as a file):
>
> ___
> gem5 Simulator System.  https://www.gem5.org
> gem5 is copyrighted software; use the --copyright option for details.
>
> gem5 version 22.1.0.0
> gem5 compiled Jun 22 2023 18:46:21
> gem5 started Jun 23 2023 10:24:44
> gem5 executing on ashkan-asgharzadeh, pid 24266
> command line: gem5/build/VEGA_X86/gem5.opt
> gem5/configs/example/gpufs/vega10_kvm.py --disk-image
> gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42 --kernel
> gem5-resources/src/gpu-fs/vmlinux-5.4.0-105-generic --gpu-mmio-trace
> gem5-resources/src/gpu-fs/mmio_trace.log --app
> gem5-resources/src/gpu/square/bin/square
>
> warn: Memory mode will be changed to atomic_noncaching
> warn: The `get_runtime_isa` function is deprecated. Please migrate away
> from using this function.
> Global frequency set at 1 ticks per second
> build/VEGA_X86/mem/dram_interface.cc:692: warn: DRAM device capacity (8192
> Mbytes) does not match the address range assigned (4096 Mbytes)
> build/VEGA_X86/sim/kernel_workload.cc:46: info: kernel located at:
> gem5-resources/src/gpu-fs/vmlinux-5.4.0-105-generic
> build/VEGA_X86/base/stats/storage.hh:282: warn: Bucket size (5) does not
> divide range [1:75] into equal-sized buckets. Rounding up.
> ...
> ...
> build/VEGA_X86/mem/dram_interface.cc:692: warn: DRAM device capacity (128
> Mbytes) does not match the address range assigned (16384 Mbytes)
> build/VEGA_X86/base/statistics.hh:280: warn: One of the stats is a legacy
> stat. Legacy stat is a stat that does not belong to any statistics::Group.
> Legacy stat is deprecated.
>   0: system.pc.south_bridge.cmos.rtc: Real-time clock set to Sun Jan
>  1 00:00:00 2012
> system.pc.com_1.device: Listening for connections on port 3456
> build/VEGA_X86/base/statistics.hh:280: warn: One of the stats is a legacy
> stat. Legacy stat is a stat that does not belong to any statistics::Group.
> Legacy stat is deprecated.
> 0: system.remote_gdb: listening for remote gdb on port 7000
> build/VEGA_X86/dev/intel_8254_timer.cc:128: warn: Reading current count
> from inactive timer.
> tcmalloc: large alloc 2147483648 bytes == 0x562c3a2f6000 @  0x7f421c1cd887
> 0x562c18bc2019 0x562c18106aa6 0x562c17dd474f 0x7f421ca5758a 0x7f421c9bfec8
> 0x7f421c9c6303 0x7f421c9be803 0x7f421c9c02aa 0x7f421c9c6303 0x7f421c9be803
> 0x7f421c9c02be 0x7f421c9c6303 0x7f421c9bfa0f 0x7f421c9c04ce 0x7f421c9c124b
> 0x7f421c9cc55d 0x7f421ca5753b 0x7f421c9c01ec 0x7f421c9c6303 0x7f421c9bfa0f
> 0x7f421c9c04ce 0x7f421ca7fd6b 0x7f421caab768 0x562c17e5aca9 0x562c17d4c7e0
> 0x7f421a437c87 0x562c17dc531a
> Running the simulation
> build/VEGA_X86/cpu/kvm/base.cc:150: info: KVM: Coalesced MMIO disabled by
> config.
> build/VEGA_X86/arch/x86/cpuid.cc:181: warn: x86 cpuid family 0x:
> unimplemented function 2
>
> build/VEGA_X86/sim/simulate.cc:192: info: Entering event queue @ 0.
> Starting simulation...
> ...
> build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1563: warn: kvm-x86: MSR
> (0x4b564d05) unsupported by gem5. Skipping.
> build/VEGA_X86/dev/x86/pc.cc:117: warn: Don't know what interrupt to clear
> for console.
> build/VEGA_X86/dev/amdgpu/amdgpu_vm.hh:240: warn: Accessing unsupported
> MMIO aperture! Assuming NBIO
> Exiting @ tick 2581705103 because m5_exit instruction encountered
> build/VEGA_X86/cpu/kvm/base.cc:572: hack: Pretending totalOps is
> equivalent to totalInsts()
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: bad ioctl error in gpu_comput_driver.cc

2023-06-20 Thread Matt Sinclair via gem5-users
Right, the error you got with HeteroSync is because the generation of GPU
the Makefile compiled for (gfxXXX) was not the same as the version the
simulation supported.  Since you were using GCN3 you would need to compile
for gfx801 (APU) or gfx803 (dGPU) depending on whether you are trying to
run a tightly coupled or discrete GPU experiment.  It looks like you were
running APU based on what was in your command line, for what it’s worth.

There is not a docker setup for FS mode in the same way there is for SE
mode, but there is this:
https://www.gem5.org/2023/02/13/moving-to-full-system-gpu.html

Matt

On Tue, Jun 20, 2023 at 2:18 PM Anoop Mysore via gem5-users <
gem5-users@gem5.org> wrote:

> Wrong build; it works when built for gfx-8 (per readme)
> $ make release-gfx8
> in heterosync directory.
>
> On Tue, Jun 20, 2023, 6:26 PM Anoop Mysore  wrote:
>
>> Oh I see, that makes sense; I was on ROCm 5.1.0.
>> I am now running on docker and the square test works as expected.
>> I don't see a ready-made config for full system emulation with gcn3 -- is
>> that available someplace else or should I figure out how to build one? I
>> think I need that because when I try and run the heterosync benchmark
>> (through docker), I see the following:
>> ```
>> /HIP/rocclr/hip_code_object.cpp:120: guarantee(false &&
>> "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
>> build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem
>> occurred: fault (General-Protection) detected @ PC
>> (0x76afa941=>0x76afa942).(0=>1)
>> Memory Usage: 2502568 KBytes
>> Program aborted at tick 7278973
>> ```
>> Is this expected as I'm running SE mode?
>>
>>> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
-- 
Regards,
Matt Sinclair
Assistant Professor
University of Wisconsin-Madison
Computer Sciences Department
cs.wisc.edu/~sinclair
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: How to change voltage and frequency of individual ComputeUnit for GPU?

2023-04-03 Thread Matt Sinclair via gem5-users
DVFS is not my area of expertise, so I'm not sure I can offer much useful 
feedback here.  My guess is you probably meant line 429 of what I see on 
develop: 
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/configs/example/apu_se.py#429?

Regardless, without more information it's hard to say what is failing.  But I 
believe the voltage does not fundamentally affect the correctness of the model: 
regardless of what voltage you set, it runs at the set frequency and eventually 
completes.  I *think* the voltage matters more for the energy modeling aspects.  But 
Srikant (CC'd) knows a lot more about GPU DVFS in gem5, so hopefully he can 
comment further.

Matt

From: Mejbaul Islam, Kazi M. 
Sent: Monday, April 3, 2023 10:04 PM
To: Matt Sinclair ; gem5-users@gem5.org
Cc: Poremba, Matthew ; srikan...@gmail.com
Subject: Re: How to change voltage and frequency of individual ComputeUnit for 
GPU?

Hi,

Thank you so much, I will definitely contribute if I can do that. On a similar 
note, I first tried to change the voltage and frequency of the shader in 
gem5/configs/example/apu_se.py (line 412). The thing is, beyond a certain range 
of GPU frequency it crashes, which makes sense. But if I put gpu_voltage=0.0V 
it still works fine as-is (with the same ticks and all). I have even changed 
the CPU voltage and gpu_voltage to 0.0V, but it still works, which is weird. Where 
do you think the issue may be?

Regards,
Kazi

From: Matt Sinclair <sincl...@cs.wisc.edu>
Sent: Monday, April 3, 2023 5:00 PM
To: gem5-users@gem5.org <gem5-users@gem5.org>
Cc: Mejbaul Islam, Kazi M. <kmejbaulis...@ufl.edu>; Poremba, Matthew <matthew.pore...@amd.com>; srikan...@gmail.com <srikan...@gmail.com>
Subject: Re: How to change voltage and frequency of individual ComputeUnit for GPU?

[External Email]
Hi Kazi,

Srikant (CC'd) previously added some support for things like this 
(https://gem5-review.googlesource.com/c/public/gem5/+/61589),
 but in the L1/L2 caches instead of the CUs specifically.  From a cursory check 
I don't see this support directly integrated into the CUs, but since they are 
ClockedObject's 
(https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/src/sim/clocked_object.hh#107,
 
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/src/sim/ClockedObject.py#50)
 my guess is you could use the above patch as a model to add the support you 
want.

If you do add it, it would be great if you can add this back as a feature for 
the community!

Thanks,
Matt

From: Mejbaul Islam, Kazi M. via gem5-users <gem5-users@gem5.org>
Sent: Monday, April 3, 2023 3:49 AM
To: gem5-users@gem5.org <gem5-users@gem5.org>
Cc: Mejbaul Islam, Kazi M. <kmejbaulis...@ufl.edu>
Subject: [gem5-users] How to change voltage and frequency of individual ComputeUnit for GPU?

Hello,

In gem5/configs/example/apu_se.py, ComputeUnit is used to build the GPU. Is there 
any way to change the voltage and frequency of each individual ComputeUnit?

Best,
Kazi
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: How to change voltage and frequency of individual ComputeUnit for GPU?

2023-04-03 Thread Matt Sinclair via gem5-users
Hi Kazi,

Srikant (CC'd) previously added some support for things like this 
(https://gem5-review.googlesource.com/c/public/gem5/+/61589), but in the L1/L2 
caches instead of the CUs specifically.  From a cursory check I don't see this 
support directly integrated into the CUs, but since they are ClockedObject's 
(https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/src/sim/clocked_object.hh#107,
 
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/src/sim/ClockedObject.py#50)
 my guess is you could use the above patch as a model to add the support you 
want.
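
As a rough, untested illustration of the kind of change being discussed (this is not code from that patch), giving each CU its own clock/voltage domain in apu_se.py might look like the sketch below. The compute_units list name, frequencies, and voltages are assumptions for the example only.

```
from m5.objects import SrcClockDomain, VoltageDomain

# Illustrative sketch: give each ComputeUnit its own clock/voltage domain.
# Assumes apu_se.py has already built the CUs into a list named compute_units
# (the actual variable name may differ in your gem5 version).
for i, cu in enumerate(compute_units):
    cu.clk_domain = SrcClockDomain(
        clock="1400MHz" if i < 2 else "1000MHz",       # per-CU frequency
        voltage_domain=VoltageDomain(voltage="1.0V"))  # per-CU voltage
```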

If you do add it, it would be great if you can add this back as a feature for 
the community!

Thanks,
Matt

From: Mejbaul Islam, Kazi M. via gem5-users 
Sent: Monday, April 3, 2023 3:49 AM
To: gem5-users@gem5.org 
Cc: Mejbaul Islam, Kazi M. 
Subject: [gem5-users] How to change voltage and frequency of individual 
ComputeUnit for GPU?

Hello,

In gem5/configs/example/apu_se.py, ComputeUnit is used to build the GPU. Is there 
any way to change the voltage and frequency of each individual ComputeUnit?

Best,
Kazi
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: gem5-gcn(VEGA) related issues

2023-03-07 Thread Matt Sinclair via gem5-users
I have personally never tried gfx906 but in theory it should work.  You
would have to change the config files to allow gfx906 as a valid option (
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/configs/example/apu_se.py#941)
and then see what happens.
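
As a purely hypothetical sketch of that change (the option name, default, and existing choices below are illustrative; check the linked line for the exact code in your tree):

```
# In configs/example/apu_se.py: extend the accepted GPU types so the argument
# parser no longer rejects gfx906. Names and values here are placeholders.
parser.add_argument(
    "--gfx-version",
    default="gfx900",
    choices=["gfx801", "gfx803", "gfx900", "gfx902", "gfx906"],  # gfx906 added
    help="GPU ISA version reported to the ROCm stack",
)
```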

Regarding the assembly error, I don't see a particular error with the
assembler -- it seems like you are saying gem5 just fails when you run
something with gfx900?  The error in particular appears to be an error in
the gem5 CPU.  There's not enough information here to help though.  Please
let us know:

- what application are you running (is it from gem5 resources or is this
something you wrote on your own?)
- how did you compile it?
- how did you run gem5? (what is your command line)

Regarding rocprof, I don't think you should need to run it with/in gem5.
gem5 emits its own stats in stats.txt.  The rocprof profiler is intended
for running on a real GPU with your application.  Why do you need an
external performance measurement tool?  Is there something you are looking
for that the gem5 stats don't provide?

Matt


On Tue, Mar 7, 2023 at 12:48 AM gaohang980502--- via gem5-users <
gem5-users@gem5.org> wrote:

> Hi,
>
> I am a graduate student interested in GEM5 simulator and am currently
> trying to carry out experiments using gem5.
> I compiled VEGA_X86/gem5.opt using gcn-gpu images on source gem5-v22.0.0,
> but it only seems to support VEGA10 (gfx900,gfx902), so my first question
> is, Does gem5 now support the newer VEGA20 (gfx906) model?
> Second, I want to execute HIP assembly code, but in SE mode, gfx900
> assembler V3 format kernel always generates the following error:
>
> >build/VEGA_X86/sim/syscall_emul.cc:74: warn: ignoring syscall
> get_mempolicy(...)
> >build/VEGA_X86/sim/syscall_emul.cc:74: warn: ignoring syscall madvise(...)
> >build/VEGA_X86/gpu-compute/gpu_compute_driver.cc:887: warn: unimplemented
> ioctl: AMDKFD_IOC_MAP_MEMORY_TO_GPU
> >kernel exec
> >build/VEGA_X86/arch/x86/faults.cc:165: panic: Tried to read unmapped
> address 0x7373656363e9.
> >PC: (0x76fb56e6=>0x76fb56ed).(0=>1), Instr:   MOV_R_M : ld   rdi,
> DS:[rax + 0x88]
> >Memory Usage: 2573064 KBytes
> >Program aborted at tick 80121757000
>
> I can't find the reason for the problem. My gfx900 assembler kernel code
> was obtained by compiling a simple HIP C program with the option
> save-temps, which should be fine because the gfx906 code I got the same way
> executes correctly on AMD GPU hardware devices.
>
> Finally, I want to use AMD's GPU performance profiling tool rocprofiler to
> analyze benchmark behavior, but I don't know how to use this tool in the
> simulator. My confusion is that the benchmark has to be passed into a
> configuration script rather than run on its own, so the problem is how to
> use the tool to monitor the metrics gem5 models while the GPU component is
> executing a GPU program. I have not found a solution: how can external
> performance analysis tools be used inside the simulator?
> Also, is there a gprof tutorial supported by gem5?
>
> Looking forward to your reply, thank you very much!
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Error when running test_bwd_bn test with gem5 GCN3 GPU

2023-03-05 Thread Matt Sinclair via gem5-users
Can you please provide more information about what the problem is?  The
error message you posted is lacking context.  Specifically, what input size
were you trying to use? And did you generate the appropriate cachefiles
before running, as mentioned here:
http://resources.gem5.org/resources/dnn-mark?  My guess is you didn’t use
the --num-cus flag when generating the cachefiles?  Finally, what was the
command line when you ran the experiment?

In terms of why we don’t run tests with all the different numbers of CUs, we
don’t have infinite compute resources or time when running the tests (e.g.,
we want them to complete overnight).  So we need to be targeted and only
run a subset of tests that will complete within the time and resources we
have.

Hope this helps,
Matt

On Sun, Mar 5, 2023 at 5:48 AM 1575883782 via gem5-users <
gem5-users@gem5.org> wrote:

> Hi,
>
> Sorry to disturb you. When I run the DNNMark test_bwd_bn, I get the
> following error.
> ```
> sh: 1: Cannot fork
> MIOpen(HIP): Error [ValidateGcnAssemblerImpl] Specified assembler does not
> support AMDGPU. Expect performance degradation.
> ```
>
> I saw other people who had the same problem, but it doesn't seem to have
> been solved. (
> https://www.mail-archive.com/gem5-users@gem5.org/msg20597.html)
> Maybe I can offer more information. When I run test_bwd_bn with 4 CUs,
> everything is ok. But when I run it with 8 CUs or 16 CUs, the error occurs.
> Besides, the person who asked in the email above was using 128 CUs.
>
> Also, I found there are other GPU workloads that run successfully with
> 4 CUs, but not with 16 or more CUs.
>
> If possible, could the community test 8 CUs or more during the weekly
> tests? That's just my simple hope.
>
> Best.
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
-- 
Regards,
Matt Sinclair
Assistant Professor
University of Wisconsin-Madison
Computer Sciences Department
cs.wisc.edu/~sinclair
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Unavailability of GPU_RfO and GPU_VIPER_Region protocol in gem5 v21

2023-02-08 Thread Matt Sinclair via gem5-users
Tl;dr: while you can copy the code, I suspect it will be very painful to get 
these to work.

This is the response I got in 2020 when asking a similar question:
"I took a look at the code and I apologize for the confusion.  I now realize we 
did not make it clear that we deprecated those protocols in Tony's staging 
branch.  We should explicitly remove them ...
The reason for deprecation depends on the protocol.  For RFO, we choose to 
deprecated (sic) it because it is incompatible with the underlining assumption 
of the compiler.  As you know, the compiler will insert wait cnts and cache 
invalidate instructions that are unnecessary for RFO.  Also write instructions 
now expect a 2-phase callback which is also incompatible with RFO.  Perhaps we 
could have modified the GPU coalescer to get around these issues, but that 
seemed more work that it is worth.  Meanwhile the VIPER Region/Baseline 
protocols should technically work with the latest branch, but I do not believe 
they have ever been tested with Tuan's new GPU tester.  If your student wants 
those protocols to work, I suggest he start with the tester.
I believe the particular issue your student is currently encountering is 
directly related to the RFO protocol py file not being updated to the latest 
interface changes.  Depending how good the student is, they may be able to 
figure that out by replicating the changes made to the VIPER py file in the 
staging branch.  Eventually the student will hit the "hard part" and notice the 
code we ripped out of the GPUCoalescer.  If you do a diff of the following two 
files, you'll see that the "assumingRfOCoherence" flag is missing from Tony's 
staging branch.   That will take some effort and thinking to determine how to 
fix.
https://gem5.googlesource.com/amd/gem5/+/refs/heads/agutierr/master-gcn3-staging/src/mem/ruby/system/GPUCoalescer.cc
https://gem5.googlesource.com/amd/gem5/+/refs/heads/master/src/mem/ruby/system/GPUCoalescer.cc
We absolutely plan to support the public branch long-term, but yes, currently 
we were hoping to only support VIPER."
From: Poremba, Matthew via gem5-users 
Sent: Wednesday, February 8, 2023 2:43 PM
To: The gem5 Users mailing list 
Cc: VIPIN PATEL ; Poremba, Matthew 

Subject: [gem5-users] Re: Unavailability of GPU_RfO and GPU_VIPER_Region 
protocol in gem5 v21


[AMD Official Use Only - General]

Hi,


GPU_RfO and GPU_VIPER_Region were deprecated, mostly because there is no one to 
help maintain all of the GPU protocols, so we opted to focus on just one.  I 
don't think there have been any Ruby/SLICC changes that would have broken the 
ability to build them though, so I suspect you could simply copy them from an 
older gem5 version and it would work.  Basically you'll want to make sure you 
have all the files listed in src/mem/ruby/protocol/GPU_RfO.slicc, the GPU_RfO 
build_opts file (or use the scons variable), and similarly for GPU_VIPER_Region.
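
As a purely illustrative sketch, if the old SLICC files are copied back in, the accompanying build_opts file could look something like the following (mirroring the existing GCN3_X86 build_opts with only the protocol swapped; the exact variable set in your gem5 version may differ, and passing PROTOCOL on the scons command line achieves the same thing):

```
# Hypothetical build_opts/GPU_RfO_X86 (variable names follow build_opts/GCN3_X86)
TARGET_ISA = 'x86'
TARGET_GPU_ISA = 'gcn3'
BUILD_GPU = True
PROTOCOL = 'GPU_RfO'
```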


-Matt

From: VIPIN PATEL via gem5-users <gem5-users@gem5.org>
Sent: Monday, February 6, 2023 7:48 PM
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: VIPIN PATEL <patelvipi...@gmail.com>
Subject: [gem5-users] Unavailability of GPU_RfO and GPU_VIPER_Region protocol in gem5 v21

Caution: This message originated from an External Source. Use proper caution 
when opening attachments, clicking links, or responding.

Dear All,

The GPU_RfO and GPU_VIPER_Region protocols were part of gem5 v20.1.0.5 but were
removed from v21.0.0.0 onwards.
The removal of these protocols is not mentioned in the release notes.
I had a few queries:
a) Are these protocols merged into other protocols or deprecated?
b) Can we port the Ruby and SLICC state machine files for GPU_RfO and the Viper 
protocol to a newer version of Gem5? What engineering challenges need to be 
addressed?

Regards,
Vipin

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Reply: Re: Reply: Re: Gem5 GCN3 (GPUCoalescer detected deadlock when running pagerank.)

2022-11-07 Thread Matt Sinclair via gem5-users
Thanks Matt P, I hadn’t gotten a chance to try reverting that patch.  I agree 
reverting it and running SE mode or using FS mode is the simplest solution in 
the meantime.

In terms of the deadlock: I think it’s just that many ticks because the 
threshold for deadlocks is very long/big.  I wasn’t reading too much into that 
– and anyways since the develop branch seems to not have that failure, I don’t 
think debugging it is the top priority?

We’ll definitely need to dig further into the CPUID stuff as you mentioned.

Thanks,
Matt S.

From: Poremba, Matthew via gem5-users 
Sent: Monday, November 7, 2022 5:01 PM
To: The gem5 Users mailing list 
Cc: 1575883782 <1575883...@qq.com>; Poremba, Matthew 
Subject: [gem5-users] Re: Reply: Re: Reply: Re: Gem5 GCN3 (GPUCoalescer detected deadlock when running pagerank.)


[AMD Official Use Only - General]

Hi,


The rocclr, panic, and unimplemented instructions errors/warnings seem to be 
caused by this patch: 
https://gem5-review.googlesource.com/c/public/gem5/+/64831.  It is likely the 
ROCm stack is taking a different code path with the different processor vendor 
string and doing things that gem5 doesn’t do properly yet.  I don’t know what 
exactly the problem is but pagerank runs with this change reverted.  It 
(probably) won’t be easy to track down the exact problem without more directed 
tests for the new processor features.

Your options are (1) to revert this patch or do the equivalent of setting the 
string back in your python config but I think that means you cannot run this 
benchmark on Ubuntu 22.04.  Alternately, (2) this application is known to work 
in the GPU fullsystem environment on the develop branch with the default 
parameters (4 CUs as well).  You could try that if you don’t require SE mode. I 
am testing 16 CUs in fullsystem and it is slowly making progress, so that 
appears to be working too.

Regarding the coalescer timeouts, I don’t have much advice. Based on the tick 
number being in the trillions and deadlock being triggered with a timeout in 
the hundreds of millions, it looks like the simulation is *very slowly* making 
progress until you eventually get unlucky with a deadlock.  One thing to check 
when you are increasing the CU count is to ensure the rest of the system is 
balanced along with it, such as increasing the number of memory channels 
accordingly to get the bandwidth needed to avoid deadlock.


-MattP

From: 1575883782 via gem5-users <gem5-users@gem5.org>
Sent: Sunday, November 6, 2022 4:57 PM
To: The gem5 Users mailing list <gem5-users@gem5.org>
Cc: 1575883782 <1575883...@qq.com>
Subject: [gem5-users] Reply: Re: Reply: Re: Gem5 GCN3 (GPUCoalescer detected deadlock when running pagerank.)

Caution: This message originated from an External Source. Use proper caution 
when opening attachments, clicking links, or responding.

Hi, Matt

I didn't change the Makefile for PageRank. Actually, I use the same PageRank obj in 
different gem5 builds (it was compiled only once).

Looking forward to your good news. And please let me know if there's anything I 
can do.

Thanks.

-- Original Message --
From: "The gem5 Users mailing list" <gem5-users@gem5.org>
Sent: Monday, November 7, 2022, 6:16 AM
To: "The gem5 Users mailing list" <gem5-users@gem5.org>
Cc: "1575883782" <1575883...@qq.com>; "Matt Sinclair" <sincl...@cs.wisc.edu>
Subject: [gem5-users] Re: Reply: Re: Gem5 GCN3 (GPUCoalescer detected deadlock when running pagerank.)

Thanks, this is helpful.  Regarding the trace: if this is the failure on 
develop, then I don’t think you need to get a trace, as the failure is 
different here.  But yes, ProtocolTrace would be the flag to use for this.

Regarding PageRank, I am running just the PageRank SPMV variant from the weekly 
tests in isolation, to validate if that is working.  If that works and what you 
ran doesn’t, then perhaps there is something wrong with the docker config – TBD 
though.  In terms of the error, I don’t think it’s a problem with the 
compilation.  The error comes from: 
https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.0.x/rocclr/hip_global.cpp#L69,
 which is happening because the kernel is not being found by HIP.  I don’t know 
exactly why this is happening yet, but unless you changed the Makefile for 
PageRank I don’t see why it would be the HIP version failing.

More once I can dig further into this.

Matt

From: 1575883782 via gem5-users <gem5-users@gem5.org>
Sent: Sunday, November 6, 

[gem5-users] Re: Reply: Re: Gem5 GCN3 (GPUCoalescer detected deadlock when running pagerank.)

2022-11-06 Thread Matt Sinclair via gem5-users
Thanks, this is helpful.  Regarding the trace: if this is the failure on 
develop, then I don’t think you need to get a trace, as the failure is 
different here.  But yes, ProtocolTrace would be the flag to use for this.

Regarding PageRank, I am running just the PageRank SPMV variant from the weekly 
tests in isolation, to validate if that is working.  If that works and what you 
ran doesn’t, then perhaps there is something wrong with the docker config – TBD 
though.  In terms of the error, I don’t think it’s a problem with the 
compilation.  The error comes from: 
https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.0.x/rocclr/hip_global.cpp#L69,
 which is happening because the kernel is not being found by HIP.  I don’t know 
exactly why this is happening yet, but unless you changed the Makefile for 
PageRank I don’t see why it would be the HIP version failing.

More once I can dig further into this.

Matt

From: 1575883782 via gem5-users 
Sent: Sunday, November 6, 2022 11:37 AM
To: The gem5 Users mailing list 
Cc: 1575883782 <1575883...@qq.com>
Subject: [gem5-users] Reply: Re: Gem5 GCN3 (GPUCoalescer detected deadlock when running pagerank.)

Hi, Matt

I tried to run pagerank in the develop branch 
(5d0a7b6a6cca0dc20e8b8c366db2ccc150c7480a, Thu Nov 3 16:42:53 2022). But I met 
a new error (details are below).

The error message:
```
/HIP/rocclr/hip_global.cpp:69: guarantee(false && "Cannot find Symbol")
build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem occurred: 
fault (General-Protection) detected @ PC (0x76afa941=>0x76afa942).(0=>1)
Memory Usage: 19719528 KBytes
Program aborted at tick 1904281529500
```

It seems the hip version is not correct. I wonder if this problem is because my 
docker image version is old. (I used gcn-gpu v22-0).

The good news is that pagerank runs more instructions and prints more output 
(although it did not run successfully to the end). I am not sure whether it's 
random. But for now, I think it's good news.

Finally, I'm relatively new to gem5 debugging. Could you give some tips about 
debugging the trace? For example, the debug flag (should I use the 
--debug-flags=ProtocolTrace or another accurate flag about GPU?).

Thanks.

-- Original Message --
From: "The gem5 Users mailing list" <gem5-users@gem5.org>
Sent: Sunday, November 6, 2022, 2:15 PM
To: "The gem5 Users mailing list" <gem5-users@gem5.org>
Cc: "1575883782" <1575883...@qq.com>; "Matt Sinclair" <sincl...@cs.wisc.edu>
Subject: [gem5-users] Re: Reply: Re: Gem5 GCN3 (GPUCoalescer detected deadlock when running pagerank.)

Can you please try the develop branch as well?  While this is good to know it 
doesn’t pass on stable, if develop solves already then that is good to know.

Matt
Sent from my iPhone


On Nov 5, 2022, at 10:51 PM, 1575883782 via gem5-users <gem5-users@gem5.org> wrote:

Thanks. I will try to use `--reg-alloc-policy=dynamic`(I didn't specify a 
specific policy, I just used the default policy). And I will further read the 
trace.
Then, I am using the stable branch. The commit is:
```
commit 39f85b7a3be1ee0ff6e375c9791dd62d23eb8a3e (HEAD -> stable, tag: 
v22.0.0.1, origin/stable, origin/master, origin/HEAD)
Author: Bobby R. Bruce <bbr...@ucdavis.edu>
Date:   Sat Jun 18 04:59:02 2022 -0700

misc: Update version info to v22.0.0.1
```

-- Original --
From: "The gem5 Users mailing list" 
mailto:gem5-users@gem5.org>>;
Date: Sun, Nov 6, 2022 02:55 AM
To: "The gem5 Users mailing 
list"mailto:gem5-users@gem5.org>>;
Cc: "1575883782"<1575883...@qq.com>;"Matt 
Sinclair"mailto:sincl...@cs.wisc.edu>>;
Subject: [gem5-users] Re: Gem5 GCN3 (GPUCoalescer detected deadlock when 
running pagerank.)

Hi,

Ultimately this message is telling you there is a deadlock in the cache 
coherence protocol when running PageRank with the specifications you did.  To 
fix it, you would need to get a trace 
(https://www.gem5.org/documentation/learning_gem5/part3/MSIdebugging/) and look 
through to see what the problem is.  If you do this and find a fix, we 
definitely welcome any patches you may find to help with this!

Having said that, I’ve been trying to replicate your problem.  However, the 
input size you are running means that gem5 will be running for a while, so it 
will take a while before I can say something more definitive.  We do test 
PageRank as part of the weekly tests, but not specifically for 16 CUs.  What 
branch (stable vs. develop) are you using?  Also, I recommend using 
--reg-alloc-policy=dynamic, as this is a more realistic register allocation 
policy than the simple one (which I can’t tell if you are using or not).  In 
the meantime, if you can answer the above questions, that may help us debug.

Thanks,
Matt

From: 1575883782 via gem5-users <gem5-users@gem5.org>
Sent: Saturday, November 5, 2022 3:58 AM
To: gem5-users <gem5-users@gem5.org>
Cc: 1575883782 

[gem5-users] Re: Gem5 GCN3 (GPUCoalescer detected deadlock when running pagerank.)

2022-11-06 Thread Matt Sinclair via gem5-users
Can you please try the develop branch as well?  While this is good to know it 
doesn’t pass on stable, if develop solves already then that is good to know.

Matt

Sent from my iPhone

On Nov 5, 2022, at 10:51 PM, 1575883782 via gem5-users  
wrote:


Thanks. I will try to use `--reg-alloc-policy=dynamic`(I didn't specify a 
specific policy, I just used the default policy). And I will further read the 
trace.
Then, I am using the stable branch. The commit is:
```
commit 39f85b7a3be1ee0ff6e375c9791dd62d23eb8a3e (HEAD -> stable, tag: 
v22.0.0.1, origin/stable, origin/master, origin/HEAD)
Author: Bobby R. Bruce 
Date:   Sat Jun 18 04:59:02 2022 -0700

misc: Update version info to v22.0.0.1
```

-- Original --
From: "The gem5 Users mailing list" ;
Date: Sun, Nov 6, 2022 02:55 AM
To: "The gem5 Users mailing list";
Cc: "1575883782"<1575883...@qq.com>;"Matt Sinclair";
Subject: [gem5-users] Re: Gem5 GCN3 (GPUCoalescer detected deadlock when 
running pagerank.)

Hi,

Ultimately this message is telling you there is a deadlock in the cache 
coherence protocol when running PageRank with the specifications you did.  To 
fix it, you would need to get a trace 
(https://www.gem5.org/documentation/learning_gem5/part3/MSIdebugging/) and look 
through to see what the problem is.  If you do this and find a fix, we 
definitely welcome any patches you may find to help with this!

Having said that, I’ve been trying to replicate your problem.  However, the 
input size you are running means that gem5 will be running for a while, so it 
will take a while before I can say something more definitive.  We do test 
PageRank as part of the weekly tests, but not specifically for 16 CUs.  What 
branch (stable vs. develop) are you using?  Also, I recommend using 
--reg-alloc-policy=dynamic, as this is a more realistic register allocation 
policy than the simple one (which I can’t tell if you are using or not).  In 
the meantime, if you can answer the above questions, that may help us debug.

Thanks,
Matt

From: 1575883782 via gem5-users 
Sent: Saturday, November 5, 2022 3:58 AM
To: gem5-users 
Cc: 1575883782 <1575883...@qq.com>
Subject: [gem5-users] Gem5 GCN3 (GPUCoalescer detected deadlock when running 
pagerank.)


Hi,



I was trying to run the PageRank benchmark with gem5's GCN3 GPU model.

I succeeded in running PageRank with 4 CUs, but when I run it with 16 CUs, I hit some 
problems. The key error message is 
"build/GCN3_X86/mem/ruby/system/GPUCoalescer.cc:292: warn: GPUCoalescer 10 
Possible deadlock detected!"

Am I missing something? I don't know how to solve it. Could someone help me?

4CUs command line (default CU number is 4)

```

command line: build/GCN3_X86/gem5.opt -n 3 --mem-size=8GB 
--benchmark-root=/home/ubuntu/lmy/gem5-gcn3/gem5-resources/src/gpu/pannotia -c 
pagerank/bin/pagerank_spmv 
'--options=/home/ubuntu/lmy/gem5-gcn3/gem5-resources/src/gpu/pannotia/pagerank/coAuthorsDBLP.graph
 1'

```



16CUs command line

```

command line: build/GCN3_X86/gem5.opt configs/example/apu_se.py -n 3 
--num-compute-units 16 --mem-size=8GB 
--benchmark-root=/home/ubuntu/lmy/gem5-gcn3/gem5-resources/src/gpu/pannotia -c 
pagerank/bin/pagerank_spmv 
'--options=/home/ubuntu/lmy/gem5-resources/src/gpu/pannotia/pagerank/coAuthorsDBLP.graph
 1'

```

gem5 version

```

gem5 version 22.0.0.1

gem5 compiled Jun 29 2022 10:34:02

gem5 started Nov  3 2022 14:32:39

gem5 executing on 1bcbbec61aaf, pid 1287240

```

Error message:

```

build/GCN3_X86/mem/ruby/system/GPUCoalescer.cc:292: warn: GPUCoalescer 10 
Possible deadlock detected!

Printing out 763 outstanding requests in the coalesced table

Addr: [0x3b8b1c0, line 0x3b8b1c0]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 2

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b300, line 
0x3b8b300]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 3

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b380, line 
0x3b8b380]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 1

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b3c0, line 
0x3b8b3c0]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 3

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b440, line 
0x3b8b440]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 1

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b480, line 
0x3b8b480]

Instruction sequence number: 16871

 

[gem5-users] Re: Gem5 GCN3 (GPUCoalescer detected deadlock when running pagerank.)

2022-11-05 Thread Matt Sinclair via gem5-users
Hi,

Ultimately this message is telling you there is a deadlock in the cache 
coherence protocol when running PageRank with the specifications you did.  To 
fix it, you would need to get a trace 
(https://www.gem5.org/documentation/learning_gem5/part3/MSIdebugging/) and look 
through to see what the problem is.  If you do this and find a fix, we 
definitely welcome any patches you may find to help with this!

Having said that, I've been trying to replicate your problem.  However, the 
input size you are running means that gem5 will be running for a while, so it 
will take a while before I can say something more definitive.  We do test 
PageRank as part of the weekly tests, but not specifically for 16 CUs.  What 
branch (stable vs. develop) are you using?  Also, I recommend using 
--reg-alloc-policy=dynamic, as this is a more realistic register allocation 
policy than the simple one (which I can't tell if you are using or not).  In 
the meantime, if you can answer the above questions, that may help us debug.

Thanks,
Matt

From: 1575883782 via gem5-users 
Sent: Saturday, November 5, 2022 3:58 AM
To: gem5-users 
Cc: 1575883782 <1575883...@qq.com>
Subject: [gem5-users] Gem5 GCN3 (GPUCoalescer detected deadlock when running 
pagerank.)


Hi,



I was trying to run the PageRank benchmark with gem5's GCN3 GPU model.

I succeeded in running PageRank with 4 CUs, but when I run it with 16 CUs, I hit some 
problems. The key error message is 
"build/GCN3_X86/mem/ruby/system/GPUCoalescer.cc:292: warn: GPUCoalescer 10 
Possible deadlock detected!"

Am I missing something? I don't know how to solve it. Could someone help me?

4CUs command line (default CU number is 4)

```

command line: build/GCN3_X86/gem5.opt -n 3 --mem-size=8GB 
--benchmark-root=/home/ubuntu/lmy/gem5-gcn3/gem5-resources/src/gpu/pannotia -c 
pagerank/bin/pagerank_spmv 
'--options=/home/ubuntu/lmy/gem5-gcn3/gem5-resources/src/gpu/pannotia/pagerank/coAuthorsDBLP.graph
 1'

```



16CUs command line

```

command line: build/GCN3_X86/gem5.opt configs/example/apu_se.py -n 3 
--num-compute-units 16 --mem-size=8GB 
--benchmark-root=/home/ubuntu/lmy/gem5-gcn3/gem5-resources/src/gpu/pannotia -c 
pagerank/bin/pagerank_spmv 
'--options=/home/ubuntu/lmy/gem5-resources/src/gpu/pannotia/pagerank/coAuthorsDBLP.graph
 1'

```

gem5 version

```

gem5 version 22.0.0.1

gem5 compiled Jun 29 2022 10:34:02

gem5 started Nov  3 2022 14:32:39

gem5 executing on 1bcbbec61aaf, pid 1287240

```

Error message:

```

build/GCN3_X86/mem/ruby/system/GPUCoalescer.cc:292: warn: GPUCoalescer 10 
Possible deadlock detected!

Printing out 763 outstanding requests in the coalesced table

Addr: [0x3b8b1c0, line 0x3b8b1c0]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 2

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b300, line 
0x3b8b300]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 3

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b380, line 
0x3b8b380]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 1

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b3c0, line 
0x3b8b3c0]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 3

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b440, line 
0x3b8b440]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 1

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b480, line 
0x3b8b480]

Instruction sequence number: 16871

Type: LD

   Number of associated packets: 2

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b4c0, line 
0x3b8b4c0]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 1

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b540, line 
0x3b8b540]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 1

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b5c0, line 
0x3b8b5c0]

Instruction sequence number: 16871

   Type: LD

   Number of associated packets: 2

   Issue time: 1732620214000

   Difference from current tick: 280298000 Addr: [0x3b8b680, line 
0x3b8b680]

Instruction sequence number: 16871

   

[gem5-users] Re: Error when running test_bwd_bn test

2022-04-12 Thread Matt Sinclair via gem5-users
In general, yes, MIOpen is less optimized for APUs.  I do not recall seeing 
this before for bwd_bn though.  @Kyle Roarty: have you 
seen this?  I'm wondering if something is missing with how we set HIP_PLATFORM 
in the docker?

I did some quick digging and it appears to be coming from here: 
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.x/src/ocl/gcn_asm_utils.cpp#L144

David, what OS are you running this on?  In theory since you're using the 
docker, I wouldn't expect it to matter, but unless Kyle is also seeing this, 
that print appears to be happening when MIOpen doesn't think you are running on 
Linux.  Or that somewhere in your setup you set the compiler to something other 
than HCC/clang/rocclr?  What is HIP_PLATFORM set to in your setup?

Matt

From: David Fong 
Sent: Tuesday, April 12, 2022 10:53 AM
To: Matt Sinclair ; gem5 users mailing list 

Cc: Kyle Roarty ; Poremba, Matthew 
Subject: RE: Error when running test_bwd_bn test

Hi Matt S,

I'm using gfx801.
It proceeds and does not error out.
So it just means it's less optimized running with the APU.
I don't get this message for my other 3 tests (test_fwd_softmax, 
test_bwd_softmax, test_fwd_pool), only for test_bwd_bn.
I guess there's some special  function used in test_bwd_bn that is better 
optimized in GPU.

David



From: Matt Sinclair <sincl...@cs.wisc.edu>
Sent: Monday, April 11, 2022 6:00 PM
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: David Fong <da...@chronostech.com>; Kyle Roarty <kroa...@wisc.edu>; Poremba, Matthew <matthew.pore...@amd.com>
Subject: RE: Error when running test_bwd_bn test

Hi David,

My guess is you are using gfx801 for this?  If so, does the application 
actually error out at this point, or just proceed beyond it?  If it's the 
latter, my guess is MIOpen is just complaining that you're running with an APU, 
which is less well optimized for.  If it's the former, then there may be 
something else in your setup we need to check.

Matt

From: David Fong via gem5-users <gem5-users@gem5.org>
Sent: Monday, April 11, 2022 1:29 PM
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: David Fong <da...@chronostech.com>
Subject: [gem5-users] Error when running test_bwd_bn test

Hi,

When I run the DNNMark test_bwd_bn,

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 128 -n3 --gpu-to-dir-latency 
120 --TCC_latency 16 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_bwd_bn
 -c dnnmark_test_bwd_bn --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/bn_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"

I get this error:

MIOpen(HIP): Error [ValidateGcnAssemblerImpl] Specified assembler does not 
support AMDGPU. Expect performance degradation.

Does this mean the test will not run properly with AMD GPU and I should ignore 
this test?
Or the AMD CPU will be doing the computations and it means the test will take 
longer to complete?

David

Log for lines before and after the error.
build/GCN3_X86/arch/generic/debugfaults.hh:145: warn: MOVNTDQ: Ignoring 
non-temporal hint, modeling as cacheable!
build/GCN3_X86/arch/x86/generated/exec-ns.cc.inc:27: warn: instruction 
'frndint' unimplemented
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:704: warn: unimplemented 
ioctl: AMDKFD_IOC_ACQUIRE_VM
build/GCN3_X86/sim/syscall_emul.hh:1862: warn: mmap: writing to shared mmap 
region is currently unsupported. The write succeeds on the target, but it will 
not be propagated to the host or shared mappings
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:455: warn: Signal events are 
only supported currently
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/power_state.cc:105: warn: PowerState: Already in the 
requested power state, request ignored
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall 
set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:599: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:609: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/syscall_emul.hh:2081: warn: prlimit: unimplemented resource 7
build/GCN3_X86/sim/syscall_emul.hh:2081: warn: prlimit: unimplemented resource 7
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring 

[gem5-users] Re: Error when running test_bwd_bn test

2022-04-11 Thread Matt Sinclair via gem5-users
Hi David,

My guess is you are using gfx801 for this?  If so, does the application 
actually error out at this point, or just proceed beyond it?  If it's the 
latter, my guess is MIOpen is just complaining that you're running with an APU, 
which is less well optimized for.  If it's the former, then there may be 
something else in your setup we need to check.

Matt

From: David Fong via gem5-users 
Sent: Monday, April 11, 2022 1:29 PM
To: gem5 users mailing list 
Cc: David Fong 
Subject: [gem5-users] Error when running test_bwd_bn test

Hi,

When I run the DNNMark test_bwd_bn,

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 128 -n3 --gpu-to-dir-latency 
120 --TCC_latency 16 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_bwd_bn
 -c dnnmark_test_bwd_bn --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/bn_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"

I get this error:

MIOpen(HIP): Error [ValidateGcnAssemblerImpl] Specified assembler does not 
support AMDGPU. Expect performance degradation.

Does this mean the test will not run properly with AMD GPU and I should ignore 
this test?
Or the AMD CPU will be doing the computations and it means the test will take 
longer to complete?

David

Log for lines before and after the error.
build/GCN3_X86/arch/generic/debugfaults.hh:145: warn: MOVNTDQ: Ignoring 
non-temporal hint, modeling as cacheable!
build/GCN3_X86/arch/x86/generated/exec-ns.cc.inc:27: warn: instruction 
'frndint' unimplemented
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:704: warn: unimplemented 
ioctl: AMDKFD_IOC_ACQUIRE_VM
build/GCN3_X86/sim/syscall_emul.hh:1862: warn: mmap: writing to shared mmap 
region is currently unsupported. The write succeeds on the target, but it will 
not be propagated to the host or shared mappings
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:455: warn: Signal events are 
only supported currently
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/power_state.cc:105: warn: PowerState: Already in the 
requested power state, request ignored
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall 
set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:599: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:609: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/syscall_emul.hh:2081: warn: prlimit: unimplemented resource 7
build/GCN3_X86/sim/syscall_emul.hh:2081: warn: prlimit: unimplemented resource 7
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
sh: 1: Cannot fork
MIOpen(HIP): Error [ValidateGcnAssemblerImpl] Specified assembler does not 
support AMDGPU. Expect performance degradation.
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6


___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: cpu and gpu in gcn3_x86 execute different test programs

2022-04-10 Thread Matt Sinclair via gem5-users
There is a failure you are hitting:

/HIP/rocclr/hip_global.cpp:69: guarantee(false && "Cannot find Symbol") 

How are you compiling your code?

Matt

-Original Message-
From: 17861509600--- via gem5-users  
Sent: Sunday, April 10, 2022 9:03 PM
To: gem5-users@gem5.org
Cc: 17861509...@163.com
Subject: [gem5-users] Re: cpu and gpu in gcn3_x86 execute different test 
programs

Hi,
Below is the output of the program. After the program reaches this point, it 
produces no further output, but it also does not stop for a long time.

Thanks,
Lins

gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version 21.2.1.0
gem5 compiled Mar 28 2022 08:49:00
gem5 started Apr 11 2022 01:43:41
gem5 executing on 019855eecd51, pid 9
command line: ./build/GCN3_X86/gem5.opt -d /root/gem5/m5out/gcn_x86_401 
configs/example/apu_se2.py 
--benchmark-root=/root/spec06/CPU2006/401.bzip2/run/run_base_ref_amd64-m64-gcc42-nn.
 -c bzip2_base.amd64-m64-gcc42-nn -o 
/root/spec06/CPU2006/401.bzip2/run/run_base_ref_amd64-m64-gcc42-nn./input.combined
 --maxinsts=1 --cpu-type=DerivO3CPU --caches --cacheline_size=64 
--l1d_size=64kB --l1i_size=32kB --l1i_assoc=8 --l1d_assoc=8 --l2cache 
--l2_size=2MB --l2_assoc=8 --l3_size=32MB --l3_assoc=4 --mem-size=4096MB

Num SQC =  1 Num scalar caches =  1 Num CU =  4
warn: dir_cntrl0.memory is deprecated. The request port for Ruby memory output 
to the main memory is now called `memory_out_port` Global frequency set at 
1 ticks per second
warn: system.ruby.network adopting orphan SimObject param 'ext_links'
warn: system.ruby.network adopting orphan SimObject param 'int_links'
^C
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (4096 Mbytes)
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into 

[gem5-users] Re: cpu and gpu in gcn3_x86 execute different test programs

2022-04-10 Thread Matt Sinclair via gem5-users
Hi,

I personally have never tried running CPU and GPU workloads simultaneously in 
gem5, so I don't have great answers here.  But what exactly is happening now?  
What is the last output you are seeing when you run your workload?

Thanks,
Matt

-Original Message-
From: 17861509600--- via gem5-users  
Sent: Sunday, April 10, 2022 8:37 AM
To: gem5-users@gem5.org
Cc: 17861509...@163.com
Subject: [gem5-users] cpu and gpu in gcn3_x86 execute different test programs

Hello, I use gcn3_x86 in gem5 to test the impact of interference between GPU-program 
and CPU-program memory access requests on CPU and GPU performance. By 
modifying apu_se.py, I specify square as the GPU test program and 401 from 
SPEC2006 as the CPU test program. I wanted to see if my idea would work, so I 
specified a maximum instruction count of 10. However, the program does 
not stop after execution, and no error is reported. There is no data in the 
output file stats.txt.

apu_se.py:

process = Process(executable = executable, cmd = [args.cmd]
                  + args.options.split(), env = env)


#gpu workload
benchmark_path_gpu = ["/root/benchmark/src/gpu/square/bin"]
cmd_gpu = "square"
executable_gpu = find_path(benchmark_path_gpu, cmd_gpu, os.path.exists)

if os.path.isdir(executable_gpu):
    benchmark_path_gpu = [executable_gpu]
    executable_gpu = find_file(benchmark_path_gpu, cmd_gpu)

process_gpu = Process(executable = executable_gpu, cmd = [cmd_gpu],
                      drivers = [gpu_driver, render_driver], env = env)

cpu_list[0].createThreads()
cpu_list[0].workload = process_gpu

for cpu in cpu_list[1:]:
    cpu.createThreads()
    cpu.workload = process

for cp in cp_list:
    cp.workload = host_cpu.workload

if fast_forward:
    for i in range(len(future_cpu_list)):
        future_cpu_list[i].workload = cpu_list[i].workload
        future_cpu_list[i].createThreads()

## Create the overall system ##
# List of CPUs that must be switched when moving between KVM and simulation
if fast_forward:
    switch_cpu_list = \
        [(cpu_list[i], future_cpu_list[i]) for i in range(args.num_cpus)]

# Full list of processing cores in the system.
cpu_list = cpu_list + [shader] + cp_list

# creating the overall system
# notice the cpu list is explicitly added as a parameter to System
system = System(cpu = cpu_list,
                mem_ranges = [AddrRange(args.mem_size)],
                cache_line_size = args.cacheline_size,
                mem_mode = mem_mode,
                workload = SEWorkload.init_compatible(executable))

run_gcn3_x86_401.sh:
spec2006path=/root/spec06/CPU2006
outdir=/root/gem5/m5out/gcn_x86_401
bench=401.bzip2
ben_suffix=run/run_base_ref_amd64-m64-gcc42-nn.
exe=bzip2_base.amd64-m64-gcc42-nn
input=input.combined

./build/GCN3_X86/gem5.opt -d $outdir configs/example/apu_se2.py \
    --benchmark-root=$spec2006path/$bench/$ben_suffix -c $exe \
    -o "$spec2006path/$bench/$ben_suffix/$input" \
    --maxinsts=1 \
    --cpu-type=DerivO3CPU \
    --caches \
    --cacheline_size=64 \
    --l1d_size=64kB --l1i_size=32kB --l1i_assoc=8 --l1d_assoc=8 \
    --l2cache --l2_size=2MB --l2_assoc=8 \
    --l3_size=32MB --l3_assoc=4 \
    --mem-size=4096MB
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-03-16 Thread Matt Sinclair via gem5-users
Matt P or Srikant: can you please help David with the latency question?  You 
know the answers better than I do here.

Matt

From: David Fong 
Sent: Wednesday, March 16, 2022 5:47 PM
To: Matt Sinclair ; gem5 users mailing list 

Cc: Kyle Roarty ; Poremba, Matthew 
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi Matt S,

Thanks again for your quick reply with useful information.
I will rerun with --reg-alloc-policy=dynamic
in my mini regression to see if it makes a difference.

As for LRN, I won't make modifications to lrn_config.dnnmark
unless it's required to run additional DNN tests.
The 4 tests : test_fwd_softmax, test_bwd_softmax, test_fwd_pool, and 
test_bwd_bn are good enough for now.

For Matt S and Matt P,
Are these parameters for "mem_req_latency" and "mem_resp_latency" valid for 
both APU (Carrizo) and GPU (VEGA) ?
gem5/src/gpu-compute/GPU.py
mem_req_latency = Param.Int(40, "Latency for request from the cu to ruby. "\
                            "Represents the pipeline to reach the TCP "\
                            "and specified in GPU clock cycles")
mem_resp_latency = Param.Int(40, "Latency for responses from ruby to the "\
                             "cu. Represents the pipeline between the "\
                             "TCP and cu as well as TCP data array "\
                             "access. Specified in GPU clock cycles")
It seems to me that the GPU (VEGA), with dedicated memory (GDDR5), should be 
using different parameters for its memory access latencies.
My company's IP could be used to reduce interconnect latencies for the APU and 
GPU, and we would like to quantify this at the system level with benchmarks.
We would like to determine whether the GPU can get a performance boost with 
reduced memory access latencies.
Please confirm which memory latency parameters to modify and use for the GPU 
(VEGA).
Thanks,

David


From: Matt Sinclair <sincl...@cs.wisc.edu>
Sent: Tuesday, March 15, 2022 1:08 PM
To: David Fong <da...@chronostech.com>; gem5 users mailing list <gem5-users@gem5.org>
Cc: Kyle Roarty <kroa...@wisc.edu>; Poremba, Matthew <matthew.pore...@amd.com>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi David,

The dynamic register allocation policy allows the GPU to schedule as many 
wavefronts as there is register space on a CU.  By default, the original 
register allocator released with this GPU model ("simple") only allowed 1 
wavefront per CU at a time because the publicly available dependence modeling 
was fairly primitive.  However, this was not very realistic relative to how a 
real GPU performs, so my group has added better dependence tracking support 
(more could probably still be done, but it reduced stalls by up to 42% relative 
to simple) and a register allocation scheme that allows multiple wavefronts to 
run concurrently per CU ("dynamic").

By default, the GPU model assumes that the simple policy is used unless 
otherwise specified.  I have a patch in progress to change that though: 
https://gem5-review.googlesource.com/c/public/gem5/+/57537.

Regardless, if applications are failing with the simple register allocation 
scheme, I wouldn't expect a more complex scheme to fix the issue.  But I do 
strongly recommend you use the dynamic policy for all experiments - otherwise 
you are using a very simple, less realistic GPU model.

Setting all of that aside, I looked up the perror message you sent last night 
and it appears that happens when your physical machine has run out of memory 
(which means we can't do much to fix gem5, since the machine itself wouldn't 
allocate as much memory as you requested).  So, if you want to run LRN and 
can't run on a machine with more memory, one thing you could do is change the 
LRN config file to use smaller NCHW values (e.g., reduce the batch size, N, 
from 100 to something smaller that fits on your machine): 
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/lrn_config.dnnmark#6.
  If you do this though, you will likely need to re-run the generate_cachefile 
to generate the MIOpen binaries for this different sized LRN.

Hope this helps,
Matt

From: David Fong <da...@chronostech.com>
Sent: Tuesday, March 15, 2022 2:58 PM
To: Matt Sinclair 

[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-03-15 Thread Matt Sinclair via gem5-users
Hi David,

The dynamic register allocation policy allows the GPU to schedule as many 
wavefronts as there is register space on a CU.  By default, the original 
register allocator released with this GPU model ("simple") only allowed 1 
wavefront per CU at a time because the publicly available dependence modeling 
was fairly primitive.  However, this was not very realistic relative to how a 
real GPU performs, so my group has added better dependence tracking support 
(more could probably still be done, but it reduced stalls by up to 42% relative 
to simple) and a register allocation scheme that allows multiple wavefronts to 
run concurrently per CU ("dynamic").

By default, the GPU model assumes that the simple policy is used unless 
otherwise specified.  I have a patch in progress to change that though: 
https://gem5-review.googlesource.com/c/public/gem5/+/57537.

Regardless, if applications are failing with the simple register allocation 
scheme, I wouldn't expect a more complex scheme to fix the issue.  But I do 
strongly recommend you use the dynamic policy for all experiments - otherwise 
you are using a very simple, less realistic GPU model.
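
For example, the flag is added to the apu_se.py invocations used elsewhere in this
thread (the benchmark arguments below are placeholders, not a specific test):

    gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py \
        --reg-alloc-policy=dynamic --num-compute-units 4 -n3 \
        -c <benchmark binary> --options="<benchmark options>"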

Setting all of that aside, I looked up the perror message you sent last night 
and it appears that happens when your physical machine has run out of memory 
(which means we can't do much to fix gem5, since the machine itself wouldn't 
allocate as much memory as you requested).  So, if you want to run LRN and 
can't run on a machine with more memory, one thing you could do is change the 
LRN config file to use smaller NCHW values (e.g., reduce the batch size, N, 
from 100 to something smaller that fits on your machine): 
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/lrn_config.dnnmark#6.
  If you do this though, you will likely need to re-run the generate_cachefile 
to generate the MIOpen binaries for this different sized LRN.
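
As a concrete sketch, run from gem5/gem5-resources/src/gpu/DNNMark (the n= key is an
assumption about the .dnnmark config syntax rather than something verified here, and
the CU count must match what you simulate; the docker command mirrors the
generate_cachefiles command used elsewhere in this thread):

    # assumes the batch size is stored as an n= line in the config (illustrative)
    sed -i 's/^n=100/n=50/' config_example/lrn_config.dnnmark
    # regenerate the MIOpen cachefiles for the resized kernel
    docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 \
        -w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py \
        cachefiles.csv --gfx-version=gfx801 --num-cus=<CUs you simulate>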

Hope this helps,
Matt

From: David Fong 
Sent: Tuesday, March 15, 2022 2:58 PM
To: Matt Sinclair ; gem5 users mailing list 

Cc: Kyle Roarty ; Poremba, Matthew 
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi Matt S.,

Thanks for the detailed reply.

I looked at the link you sent me for the weekly run.

I see an additional parameter which I didn't use:

--reg-alloc-policy=dynamic

What does this do?

I was able to run the two other tests you use in your weekly runs
(test_fwd_pool and test_bwd_bn) with CUs=4.

David



[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-03-14 Thread Matt Sinclair via gem5-users
Hi David,

I have not seen this mmap error before, and my initial guess was the mmap error 
is happening because you are trying to allocate more memory than we created 
when mmap'ing the inputs for the applications (we do this to speed up SE mode, 
because otherwise initializing arrays can take several hours).  However, the 
fact that it is failing in physical.cc and not in the application itself is 
throwing me off there.  Looking at where the failure is occurring, it seems the 
backing store code itself is failing here (from such a large allocation).  
Since the failure is with a C++ mmap call itself, that is perhaps more 
problematic - is "Cannot allocate memory" the failure from the perror() call on 
the line above the fatal() print?
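
To make the failure mode concrete, here is a rough illustration in Python of the same
host-level behavior (this is not the gem5 backing-store code in physical.cc; the 1.5 TB
size simply mirrors the --mem-size 1536GB run shown below, and whether it fails depends
on the host's RAM and overcommit settings):

    import mmap

    # Ask the host for an anonymous mapping far larger than it will grant.  When it
    # fails, the OS error is errno 12, which perror/strerror report as
    # "Cannot allocate memory" -- the same message seen in the log below.
    try:
        buf = mmap.mmap(-1, 1536 * 1024**3)   # ~1.5 TB, like --mem-size 1536GB
    except OSError as err:
        print(err)                            # e.g. [Errno 12] Cannot allocate memory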

Regarding the other question, and the failures more generally: we have never 
tested with > 64 CUs before, so certainly you are stressing the system and 
encountering different kinds of failures than we have seen previously.

In terms of applications, I had thought most/all of them passed previously, but 
we do not test each and every one all the time because this would make our 
weekly regressions run for a very long time.  You can see here: 
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/tests/weekly.sh#176
 which ones we run on a weekly basis.  I expect all of those to pass (although 
your comment seems to indicate that is not always true?).  Your issues are 
exposing that perhaps we need to test more of them beyond these 3 - perhaps on 
a quarterly basis or something though to avoid inflating the weekly runtime.  
Having said that, I have not run LRN in a long time, as some ML people told me 
that LRN was not widely used anymore.  But when I did run it, I do remember it 
requiring a large amount of memory - which squares with what you are seeing 
here.  I thought LRN needed --mem-size=32GB to run, but based on your message
it seems that is not the case.

@Matt P: have you tried LRN lately?  If so, have you run into the same 
OOM/backing store failures?

I know Kyle R. is looking into your other failure, so this one may have to wait 
behind it from our end, unless Matt P knows of a fix.

Thanks,
Matt

From: David Fong via gem5-users 
Sent: Monday, March 14, 2022 4:38 PM
To: David Fong via gem5-users 
Cc: David Fong 
Subject: [gem5-users] gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi,

I'm getting an error related to memory for test_fwd_lrn.
When I increased the memory size from 4GB to 512GB, I got a memory size issue: "out of
memory".

build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:599: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:609: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/mem_pool.cc:120: fatal: fatal condition freePages() <= 0 
occurred: Out of memory, please increase size of physical memory.

But once I increased the mem size to 1024GB, 1536GB, or 2048GB, I got this DRAM
device capacity issue.

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --mem-size 1536GB --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_lrn
 -cdnnmark_test_fwd_lrn --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/lrn_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_gpu_cu256_run_dnnmark_test_fwd_lrn_50latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (2097152 Mbytes)
mmap: Cannot allocate memory
build/GCN3_X86/mem/physical.cc:231: fatal: Could not mmap 1649267441664 bytes 
for range [0:0x180]!


Smaller number of CUs like 4 also have same type of error.

Is there a regression script or regression log for DNNMark that shows the mem-size or
configurations known to work for the DNNMark tests, so
I can use the same setup to run a few DNNMark tests?
Only test_fwd_softmax and test_bwd_softmax are working for CUs in
{4, 8, 16, 32, 64, 128, 256}.

Thanks,

David

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

2022-03-11 Thread Matt Sinclair via gem5-users
 6
. . .
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: 
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801100.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid 
filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid 
filter channel number
MIOpen Error: 3 at 
/home/dfong/work/ext_ips/gem5-apu-cu256-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h:1057
Ticks: 264369621500
Exiting because  exiting with last active thread context


From: Matt Sinclair <sincl...@cs.wisc.edu>
Sent: Thursday, March 10, 2022 6:02 PM
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: David Fong <da...@chronostech.com>; Kyle Roarty <kroa...@wisc.edu>; Matthew Poremba <matthew.pore...@amd.com>
Subject: Re: [gem5-users] Re: gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

Just to be clear: --mem-size is an input arg for the apu_se.py script.

Matt
Sent from my iPhone

On Mar 10, 2022, at 7:44 PM, Matt Sinclair via gem5-users <gem5-users@gem5.org> wrote:
 I am on my phone and thus cannot easily look at the line that failed at the 
moment, but my first step would be to increase the size of the memory gem5 is 
assuming — try --mem-size=8GB or 16GB and let us know if that solves the problem.

Matt
Sent from my iPhone

On Mar 10, 2022, at 5:12 PM, David Fong via gem5-users <gem5-users@gem5.org> wrote:

Hi,

I’m trying to run test_fwd_conv for gem5 with X86 CPU and GCN3 (gfx801) APU 
with 256 CU using git with gem5 v21.2.1.0

Linux> cd gem5/gem5-resources/src/gpu/DNNMark
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 make
Linux> docker run --rm -v ${PWD}:${PWD} 
-v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} 
gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py cachefiles.csv 
--gfx-version=gfx801 --num-cus=256
Linux> mv gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx801_256.ukdb 
gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx801100.ukdb

Linux> vim gem5/build_opts/GCN3_X86
NUMBER_BITS_PER_SET = '256'

Linux> cd gem5
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt

Linux> cd ../../../../

linux> docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
 -cdnnmark_test_fwd_conv --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"

An error message occurred for the test:
HIP Error at 
/home/dfong/work/ext_ips/gem5-apu-cu256-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/data_manager.h:49
hipErrorOutOfMemory

How to fix this error ?

David

MESSAGES SHORTENED
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (512 Mbytes)
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] in


[gem5-users] Re: gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

2022-03-10 Thread Matt Sinclair via gem5-users
Just to be clear: --mem-size is an input arg for the apu_se.py script.

Matt

Sent from my iPhone

On Mar 10, 2022, at 7:44 PM, Matt Sinclair via gem5-users <gem5-users@gem5.org> wrote:

 I am on my phone and thus cannot easily look at the line that failed at the 
moment, but my first step would be to increase the size of the memory gem5 is 
assuming — try —mem-size=8GB or 16GB and let us know if that solves the problem.

Matt

Sent from my iPhone


[gem5-users] Re: gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

2022-03-10 Thread Matt Sinclair via gem5-users
I am on my phone and thus cannot easily look at the line that failed at the 
moment, but my first step would be to increase the size of the memory gem5 is 
assuming — try --mem-size=8GB or 16GB and let us know if that solves the problem.

Matt

Sent from my iPhone
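
For reference, the flag goes on the apu_se.py command line together with the other
options, e.g. (adapting the command quoted below and eliding everything except
--mem-size):

    gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py --mem-size=16GB \
        --num-compute-units 256 -n3 ...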

On Mar 10, 2022, at 5:12 PM, David Fong via gem5-users  
wrote:


Hi,

I’m trying to run test_fwd_conv for gem5 with X86 CPU and GCN3 (gfx801) APU 
with 256 CU using git with gem5 v21.2.1.0

Linux> cd gem5/gem5-resources/src/gpu/DNNMark
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 make
Linux> docker run --rm -v ${PWD}:${PWD} 
-v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} 
gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py cachefiles.csv 
--gfx-version=gfx801 --num-cus=256
Linux> mv gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx801_256.ukdb 
gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx801100.ukdb

Linux> vim gem5/build_opts/GCN3_X86
NUMBER_BITS_PER_SET = '256'

Linux> cd gem5
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt

Linux> cd ../../../../

linux> docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
 -cdnnmark_test_fwd_conv --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"

An error message occurred for the test:
HIP Error at 
/home/dfong/work/ext_ips/gem5-apu-cu256-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/data_manager.h:49
hipErrorOutOfMemory

How to fix this error ?

David

MESSAGES SHORTENED
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (512 Mbytes)
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
. . .
Forcing maxCoalescedReqs to 32 (TLB assoc.)
build/GCN3_X86/base/remote_gdb.cc:381: warn: Sockets disabled, not accepting 
gdb connections
warn: dir_cntrl0.memory is deprecated. The request port for Ruby memory output 
to the main memory is now called `memory_out_port`
warn: system.ruby.network adopting orphan SimObject param 'ext_links'
warn: system.ruby.network adopting orphan SimObject param 'int_links'
warn: failed to generate dot output from m5out/config.dot
build/GCN3_X86/sim/simulate.cc:194: info: Entering event queue @ 0.  Starting 
simulation...
build/GCN3_X86/mem/ruby/system/Sequencer.cc:573: warn: Replacement policy 
updates recently became the responsibility of SLICC state machines. Make sure 
to setMRU() near callbacks in .sm files!
gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version 21.2.1.0
gem5 compiled Mar 10 2022 21:44:19
gem5 started Mar 10 2022 22:25:08
gem5 executing on 84084e0cba7d, pid 1
command line: gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py 
--num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
 -cdnnmark_test_fwd_conv '--options=-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin'

info: Standard input is not a terminal, disabling listeners.
Num SQC =  64 Num scalar caches =  64 Num CU =  256
incrementing idx on  4
incrementing idx on  8
incrementing idx on  12
. . .
incrementing idx on  248
incrementing idx on  252
"dot" with args ['-Tsvg', '/tmp/tmp7b3e5gva'] returned code: 1

stdout, stderr:
b''
b'Error: /tmp/tmp7b3e5gva: syntax error in line 236909 scanning a quoted string 
(missing endquote? longer than 16384?)\nString 
starting:"clk_domainsystem.ruby.clk_domain\\eventq_index0\\latency1\n'

build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
. . .
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring 

[gem5-users] Re: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark test_fwd_softmax

2022-03-09 Thread Matt Sinclair via gem5-users
Thanks Kyle!  Should we add a patch to address this then?

Matt

From: Kyle Roarty 
Sent: Wednesday, March 9, 2022 5:06 PM
To: David Fong ; Matt Sinclair ; 
gem5 users mailing list ; Poremba, Matthew 

Subject: Re: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax

For whatever reason, MIOpen looks for a different filename when the number of 
CUs is above 100. However, we didn't see this because we never tested with such 
a large number of CUs.

If you look in the cachefiles directory in the DNNMark folder, you'll see a 
couple of relevant files: gfx801_128.ukdb and gfx80180.ukdb. Rename 
gfx801_128.ukdb to gfx80180.ukdb and it'll work.
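
A sketch of that rename as a command, using the cachefiles path and the file names
exactly as given above (adjust the DNNMark path to your checkout):

    mv gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx801_128.ukdb \
       gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx80180.ukdb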

Kyle

From: David Fong 
Sent: Wednesday, March 9, 2022 3:42 PM
To: Matt Sinclair ; gem5 users mailing list 
; Poremba, Matthew ; Kyle Roarty 

Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax


Nothing gets echoed out to the screen when I run this cmd-line with
--num-cus=128



docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py 
cachefiles.csv --gfx-version=gfx801 --num-cus=128



Is there some option to make it verbose ?



From: Matt Sinclair 
Sent: Wednesday, March 9, 2022 1:36 PM
To: David Fong ; gem5 users mailing list 
; Poremba, Matthew ; Kyle Roarty 

Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax



@Kyle Roarty: I believe the only way to check that the 
number was substituted in is to watch the terminal when it’s run, is that right?



I am not aware of 128 CUs not being supported, but I also haven’t tried that 
many before either.



Matt



From: David Fong <da...@chronostech.com>
Sent: Wednesday, March 9, 2022 3:32 PM
To: Matt Sinclair <sincl...@cs.wisc.edu>; gem5 users mailing list <gem5-users@gem5.org>; Poremba, Matthew <matthew.pore...@amd.com>; Kyle Roarty <kroa...@wisc.edu>
Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax



Hi Matt,



I used these command-line for generating the cachefiles.



gem5/gem5-resources/src/gpu/DNNMark/



docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP

docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 make

docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py 
cachefiles.csv --gfx-version=gfx801 --num-cus=128



Maybe the option --num-cus=128 is NOT supported?



How to confirm that --num-cus=128 was picked up in some file(s)?



Thanks,



David
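
(One simple sanity check, based on the cachefile naming used in this thread rather than
anything the developers stated: list the generated cachefiles and look for a gfx801
user db keyed by the CU count.)

    ls gem5/gem5-resources/src/gpu/DNNMark/cachefiles/
    # expect something like gfx801_128.ukdb if --num-cus=128 was picked up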





[gem5-users] Re: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark test_fwd_softmax

2022-03-09 Thread Matt Sinclair via gem5-users
That error in #2 means MIOpen can't find the kernel again.  Did you change the 
number of CUs to 128 (or whatever number of CUs you are using) when you 
generated the cachefiles?

Matt

From: David Fong via gem5-users 
Sent: Wednesday, March 9, 2022 12:50 PM
To: Poremba, Matthew ; gem5 users mailing list 

Cc: David Fong 
Subject: [gem5-users] Re: gem5 : X86 + APU (gfx801) with CUs128 error with 
DNNMark test_fwd_softmax

Hi Matt,

Thanks for your quick response.
The hack is not working.

  1.  I had to start from scratch or I get the same error.
  2.  After running the same steps plus the hack before the gem5 compile, I'm getting
these error messages:
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
sh: 1: Cannot fork
MIOpen Error: /root/driver/MLOpen/src/hipoc/hipoc_program.cpp:195: Cant find 
file: /tmp/miopen-MIOpenSoftmax.cl-96e7-d3d7-ce59-9759/MIOpenSoftmax.cl.o
MIOpen Error: 7 at 
/home/dfong/work/ext_ips/gem5-apu-cu128-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_wrapper.h:485
Ticks: 574458882500

Am I missing some other setting ?

David

FULL MESSAGE WITH . . . TO REDUCE SIZE

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 128 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
 -cdnnmark_test_fwd_softmax --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_apu_cu128_run_dnnmark_test_fwd_softmax_50latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (512 Mbytes)
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
. . .
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
. . .
build/GCN3_X86/base/remote_gdb.cc:381: warn: Sockets disabled, not accepting 
gdb connections
warn: dir_cntrl0.memory is deprecated. The request port for Ruby memory output 
to the main memory is now called `memory_out_port`
warn: system.ruby.network adopting orphan SimObject param 'ext_links'
warn: system.ruby.network adopting orphan SimObject param 'int_links'
warn: failed to generate dot output from m5out/config.dot
build/GCN3_X86/sim/simulate.cc:194: info: Entering event queue @ 0.  Starting 
simulation...
build/GCN3_X86/mem/ruby/system/Sequencer.cc:573: warn: Replacement policy 
updates recently became the responsibility of SLICC state machines. Make sure 
to setMRU() near callbacks in .sm files!
gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version 21.2.1.0
gem5 compiled Mar  9 2022 18:21:02
gem5 started Mar  9 2022 18:27:12
gem5 executing on dc013b3a89f5, pid 1
command line: gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py 
--num-compute-units 128 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
 -cdnnmark_test_fwd_softmax '--options=-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin'

info: Standard input is not a terminal, disabling listeners.
Num SQC =  32 Num scalar caches =  32 Num CU =  128
incrementing idx on  4
incrementing idx on  8
incrementing idx on  12
incrementing idx on  16
incrementing idx on  20
incrementing idx on  24
incrementing idx on  28
incrementing idx on  32
incrementing idx on  36
incrementing 

[gem5-users] Re: gem5 : x86 + VEGA DGPU (gfx900) with test_fwd_conv error

2022-03-07 Thread Matt Sinclair via gem5-users
Kyle can you please take a look at this?  Seems fwd_conv is broken with Vega 
from my reading of the output (which I was not aware of).  But since we aren't 
testing Vega yet, it's perhaps not surprising something broke.

David, in the meantime (if possible for your work) I would encourage you to use 
GCN3 as that is much more stable (and tested).

Matt

From: David Fong via gem5-users 
Sent: Monday, March 7, 2022 1:01 PM
To: David Fong via gem5-users 
Cc: David Fong 
Subject: [gem5-users] gem5 : x86 + VEGA DGPU (gfx900) with test_fwd_conv error


Hi,



I’m trying to run DNNMark with x86 + VEGA DGPU (gfx900) with test_fwd_conv.



I’m getting this warning and error.

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: 
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx900_4.HIP.fdb.txt

MIOpen Error: 3 at 
/home/dfong/work/ext_ips/gem5-vega-gpu-dnn1/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h:1057



Is there something wrong in my cmd-line to run the test_fwd_conv test or do I 
need to update a file ?



David



docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/VEGA_X86/gem5.opt 
gem5/configs/example/apu_se.py --mem-size 4GB --dgpu --gfx-version=gfx900 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
 -cdnnmark_test_fwd_conv --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_vega_dgpu_run_dnnmark_test_fwd_conv_40latency_0307.log

Global frequency set at 1 ticks per second

build/VEGA_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (4096 Mbytes)

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.

build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] 

[gem5-users] Re: gem5 + APU latency numbers

2022-03-07 Thread Matt Sinclair via gem5-users
I think Srikant's other reply addressed this?

Matt

From: David Fong 
Sent: Monday, March 7, 2022 11:12 AM
To: Poremba, Matthew ; David Fong via gem5-users 
; Bharadwaj, Srikant 
Cc: Bobby Bruce ; Matt Sinclair 
Subject: gem5 + APU latency numbers


Hi Matt P.,



I noticed these stat numbers in the overall results for cpu3 (the APU).

With 40, the overall cpu3 (APU) latency numbers are reduced but shaderActiveTicks
increased.

Do these numbers make sense?



David



Modified:

gem5/build/GCN3_X86/gpu-compute/GPU.py



mem_req_latency = Param.Int(40, "Latency for request from the cu to ruby. "\

"Represents the pipeline to reach the TCP "\

"and specified in GPU clock cycles")

mem_resp_latency = Param.Int(40, "Latency for responses from ruby to the "\

 "cu. Represents the pipeline between the "\

 "TCP and cu as well as TCP data array "\

 "access. Specified in GPU clock cycles")



m5out/stats.txt



40 (mem_req_latency, mem_resp_latency) (smaller is better)

system.cpu3.allLatencyDist::mean 458572.656250   # 
delay distribution for all (Unspecified)

system.cpu3.allLatencyDist::stdev429452.145064   # 
delay distribution for all (Unspecified)



50 (mem_req_latency, mem_resp_latency)

system.cpu3.allLatencyDist::mean 491744.531250   # 
delay distribution for all (Unspecified)

system.cpu3.allLatencyDist::stdev439992.936927   # 
delay distribution for all (Unspecified)



Latency is reduced for mean and stdev.
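
(Roughly, (491744.53 - 458572.66) / 491744.53 is about 6.7%, so the 40-cycle setting
lowers the mean delay by about 7% relative to the 50-cycle setting.)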



40 (mem_req_latency, mem_resp_latency) (smaller is better)

system.cpu3.allLatencyDist::overflows  97  1.52%100.00% # 
delay distribution for all (Unspecified)

system.cpu3.allLatencyDist::min_value   84000   # 
delay distribution for all (Unspecified)

system.cpu3.allLatencyDist::max_value 3796000   # 
delay distribution for all (Unspecified)



50 (mem_req_latency, mem_resp_latency)

system.cpu3.allLatencyDist::overflows  125  1.95%100.00% # 
delay distribution for all (Unspecified)

system.cpu3.allLatencyDist::min_value  104000   # 
delay distribution for all (Unspecified)

system.cpu3.allLatencyDist::max_value 2651000   # 
delay distribution for all (Unspecified)



40 (mem_req_latency, mem_resp_latency) (larger is better ??)

system.cpu3.shaderActiveTicks   17236   # 
Total ticks that any CU attached to this shader is active (Unspecified)



50 (mem_req_latency, mem_resp_latency)

system.cpu3.shaderActiveTicks   171038999   # 
Total ticks that any CU attached to this shader is active (Unspecified)






___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Gem5 GCN3 DNNMark benchmark error (fwd_softmax is ok, but others are not)

2022-02-12 Thread Matt Sinclair via gem5-users
Thanks this is helpful.  Kyle and I went through the error and we haven't
run on a machine with enough memory to run batch size 100 (which is what
bwd_activation assumes by default).  However, we have gotten it to run with
up to batch size 50.

We think the failure you were seeing was essentially happening because we
weren't testing bwd_activation in the nightly/weekly regressions, and thus
missed that the file we use to generate the MIOpen cachefiles for the
DNNMark kernels did not have the appropriate kernel for bwd_activation.
Kyle created a patch to fix this problem:
https://gem5-review.googlesource.com/c/public/gem5-resources/+/56789.

You will need to pull this patch and rerun generate_cachefiles before
trying to run again.  Moreover, since we only know it works up to batch
size 50, you may consider changing the batch size here:
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/gpu/DNNMark/config_example/activation_config.dnnmark#6,
to something <= 50 since N represents the batch size.  Alternatively if you
need > 50 batch size, you can try running again on the larger machine you
mentioned before, but since we haven't run it on such a large machine yet
we don't know exactly what will happen.
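
For instance, a sketch only (it assumes the batch size appears as an n= line in that
config file, which is an assumption about the file format rather than something
verified here):

# shrink the batch size N from 100 to 50 in the activation config (illustrative)
sed -i 's/^n=100/n=50/' gem5/gem5-resources/src/gpu/DNNMark/config_example/activation_config.dnnmark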

Hope this helps,
Matt

On Fri, Feb 11, 2022 at 12:11 PM 1575883782 via gem5-users <
gem5-users@gem5.org> wrote:

> Yeah, I'm running DNNMark inside Docker, and the version is v21-2. I run the
> command via the Remote-Containers plugin of VS Code.
>
> ---Original---
> *From:* "Matt Sinclair via gem5-users"
> *Date:* Sat, Feb 12, 2022 01:41 AM
> *To:* "gem5 users mailing list";
> *Cc:* "1575883782"<1575883...@qq.com>;"Kyle Roarty";"Matt
> Sinclair";
> *Subject:* [gem5-users] Re: Gem5 GCN3 DNNMark benchmark error
> (fwd_softmax is ok, but others are not)
>
> One more question for you, original poster: are you running DNNMark inside
> the docker resources we provided:
> http://resources.gem5.org/resources/dnn-mark?
>
> Or are you trying to get this running on your machine directly?
>
> Matt
>
> On Fri, Feb 11, 2022 at 11:37 AM Matt Sinclair <
> mattdsinclair.w...@gmail.com> wrote:
>
>> Kyle, can you please help with this?  I don't recall when we last tested
>> bwd_act.
>>
>> Matt
>>
[gem5-users] Re: Gem5 GCN3 DNNMark benchmark error (fwd_softmax is ok, but others are not)

2022-02-11 Thread Matt Sinclair via gem5-users
One more question for you, original poster: are you running DNNMark inside
the docker resources we provided:
http://resources.gem5.org/resources/dnn-mark?

Or are you trying to get this running on your machine directly?

Matt

On Fri, Feb 11, 2022 at 11:37 AM Matt Sinclair 
wrote:

> Kyle, can you please help with this?  I don't recall when we last tested
> bwd_act.
>
> Matt
>
> On Fri, Feb 11, 2022 at 2:18 AM 1575883782 via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> Hi,
>>
>> I was trying to run the DNNMark benchmark with the GCN3 GPU model,
>> following the instructions on http://resources.gem5.org/resources/dnn-mark.
>>
>> I succeeded in running fwd_softmax, but when I ran other layers, I hit some
>> problems, for example "bwd_activation".
>>
>>
>> I tried to run the gem5 DNNMark bwd_activation benchmark on 2 computers.
>>
>>
>> The first computer has 32G of memory. gem5 could run fwd_softmax
>> successfully, but was always killed while running bwd_activation. The error
>> message was "Killed" + the process id, with no other messages. I guess this
>> computer's memory size is not enough to run it.
>>
>>
>> The second computer has 256G of memory. gem5 could run fwd_softmax
>> successfully, but some problems happened while running bwd_activation. I
>> solved some, but have not solved all of them. The error messages are:
>>
>>
>> > I0909 01:46:50.680040   100 dnn_wrapper.h:341] enter 
>> > dnnmarkActivationBackward func
>> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
>> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
>> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
>> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
>> > build/GCN3_X86/arch/x86/faults.cc:170: panic: Tried to read unmapped 
>> > address 0.
>> > PC: 0x7fffeef84b80, Instr:   FMUL2_M : ldfp87   %ufp1, DS:[rdx]
>> > Memory Usage: 46436124 KBytes
>> > Program aborted at tick 10680071080500
>> >
>>
>>
>> sometimes, the errors are:
>>
>> > panic: Tried to write unmapped address 0x2b95d881.
>>
>> or
>>
>> > panic: Tried to write unmapped address 0x3.
>>
>>
>> According to my log, I found the problem happened in the
>> "dnnmarkActivationBackward" function.
>>
>> > LOG(INFO) << "enter dnnmarkActivationBackward func";
>> > #ifdef AMD_MIOPEN
>> >   MIOPEN_CALL(miopenActivationBackward(
>> >   mode == COMPOSED ?
>> >   handle.GetMIOpen(idx) : handle.GetMIOpen(),
>> >   activation_desc.Get(),
>> >   alpha,
>> >   top_desc.Get(), y,
>> >   top_desc.Get(), dy,
>> >   bottom_desc.Get(), x,
>> >   beta,
>> >   bottom_desc.Get(), dx));
>> > #endif
>> >   LOG(INFO) << "exit dnnmarkActivationBackward func";
>>
>>
>> It seems to be a MIOpen interface function. I don't know how to solve it.
>> Could someone help me?
>>
>>
>> PS:
>>
>> my gem5 version is v21-2, and docker image is v21-2.
>>
>> my run command is: build/GCN3_X86/gem5.opt --outdir=$outdir 
>> configs/example/apu_se.py -n 10 --mem-size=8GB 
>> --benchmark-root=$BenchmarkRoot/test_bwd_activation -c 
>> dnnmark_test_bwd_activation --options="-config 
>> "$ConfigRoot"/activation_config.dnnmark -mmap "$MMAPFile" -debuginfo 1"
>>
>> Neither computer has an AMD GPU.
>>
>> ___
>> gem5-users mailing list -- gem5-users@gem5.org
>> To unsubscribe send an email to gem5-users-le...@gem5.org
>
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Gem5 GCN3 DNNMark benchmark error (fwd_softmax is ok, but others are not)

2022-02-11 Thread Matt Sinclair via gem5-users
Kyle, can you please help with this?  I don't recall when we last tested
bwd_act.

Matt

On Fri, Feb 11, 2022 at 2:18 AM 1575883782 via gem5-users <
gem5-users@gem5.org> wrote:

> Hi,
>
> I was trying to run the DNNMark benchmark with the GCN3 GPU model,
> following the instructions on http://resources.gem5.org/resources/dnn-mark.
>
> I succeeded in running fwd_softmax, but when I ran other layers, I hit some
> problems, for example "bwd_activation".
>
>
> I tried to run the gem5 DNNMark bwd_activation benchmark on 2 computers.
>
>
> The first computer has 32G of memory. gem5 could run fwd_softmax
> successfully, but was always killed while running bwd_activation. The error
> message was "Killed" + the process id, with no other messages. I guess this
> computer's memory size is not enough to run it.
>
>
> The second computer has 256G of memory. gem5 could run fwd_softmax
> successfully, but some problems happened while running bwd_activation. I
> solved some, but have not solved all of them. The error messages are:
>
>
> > I0909 01:46:50.680040   100 dnn_wrapper.h:341] enter 
> > dnnmarkActivationBackward func
> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
> > build/GCN3_X86/arch/x86/faults.cc:170: panic: Tried to read unmapped 
> > address 0.
> > PC: 0x7fffeef84b80, Instr:   FMUL2_M : ldfp87   %ufp1, DS:[rdx]
> > Memory Usage: 46436124 KBytes
> > Program aborted at tick 10680071080500
> >
>
>
> sometimes, the errors are:
>
> > panic: Tried to write unmapped address 0x2b95d881.
>
> or
>
> > panic: Tried to write unmapped address 0x3.
>
>
> According to my log, I found the problem happened in the
> "dnnmarkActivationBackward" function.
>
> > LOG(INFO) << "enter dnnmarkActivationBackward func";
> > #ifdef AMD_MIOPEN
> >   MIOPEN_CALL(miopenActivationBackward(
> >   mode == COMPOSED ?
> >   handle.GetMIOpen(idx) : handle.GetMIOpen(),
> >   activation_desc.Get(),
> >   alpha,
> >   top_desc.Get(), y,
> >   top_desc.Get(), dy,
> >   bottom_desc.Get(), x,
> >   beta,
> >   bottom_desc.Get(), dx));
> > #endif
> >   LOG(INFO) << "exit dnnmarkActivationBackward func";
>
>
> It seems to be a MIOpen interface function. I don't know how to solve it.
> Could someone help me?
>
>
> PS:
>
> my gem5 version is v21-2, and docker image is v21-2.
>
> my run command is: build/GCN3_X86/gem5.opt --outdir=$outdir 
> configs/example/apu_se.py -n 10 --mem-size=8GB 
> --benchmark-root=$BenchmarkRoot/test_bwd_activation -c 
> dnnmark_test_bwd_activation --options="-config 
> "$ConfigRoot"/activation_config.dnnmark -mmap "$MMAPFile" -debuginfo 1"
>
> Neither computer has an AMD GPU.
>
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: FW: Gem5GCN3

2021-12-30 Thread Matt Sinclair via gem5-users
(Resending since message to mailing list bounced)


Hi Atiye,


When you have questions about gem5, please email the mailing list, instead
of emailing anyone (e.g., me) directly.  I am not always available to
reply, nor do I know everything about gem5 – emailing the mailing list
makes it more likely you’ll get a response faster.  Additionally, if you
are getting an error, it is very helpful if you include that error message
– from looking at your command, I don’t know what the problem is.



Having said all that, to the best of my knowledge no one has tried running
Chai in the gem5 GPU models.  So, I can’t directly provide you with what
you are asking.  However, if you get Chai working, we would greatly
appreciate you contributing them to gem5-resources with a pull request!
See here for the list of CPU and GPU benchmarks that are currently
supported: resources.gem5.org/.



Moreover, although I don’t know exactly what your error is, from looking at
your command line, it appears you are trying to run the OpenCL version of
Chai?  If so, I don’t believe OpenCL has been supported in gem5 for at
least 5 years (Alex/Brad, CC’d, may know better).  Specifically, I believe
the problem is that OpenCL’s online GPU kernel compilation is not supported
in gem5 currently.  So, what I’d recommend is that you hipify the CUDA
version of Chai, and try to run those instead.  HIPIFY’ing those versions
should be very easy (< 10 minutes per benchmark after you’ve done it once –
it is done using $ROCM_HOME/bin/hipify-perl as long as your program doesn’t
have any library calls like cuDNN or cuBLAS), and will create a HIP version
of the benchmarks – HIP is the GPU programming language that is currently
supported in gem5.  I can’t promise that things will work even after doing
this, but it’s much more likely you’ll have a version that will be closer
to running in gem5 if you go this route.
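
For reference, a minimal sketch of that conversion (kernel.cu is just a
placeholder name for one of the CUDA source files, not an actual Chai file):

$ROCM_HOME/bin/hipify-perl kernel.cu > kernel_hip.cpp
hipcc kernel_hip.cpp -o kernel_hip

hipify-perl writes the converted source to stdout, so you redirect it into a
new file and then build that file with hipcc as usual (plus whatever
--amdgpu-target and -I/-L flags your setup already uses for gem5 binaries).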

Good luck,
Matt

On Thu, Dec 30, 2021 at 9:56 PM Matt Sinclair  wrote:

>
>
>
>
> *From:* atiye.ghe...@gmail.com 
> *Sent:* Thursday, December 30, 2021 3:53 PM
> *To:* Matt Sinclair 
> *Subject:* Gem5GCN3
>
>
>
> Hello
>
> I hope you are well
>
> I am a master student in Sharif University.
>
> I work with Prof. Sarbazi-Azad and Prof. Hessabi.
>
> I work on heterogeneous systems, so I should run benchmarks in some
> simulators like gem5gcn3.
>
> Would you please tell me the command that I should put in the terminal for
> running the Chai benchmark? I ran this command, but there is a problem with
> it and I don't know what the problem is.
>
> sudo docker run -v /home/atiyeh/Desktop/chai:/benchmark -v
> /home/atiyeh/Desktop/gem5:/gem5 -it gem5gcn3 /bin/bash
> /gem5/build/GCN3_X86/gem5.opt /gem5/configs/example/apu_se.py
> --benchmark-root=/benchmark -c OpenCL-U/CEDT/cedt
>
>
>
>
>
> Thanks
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Unrecognized register class when using the "Exec" debug flag

2021-12-01 Thread Matt Sinclair via gem5-users
Thanks Gabe.  Good catch about the actual value -- I just saw a negative
number and assumed -1, whoops.  Based on what Nirmit is seeing, it seems
like HINT_NOP or MOV_R_I must be the instruction causing the fault, but
yeah a backtrace will probably help confirm.

Nirmit, can you please try running stable with a debug build (to get a
backtrace) and develop with a release build and let us know what you see?
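
Something along these lines should work (a sketch, reusing the command line
from your original message; gem5.debug is the debug build target):

scons build/X86/gem5.debug -j$(nproc)
gdb --args build/X86/gem5.debug --debug-flags=Exec DAXPY-newCPU.py daxpy --cpu O3CPU

Then "run" inside gdb, and after the panic/abort, "bt" will print the
backtrace.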

Matt

On Wed, Dec 1, 2021 at 10:47 PM Gabe Black  wrote:

> I realize this is probably a hard question to answer with Exec being
> broken, but do you know what instruction is causing the problem? HINT_NOP?
> Probably the first thing that someone should do (if they haven't already)
> is to run this under gdb and see what the backtrace looks like, since that
> would give us a lot more info to work with.
>
> Looking at the info we have here, I see that the return from classValue()
> is -854770912 (not -1?) which to me looks like junk. I think probably
> what's happening is that the RegId being passed to the instruction's
> printReg function is from a bad pointer of some sort which is why it
> doesn't know how to print the register name. The RegId in this case refers
> to a particular register/operand, not the instruction as a whole. For
> instance, when the previous instruction prints out eax, that would be a
> RegId with classValue() (member regClass) set to IntRegClass, and regIdx
> set to INTREG_RAX.
>
> This works a little differently now and is in the process of being
> significantly reworked, although the gist is largely the same, particularly
> in the details involved here. The RegId structure tells you what type of
> register you're dealing with, aka its class, and also which particular
> register within that space you're referring to. The printReg method is
> trying to figure out what the name of that register is so it can be printed
> as part of the disassembly.
>
> I think the real bug is going to be that the RegId itself is bogus, and so
> when it's operated on, it's random junk will lead to random behavior or
> errors. It could be, for instance, that the instruction is trying to print
> a register name in its disassembly, but it doesn't actually *have* a
> register value set up in that slot and so is using uninitialized values.
> Typically the instructions would try to print out, say, destination
> register 0 when forming the disassembly string. Alternatively, O3 could
> have done something whacky and could be trying to do something with a
> nonsense instruction. I would personally lean towards the first option, but
> without more info it's hard to tell.
>
> I would also suggest trying this with develop. I don't think that's a
> *solution* to the problem, but it would possibly help isolate a cause. Like
> I said, how things work in develop are a little bit different, so we might
> get more info by also seeing what happens in those slightly different
> circumstances.
>
> Gabe
>
> On Wed, Dec 1, 2021 at 8:30 PM Matt Sinclair 
> wrote:
>
>> Hi Gabe,
>>
>> I was trying to dig through the RegClass code earlier to figure out why
>> the value is -1 for this instruction, and the only thing that I can think
>> of is HINT_NOP needs a RegClass value set for it, but it isn't set for some
>> reason (which is not 100% clear to me).  You know this code much better
>> than I do though, hence I was hoping you might see something I'm not seeing.
>>
>> Since this error is happening on a clean checkout of gem5 on stable, it
>> seems like a bug that anyone could face if they use the Exec debug flag.
>>
>> Thanks,
>> Matt
>>
>> -- Forwarded message -
>> From: Nirmit Jallawar via gem5-users 
>> Date: Wed, Dec 1, 2021 at 10:25 PM
>> Subject: [gem5-users] Unrecognized register class when using the "Exec"
>> debug flag
>> To: gem5-users@gem5.org 
>> Cc: Nirmit Jallawar 
>>
>>
>> Hi all,
>>
>>
>>
>> I was trying to run a gem5 simulation using the O3CPU but encountered
>> problems with gem5 “panic” when running with the “Exec” debug flags
>> enabled. I have built gem5 for the x86 ISA, and am using the stable branch.
>>
>> The full log can be found in the zip linked below (crash_debug_log).
>>
>> The error in the log seems to be related to this:
>>
>> build/X86/arch/x86/insts/static_inst.cc:253: panic: Unrecognized register
>> class.
>>
>>
>>
>> On further debugging, it seems that the register class value is being set
>> to -1:
>>
>> ….
>>
>> 7335000: system.cpu: T0 : 0x7801bbdd @_end+140737354234813. 2 :
>> CALL_NEAR_I : stis   t7, SS:[rsp + 0xfff8] : MemWrite :
>>  D=0x7801bbe2 A=0x7fffed48
>>
>> 7335000: system.cpu: T0 : 0x7801bbdd @_end+140737354234813. 3 :
>> CALL_NEAR_I : subi   rsp, rsp, 0x8 : IntAlu :  D=0x7fffed48
>>
>> 7335000: system.cpu: T0 : 0x7801bbdd @_end+140737354234813. 4 :
>> CALL_NEAR_I : wrip   t7, t1 : IntAlu :
>>
>> 7447000: system.cpu: T0 : 0x7801d080 @_end+140737354240096: hint
>>
>> 7447000: system.cpu: T0 : 0x7801d080 

[gem5-users] Re: Unrecognized register class when using the "Exec" debug flag

2021-12-01 Thread Matt Sinclair via gem5-users
Hi Gabe,

I was trying to dig through the RegClass code earlier to figure out why the
value is -1 for this instruction, and the only thing that I can think of is
HINT_NOP needs a RegClass value set for it, but it isn't set for some
reason (which is not 100% clear to me).  You know this code much better
than I do though, hence I was hoping you might see something I'm not seeing.

Since this error is happening on a clean checkout of gem5 on stable, it
seems like a bug that anyone could face if they use the Exec debug flag.

Thanks,
Matt

-- Forwarded message -
From: Nirmit Jallawar via gem5-users 
Date: Wed, Dec 1, 2021 at 10:25 PM
Subject: [gem5-users] Unrecognized register class when using the "Exec"
debug flag
To: gem5-users@gem5.org 
Cc: Nirmit Jallawar 


Hi all,



I was trying to run a gem5 simulation using the O3CPU but encountered
problems with gem5 “panic” when running with the “Exec” debug flags
enabled. I have built gem5 for the x86 ISA, and am using the stable branch.

The full log can be found in the zip linked below (crash_debug_log).

The error in the log seems to be related to this:

build/X86/arch/x86/insts/static_inst.cc:253: panic: Unrecognized register
class.



On further debugging, it seems that the register class value is being set
to -1:

….

7335000: system.cpu: T0 : 0x7801bbdd @_end+140737354234813. 2 :
CALL_NEAR_I : stis   t7, SS:[rsp + 0xfff8] : MemWrite :
 D=0x7801bbe2 A=0x7fffed48

7335000: system.cpu: T0 : 0x7801bbdd @_end+140737354234813. 3 :
CALL_NEAR_I : subi   rsp, rsp, 0x8 : IntAlu :  D=0x7fffed48

7335000: system.cpu: T0 : 0x7801bbdd @_end+140737354234813. 4 :
CALL_NEAR_I : wrip   t7, t1 : IntAlu :

7447000: system.cpu: T0 : 0x7801d080 @_end+140737354240096: hint

7447000: system.cpu: T0 : 0x7801d080 @_end+140737354240096. 0 :
HINT_NOP : fault   NoFault : No_OpClass :

7447000: system.cpu: T0 : 0x7801d084 @_end+140737354240100: mov
eax, 0xc

7447000: system.cpu: T0 : 0x7801d084 @_end+140737354240100. 0 :
MOV_R_I : limm   eax, 0xc : IntAlu :  D=0x000c

build/X86/arch/x86/insts/static_inst.cc:254: panic: Unknown register class:
-854770912 (reg.classValue())

Memory Usage: 632228 KBytes

Program aborted at tick 7455000

--- BEGIN LIBC BACKTRACE ---

….

The error does not appear when using no debug flags or using flags like
'IEW'.

The command used to run the simulation is:

../build/X86/gem5.opt --debug-flags=Exec DAXPY-newCPU.py daxpy --cpu O3CPU

If needed, you can find the related files here:
https://drive.google.com/file/d/1Sxg-c9Gy0NU2r3_nd88A_le18C5RkuR_/view?usp=sharing

I would appreciate any help on this.



Best,

Nirmit






___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Duplicate MessageBuffer creation in GPU_VIPER.py

2021-11-03 Thread Matt Sinclair via gem5-users
This certainly seems like a bug, but my guess is it's benign and will
essentially overwrite the existing one.  Brad/Matt P (CC'd) may know better
though.

Matt

On Wed, Nov 3, 2021 at 4:06 AM Sampad Mohapatra via gem5-users <
gem5-users@gem5.org> wrote:

> Hi All,
>
> The dir_cntrl.requestToMemory is initialized twice in GPU_VIPER.py.
> Could this potentially lead to two MessageBuffers being added, or will the
> previous one just be overwritten? Is this correct functionality?
>
>
> https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/configs/ruby/GPU_VIPER.py#448
>
>
> https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/configs/ruby/GPU_VIPER.py#457
>
> Thanks and regards,
> Sampad Mohapatra
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: MOESI_AMD_Base-CorePair.sm and MOESI_AMD_Base-dir.sm Correctness Check

2021-10-23 Thread Matt Sinclair via gem5-users
Yes, I understood this is what you meant.  The point I was trying to make
is I have not examined a trace to see what is actually happening (have you
gotten a trace to examine what's happening with the DataBlk value for this
request?).  After digging in a little further, it appears that this line:
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm#878
will be allocating a directory entry if it doesn't exist already
(build/.../mem/ruby/protocol/Directory_Controller.cc:2022).  Since the
state is already U in the transition you are highlighting, I'm assuming the
DirectoryEntry already exists.  So the allocation would not need to happen,
and thus they'd be taking whatever data entry is already there (it is
unclear to me what happens if the directory entry is being allocated ...
because then there would be no data block to start with).  It certainly
appears that vd_victim is not treating the request it creates as a
writethrough, which is the only case in the directory right now where it
updates the TBE entry from the incoming message.

This kind of stuff is tricky though, and there may be other assumptions
elsewhere I haven't seen that explain it.  So I'd want someone to look at a
trace and see if the DataBlk is being updated despite my initial read being
that something isn't quite right.  Hence, getting a trace -- and likely
adding some DPRINTFs for the ProtocolTrace flag that print out DataBlk at
the beginning and end of this action would be important.

Brad, do you remember if the data values are truly being copied by Ruby
now?  I believe the functional accesses are no longer separate?  If not,
then it perhaps explains why this potential bug hasn't been caught before
(since it's a benign bug).  If so, then it does seem like this would affect
correctness.

My guess is it would be a copy partial if this change is needed -- in case
the prior level of cache only wrote specific words on the line, you'd only
want to update those words.  I could imagine multiple protocols connecting
to the directory, and thus while any one of them might update all of the
words on the line, a partial copy would ensure correctness both for protocols
that write specific words and for ones that write the entire line.

Matt

On Sat, Oct 23, 2021 at 1:53 PM Sampad Mohapatra  wrote:

> Hi Matt,
>
> The following condition is missing in t_allocateTBE, but the corepair
> sends a message with VicDirty - CoherenceRequestType.
>
> if (in_msg.Type == CoherenceRequestType:VicDirty) {
>   tbe.DataBlk = in_msg.DataBlk;
> }
>
> P.S.: I am not sure whether the complete block should be replaced or just
> partially copied.
>
> Thanks,
> Sampad
>
> On Sat, Oct 23, 2021 at 2:44 PM Matt Sinclair <
> mattdsinclair.w...@gmail.com> wrote:
>
>> (Resending to mailing list)
>>
>> Hi Sampad,
>>
>> There are lines directly below the one I pointed to that do potentially
>> overwrite the data there.  But I am not 100% sure -- Brad and Matt P, CC'd
>> may know better or see something I'm missing.
>>
>> Matt
>>
>> On Sat, Oct 23, 2021 at 1:37 PM Sampad Mohapatra  wrote:
>>
>>> Yes, but the data is coming from the directory and not the incoming
>>> message, which has the actual data.
>>>
>>> Should it not be:
>>> *tbe.DataBlk := in_msg.DataBlk;*
>>>
>>> i.e., store the dirty victim block data in the tbe.
>>>
>>> Thanks,
>>> Sampad
>>>
>>> On Sat, Oct 23, 2021 at 1:00 PM Matt Sinclair <
>>> mattdsinclair.w...@gmail.com> wrote:
>>>
 I am not sure I understand completely what you're getting at, but it
 appears the allocation of the TBE entry does store the data:
 https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm#878

 Matt

 On Thu, Oct 21, 2021 at 11:08 PM Sampad Mohapatra via gem5-users <
 gem5-users@gem5.org> wrote:

> Hello All,
>
> I was looking at the MOESI_AMD_Base-CorePair.sm and
> MOESI_AMD_Base-dir.sm and am not quite sure if the following sequence of
> events is correct or not. Can you please verify?
>
> /
> At CorePair -> invokes action "vd_victim", which sends a data block
> with outgoing message.
>
> At Directory -> undergoes "transition(U, VicDirty, BL)" on message
> reception, but doesn't store the received data block in the generated TBE
> and the message is popped out/discarded.
> /
>
> Is the above expected behaviour ?
>
> Thanks and regards,
> Sampad Mohapatra
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org


___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: MOESI_AMD_Base-CorePair.sm and MOESI_AMD_Base-dir.sm Correctness Check

2021-10-23 Thread Matt Sinclair via gem5-users
 (Resending to mailing list)

Hi Sampad,

There are lines directly below the one I pointed to that do potentially
overwrite the data there.  But I am not 100% sure -- Brad and Matt P, CC'd
may know better or see something I'm missing.

Matt

On Sat, Oct 23, 2021 at 1:37 PM Sampad Mohapatra  wrote:

> Yes, but the data is coming from the directory and not the incoming
> message, which has the actual data.
>
> Should it not be:
> *tbe.DataBlk := in_msg.DataBlk;*
>
> i.e., store the dirty victim block data in the tbe.
>
> Thanks,
> Sampad
>
> On Sat, Oct 23, 2021 at 1:00 PM Matt Sinclair <
> mattdsinclair.w...@gmail.com> wrote:
>
>> I am not sure I understand completely what you're getting at, but it
>> appears the allocation of the TBE entry does store the data:
>> https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm#878
>>
>> Matt
>>
>> On Thu, Oct 21, 2021 at 11:08 PM Sampad Mohapatra via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> Hello All,
>>>
>>> I was looking at the MOESI_AMD_Base-CorePair.sm and
>>> MOESI_AMD_Base-dir.sm and am not quite sure if the following sequence of
>>> events is correct or not. Can you please verify?
>>>
>>> /
>>> At CorePair -> invokes action "vd_victim", which sends a data block with
>>> outgoing message.
>>>
>>> At Directory -> undergoes "transition(U, VicDirty, BL)" on message
>>> reception, but doesn't store the received data block in the generated TBE
>>> and the message is popped out/discarded.
>>> /
>>>
>>> Is the above expected behaviour ?
>>>
>>> Thanks and regards,
>>> Sampad Mohapatra
>>> ___
>>> gem5-users mailing list -- gem5-users@gem5.org
>>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>
>>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: MOESI_AMD_Base-CorePair.sm and MOESI_AMD_Base-dir.sm Correctness Check

2021-10-23 Thread Matt Sinclair via gem5-users
I am not sure I understand completely what you're getting at, but it
appears the allocation of the TBE entry does store the data:
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm#878

Matt

On Thu, Oct 21, 2021 at 11:08 PM Sampad Mohapatra via gem5-users <
gem5-users@gem5.org> wrote:

> Hello All,
>
> I was looking at the MOESI_AMD_Base-CorePair.sm and MOESI_AMD_Base-dir.sm
> and am not quite sure if the following sequence of events is correct or
> not. Can you please verify?
>
> /
> At CorePair -> invokes action "vd_victim", which sends a data block with
> outgoing message.
>
> At Directory -> undergoes "transition(U, VicDirty, BL)" on message
> reception, but doesn't store the received data block in the generated TBE
> and the message is popped out/discarded.
> /
>
> Is the above expected behaviour ?
>
> Thanks and regards,
> Sampad Mohapatra
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Access to gem5 101 course

2021-10-14 Thread Matt Sinclair via gem5-users
Hi all,

I believe Jason messaged some of you individually, but we are in the
process of hosting the gem5 101 "assignments" on the gem5.org website now.
Hopefully more news on this soon.

In the meantime, you are welcome to look at my course website, but keep in
mind the 2020 versions were updated to work with Ubuntu 16, whereas the
2021 website was updated to work with Ubuntu 20.  Note that this only
covers the first two assignments in the original gem5 101 though.

Thanks,
Matt

On Thu, Oct 14, 2021 at 6:41 AM Javed Osmany via gem5-users <
gem5-users@gem5.org> wrote:

> Hello
>
> No progress on accessing that link for me.
>
> However, the following two urls might give some insight.
>
> http://pages.cs.wisc.edu/~david/courses/cs752/Fall2015/
>
>
> https://pages.cs.wisc.edu/~sinclair/courses/cs752/fall2020/includes/schedule.html
>
> BR
> JO
>
> -Original Message-
> From: Ioannis Mavroidis via gem5-users [mailto:gem5-users@gem5.org]
> Sent: 14 October 2021 10:40
> To: gem5-users@gem5.org
> Cc: imavr...@exapsys.eu
> Subject: [gem5-users] Re: Access to gem5 101 course
>
> Any news with this? It would be of great help if we could access the
> course!
>
> Thanks,
> Yannis
> ___
> gem5-users mailing list -- gem5-users@gem5.org To unsubscribe send an
> email to gem5-users-le...@gem5.org
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: GCN3 - Polybench GPU - SPEC 17 - Errors

2021-10-09 Thread Matt Sinclair via gem5-users
If you cannot use docker, then I recommend using the commands Kyle had in
the old dockers when installing ROCm.  Manually building like you are is
extremely error prone.  I don't know exactly what the problem(s) is, but
I'm pretty sure it's HCC_AMDGPU_TARGET, not HSA_AMDGPU_GPU_TARGET.  I'm
pretty sure cmake just ignored that when you specified it, since it's not a
real variable?  Kyle, do you have a pointer to what commit updated the
Dockerfile to 4.0 (Maybe this:
https://github.com/KyleRoarty/gem5_docker/blob/ml/Dockerfile?  Or is this
out of date relative to what was used before the ROCm 4.0 update?)?
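
In other words, if HCC_AMDGPU_TARGET is indeed the variable HCC's CMake looks
at (an assumption on my part), your configure line would become something
like:

cmake -DCMAKE_INSTALL_PREFIX=rocm/hcc -DROCM_ROOT=rocm \
    -DHCC_AMDGPU_TARGET="gfx801" -DCMAKE_BUILD_TYPE=Release ..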

You can certainly try cherry-picking those commits to your older branch.
VIPER is not frequently updated, so it's reasonably likely you'll be able
to apply the patches cleanly.

Matt

On Sat, Oct 9, 2021 at 12:12 PM Sampad Mohapatra via gem5-users <
gem5-users@gem5.org> wrote:

> Hi Matt,
>
> Thanks for the quick reply.
>
> I am running the benchmarks on research clusters where running docker is
> not permitted and hence I have to build everything and install locally.
> I have made modifications to the coherence protocol and porting it to a
> newer Gem5 version may take some time and hence I am stuck with v21.0.0 for
> now.
> Although the modifications are basically flags to identify certain packet
> types, so I am assuming that I haven't broken the protocol.
> Also, I have run the *square *benchmark and *2DConvolution, FDTD-2D *to
> completion (compared with cpu execution result) for smaller input sizes.
> If this version of GEM5 supports anything higher than rocm 1.6.x, I will
> try to build and use it.
>
> To build hcc, I used the following command. I looked at the
> CMakeLists.txt of other dependencies, but they don't seem to be using the
> HSA_AMDGPU_GPU_TARGET variable:
> cmake -DCMAKE_INSTALL_PREFIX=rocm/hcc -DROCM_ROOT=rocm
> -DHSA_AMDGPU_GPU_TARGET="gfx801" -DCMAKE_BUILD_TYPE=Release ..
>
> And I build polybench using:
> hipcc --amdgpu-target=gfx801 -O2 2DConvolution.cpp -Igem5/include
> -Lgem5/util/m5/build/x86/out -Lgcc/lib64 -o 2DConvolution.exe -lm5
>
> I do remember that while compiling HCC, *bin/cmake-tests* build was
> failing because it was using the generated *clang++* which was unable to
> find *libstdc++.so.*
> LIBRARY_PATH is ignored (compile time) by the generated clang++ maybe.
> So, I modified the generated CMake file to add a " -Lgcc/lib64" to it so
> that it completes *make* and *make install*. The downside is I have to
> explicitly place *" -Lgcc/lib64 *"
> while compiling benchmarks using hipcc. Also, *square  *completes, so I
> think LD_LIBRARY_PATH works(runtime).
>
> I did see the commits you recently merged, but I wasn't sure whether I can
> retroactively add them to v21.0.0 which also has my own modifications.
> Should I go ahead and make the VIPER_TCC changes ?
>
> Also, I will definitely try to submit the benchmarks if they work out.
>
> Regards,
> Sampad
>
> On Sat, Oct 9, 2021 at 12:34 PM Matt Sinclair via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> Hi Sampad,
>>
>> I have not seen anyone attempt to run workloads in a way you are
>> attempting, so I can't offer every solution, but here are a few things I
>> noticed:
>>
>> - Why are you still using ROCm 1.6.x?  And why did you build it from
>> source?  I strongly recommend using the built-in docker support (which
>> supports ROCm 4.0 now).  The error #4 you are having is almost definitely
>> because something you built from source is not built correctly. But the
>> possible causes of this error are disparate, so I can't suggest anything
>> specific about how to fix it.  Basically, that error means something went
>> wrong when running the application, which almost always (in my experience)
>> is due to not installing ROCm correctly.  If you need to continue on with
>> ROCm 1.6.x, I would recommend looking at the old commits before ROCm 4.0
>> support was added, and use the docker support there.
>>
>> - Error #3 likely comes from how you are compiling the program with
>> hipcc/hcc.  Depending on which commit you are using, you need to only use
>> gfx801, gfx803, gfx900, or gfx902.  Since you seem to be using a slightly
>> older setup, probably the issue is you are compiling for something other
>> than gfx801 (also if you are compiling for gfx803 or gfx900, did you use
>> the -dgpu flag on the command line?).  It is likely error #1 is related to
>> this too.
>>
>> - Error #2 will require getting a Ruby trace and looking at what's
>> happening with those addresses (ProtocolTrace debug flag is the most
>> important flag to use).  You may find the following useful:
>

[gem5-users] Re: GCN3 - Polybench GPU - SPEC 17 - Errors

2021-10-09 Thread Matt Sinclair via gem5-users
Hi Sampad,

I have not seen anyone attempt to run workloads in a way you are
attempting, so I can't offer every solution, but here are a few things I
noticed:

- Why are you still using ROCm 1.6.x?  And why did you build it from
source?  I strongly recommend using the built-in docker support (which
supports ROCm 4.0 now).  The error #4 you are having is almost definitely
because something you built from source is not built correctly. But the
possible causes of this error are disparate, so I can't suggest anything
specific about how to fix it.  Basically, that error means something went
wrong when running the application, which almost always (in my experience)
is due to not installing ROCm correctly.  If you need to continue on with
ROCm 1.6.x, I would recommend looking at the old commits before ROCm 4.0
support was added, and use the docker support there.

- Error #3 likely comes from how you are compiling the program with
hipcc/hcc.  Depending on which commit you are using, you need to only use
gfx801, gfx803, gfx900, or gfx902.  Since you seem to be using a slightly
older setup, probably the issue is you are compiling for something other
than gfx801 (also if you are compiling for gfx803 or gfx900, did you use
the -dgpu flag on the command line?).  It is likely error #1 is related to
this too.

- Error #2 will require getting a Ruby trace and looking at what's
happening with those addresses (ProtocolTrace debug flag is the most
important flag to use).  You may find the following useful:
https://www.gem5.org/documentation/learning_gem5/part3/MSIdebugging/.
Having said that, note that I recently merged two fixes to the VIPER TCC
that may be relevant/useful:
https://gem5-review.googlesource.com/c/public/gem5/+/51368,
https://gem5-review.googlesource.com/c/public/gem5/+/51367

Finally, Polybench is not officially supported.  If you get them working,
it would be great if you submit them to gem5-resources (resources.gem5.org/)
to allow others to also use them!

Thanks,
Matt

On Sat, Oct 9, 2021 at 9:47 AM Sampad Mohapatra via gem5-users <
gem5-users@gem5.org> wrote:

> Hi All,
>
> I am running gem5 v21.0.0.0 and ROCm v1.6.x (built from source). The
> simulations run one host CPU (its pair runs a tiny binary and ends execution
> quickly) to launch a GPU benchmark (hipified Polybench GPU) and one CPU of a
> separate core pair (its second core runs a lightweight binary and ends
> execution quickly) to launch a SPEC-17 CPU benchmark on a 3x3 mesh network. I
> am facing 4 different kinds of errors and am requesting some help regarding
> them. The GPU benchmarks do "malloc"s of sizes ranging from 2GB - 10GB. The
> errors appear on various combinations of CPU and GPU benchmarks.
>
> (1) The below error appears and disappears on different simulation runs
> "
> fdtd2d: ../ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:577: virtual
> void amd::GpuAgent::InitDma(): Assertion `queues_[QueueBlitOnly] != __null
> && "Queue creation failed"' failed.
> "
>
> (2) Similar errors with varying values
> "
> panic: Possible Deadlock detected. Aborting!
> version: 4 request.paddr: 0x190b80c uncoalescedTable: 4 current time:
> 12393604096000 issue_time: 12393350811000 difference: 253285000
> Request Tables:
>
> Listing pending packets from 4 instructions Addr: [0x2379b, line
> 0x23780] with 0 pending packets
> Addr: [0x237ae, line 0x23780] with 64 pending packets
> Addr: [0x237b0, line 0x23780] with 56 pending packets
> Addr: [0x237b5, line 0x23780] with 61 pending packets
> Memory Usage: 57420616 KBytes
> "
>
> (3) The below error appears and disappears on different simulation runs:
> "
> There is no device can be used to do the computation
> "
>
> (4) The below error appears and disappears on different simulation runs:
> "
> fatal: syscall mincore (#27) unimplemented.
> "
>
> Thanks and Regards,
> Sampad Mohapatra
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 GCN GPU docker error

2021-09-23 Thread Matt Sinclair via gem5-users
Patch to further update GCN3 webpage posted:
https://gem5-review.googlesource.com/c/public/gem5-website/+/50907

Matt

On Wed, Sep 22, 2021 at 2:10 PM Matt Sinclair 
wrote:

> Thanks Kyle!  I agree we should probably just update the documentation
> Imad found to point to the gem5-resources documentation -- since that was
> what we updated already.  This is part of my plan for later -- that way the
> documentation doesn't go away, but we also don't need to update two
> different sets of documentation.
>
> Matt
>
> On Wed, Sep 22, 2021 at 2:08 PM Kyle Roarty  wrote:
>
>> The issue with the rocblas build is that we don't install rocm-cmake
>> until after we install rocblas. It looks like rocblas downloads it
>> automatically if we don't have it installed, so that's why I didn't get any
>> issues when initially testing it. This patch
>>  should fix
>> the issue.
>>
>> Also, I think the documentation Imad was using was the documentation we
>> have in util/dockerfiles/gcn-gpu. I'm of the opinion that we should just
>> remove that README because we have better documentation on gem5.org and
>> in gem5-resources (Although we still say to build gfx8-apu in the
>> gem5.org documentation)
>>
>> Kyle
>> --
>> *From:* mattdsinclair.w...@gmail.com 
>> *Sent:* Wednesday, September 22, 2021 1:11 PM
>> *To:* gem5 users mailing list 
>> *Cc:* Poremba, Matthew ; Kyle Roarty <
>> kroa...@wisc.edu>; Imad Al Assir ; Bobby Bruce <
>> bbr...@ucdavis.edu>
>> *Subject:* Re: [gem5-users] Re: gem5 GCN GPU docker error
>>
>> Collating responses to emails since you all type faster than me
>>
>> - Imad: glad to hear things work with the updates Matt P proposed!
>> - documentation: Matt P, yes we did update the documentation here:
>> https://resources.gem5.org/ (e.g.,
>> https://resources.gem5.org/resources/square), but apparently didn't
>> propagate those updates to the webpage Imad was using.  I will add that to
>> my list for the week.  Bobby, I see you did part of this already.  I
>> believe there is more that needs to be cleaned up based on what Imad/Matt P
>> said, but I will wait until your version is checked in (imminently) before
>> re-reading and updating.
>> - apt repos: Matt P, you must be right about rocblas updating something.  *
>> Kyle, can you please take care of updating the docker to use the specific
>> rocblas version we need?*
>>
>> Matt
>>
>> On Wed, Sep 22, 2021 at 1:03 PM Bobby Bruce via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>> Just jumping in here,
>>
>> I can confirm I can't build the image anymore. I had assumed this was
>> just a problem on my end before reading these emails. However, the image
>> hosted at http://gcr.io/gem5-test/gcn-gpu should be the most up-to-date
>> version of this Docker prior to this build error being introduced. It
>> should work.
>>
>> I've updated the website script here:
>> https://gem5-review.googlesource.com/c/public/gem5-website/+/50807.
>> Apologies, our documentation could definitely do with some tidying up :).
>>
>> --
>> Dr. Bobby R. Bruce
>> Room 3050,
>> Kemper Hall, UC Davis
>> Davis,
>> CA, 95616
>>
>> web: https://www.bobbybruce.net
>>
>>
>> On Wed, Sep 22, 2021 at 10:02 AM Imad Al Assir via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>> Dear Matt,
>>
>> Many thanks for catching this error! It did indeed solve the problem; I
>> was able to successfully run square and other applications from hip-samples
>> on both, the manually built dockerfile with everything related to rocBLAS
>> and MIOpen commented, and the pre-built docker image which I believe has
>> rocBLAS and MIOpen installed (based on its size).
>>
>> Many thanks again,
>> Imad
>>
>> On Sep 22 2021, at 6:48 pm, Poremba, Matthew 
>> wrote:
>>
>>
>> [AMD Official Use Only]
>>
>>
>>
>> Hi Imad,
>>
>>
>>
>>
>>
>> Yes, the docker seems to have broken in the past few days.
>>
>>
>>
>> Regarding the benchmark not completing, please change your command to use
>> 3 CPUs:
>>
>>
>>
>>
>>
>> docker run --rm -v $PWD/gem5:/gem5 -v $PWD/gem5-resources:/gem5-resources
>> \
>>
>> -w /gem5 gcr.io/gem5-test/gcn-gpu \
>>
>> build/GCN3_X86/gem5.opt configs/example/apu_se.py -n3 \
>>
>> --benchmark-root=/gem5-resources/src/gpu/square/bin \
>>
>> -c square
>>
>>
>>
>> ROCm 4.0 requires 3 CPUs to run now.  I thought we had updated the
>> README.md and website before gem5 21.1 release to reflect this but looks
>> like they are not up to date.
>>
>>
>>
>>
>>
>> -Matt
>>
>>
>>
>> *From:* Imad Al Assir via gem5-users 
>> *Sent:* Wednesday, September 22, 2021 9:31 AM
>> *To:* Matt Sinclair 
>> *Cc:* gem5 users mailing list ; Kyle Roarty <
>> kroa...@wisc.edu>; Imad Al Assir 
>> *Subject:* [gem5-users] Re: gem5 GCN GPU docker error
>>
>>
>> [CAUTION: External Email]
>>
>> Hello,
>> Thank you for your reply. I was simply following the documentation on the
>> gem5 website:
>> 

[gem5-users] Re: gem5 GCN GPU docker error

2021-09-22 Thread Matt Sinclair via gem5-users
Thanks Kyle!  I agree we should probably just update the documentation Imad
found to point to the gem5-resources documentation -- since that was what
we updated already.  This is part of my plan for later -- that way the
documentation doesn't go away, but we also don't need to update two
different sets of documentation.

Matt

On Wed, Sep 22, 2021 at 2:08 PM Kyle Roarty  wrote:

> The issue with the rocblas build is that we don't install rocm-cmake until
> after we install rocblas. It looks like rocblas downloads it automatically
> if we don't have it installed, so that's why I didn't get any issues when
> initially testing it. This patch
>  should fix
> the issue.
>
> Also, I think the documentation Imad was using was the documentation we
> have in util/dockerfiles/gcn-gpu. I'm of the opinion that we should just
> remove that README because we have better documentation on gem5.org and
> in gem5-resources (Although we still say to build gfx8-apu in the gem5.org
> documentation)
>
> Kyle
> --
> *From:* mattdsinclair.w...@gmail.com 
> *Sent:* Wednesday, September 22, 2021 1:11 PM
> *To:* gem5 users mailing list 
> *Cc:* Poremba, Matthew ; Kyle Roarty <
> kroa...@wisc.edu>; Imad Al Assir ; Bobby Bruce <
> bbr...@ucdavis.edu>
> *Subject:* Re: [gem5-users] Re: gem5 GCN GPU docker error
>
> Collating responses to emails since you all type faster than me
>
> - Imad: glad to hear things work with the updates Matt P proposed!
> - documentation: Matt P, yes we did update the documentation here:
> https://resources.gem5.org/ (e.g.,
> https://resources.gem5.org/resources/square), but apparently didn't
> propagate those updates to the webpage Imad was using.  I will add that to
> my list for the week.  Bobby, I see you did part of this already.  I
> believe there is more that needs to be cleaned up based on what Imad/Matt P
> said, but I will wait until your version is checked in (imminently) before
> re-reading and updating.
> - apt repos: Matt P, you must be right about rocblas updating something.  *
> Kyle, can you please take care of updating the docker to use the specific
> rocblas version we need?*
>
> Matt
>
> On Wed, Sep 22, 2021 at 1:03 PM Bobby Bruce via gem5-users <
> gem5-users@gem5.org> wrote:
>
> Just jumping in here,
>
> I can confirm I can't build the image anymore. I had assumed this was just
> a problem on my end before reading these emails. However, the image hosted
> at http://gcr.io/gem5-test/gcn-gpu should be the most up-to-date version
> of this Docker prior to this build error being introduced. It should work.
>
> I've updated the website script here:
> https://gem5-review.googlesource.com/c/public/gem5-website/+/50807.
> Apologies, our documentation could definitely do with some tidying up :).
>
> --
> Dr. Bobby R. Bruce
> Room 3050,
> Kemper Hall, UC Davis
> Davis,
> CA, 95616
>
> web: https://www.bobbybruce.net
>
>
> On Wed, Sep 22, 2021 at 10:02 AM Imad Al Assir via gem5-users <
> gem5-users@gem5.org> wrote:
>
> Dear Matt,
>
> Many thanks for catching this error! It did indeed solve the problem; I
> was able to successfully run square and other applications from hip-samples
> on both, the manually built dockerfile with everything related to rocBLAS
> and MIOpen commented, and the pre-built docker image which I believe has
> rocBLAS and MIOpen installed (based on its size).
>
> Many thanks again,
> Imad
>
> On Sep 22 2021, at 6:48 pm, Poremba, Matthew 
> wrote:
>
>
> [AMD Official Use Only]
>
>
>
> Hi Imad,
>
>
>
>
>
> Yes, the docker seems to have broken in the past few days.
>
>
>
> Regarding the benchmark not completing, please change your command to use
> 3 CPUs:
>
>
>
>
>
> docker run --rm -v $PWD/gem5:/gem5 -v $PWD/gem5-resources:/gem5-resources \
>
> -w /gem5 gcr.io/gem5-test/gcn-gpu \
>
> build/GCN3_X86/gem5.opt configs/example/apu_se.py -n3 \
>
> --benchmark-root=/gem5-resources/src/gpu/square/bin \
>
> -c square
>
>
>
> ROCm 4.0 requires 3 CPUs to run now.  I thought we had updated the
> README.md and website before gem5 21.1 release to reflect this but looks
> like they are not up to date.
>
>
>
>
>
> -Matt
>
>
>
> *From:* Imad Al Assir via gem5-users 
> *Sent:* Wednesday, September 22, 2021 9:31 AM
> *To:* Matt Sinclair 
> *Cc:* gem5 users mailing list ; Kyle Roarty <
> kroa...@wisc.edu>; Imad Al Assir 
> *Subject:* [gem5-users] Re: gem5 GCN GPU docker error
>
>
> [CAUTION: External Email]
>
> Hello,
> Thank you for your reply. I was simply following the documentation on the
> gem5 website:
> https://www.gem5.org/documentation/general_docs/gpu_models/GCN3
> 

[gem5-users] Re: gem5 GCN GPU docker error

2021-09-22 Thread Matt Sinclair via gem5-users
 Collating responses to emails since you all type faster than me

- Imad: glad to hear things work with the updates Matt P proposed!
- documentation: Matt P, yes we did update the documentation here:
https://resources.gem5.org/ (e.g.,
https://resources.gem5.org/resources/square), but apparently didn't
propagate those updates to the webpage Imad was using.  I will add that to
my list for the week.  Bobby, I see you did part of this already.  I
believe there is more that needs to be cleaned up based on what Imad/Matt P
said, but I will wait until your version is checked in (imminently) before
re-reading and updating.
- apt repos: Matt P, you must be right about rocblas updating
something.  *Kyle,
can you please take care of updating the docker to use the specific rocblas
version we need?*
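
(One possible shape of that pin, as a sketch only -- the exact package names
and versions are assumptions and need to match whatever the ROCm 4.0 repo in
the Dockerfile provides; 2.32.0-cc18d25f is just the version string from the
dpkg error reported earlier in this thread:

apt-get install -y rocm-cmake
apt-get install -y rocblas=2.32.0-cc18d25f rocblas-dev=2.32.0-cc18d25f

i.e., make sure rocm-cmake is installed before rocblas and pin rocblas to a
known-good version.)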

Matt

On Wed, Sep 22, 2021 at 1:03 PM Bobby Bruce via gem5-users <
gem5-users@gem5.org> wrote:

> Just jumping in here,
>
> I can confirm I can't build the image anymore. I had assumed this was just
> a problem on my end before reading these emails. However, the image hosted
> at http://gcr.io/gem5-test/gcn-gpu should be the most up-to-date version
> of this Docker prior to this build error being introduced. It should work.
>
> I've updated the website script here:
> https://gem5-review.googlesource.com/c/public/gem5-website/+/50807.
> Apologies, our documentation could definitely do with some tidying up :).
>
> --
> Dr. Bobby R. Bruce
> Room 3050,
> Kemper Hall, UC Davis
> Davis,
> CA, 95616
>
> web: https://www.bobbybruce.net
>
>
> On Wed, Sep 22, 2021 at 10:02 AM Imad Al Assir via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> Dear Matt,
>>
>> Many thanks for catching this error! It did indeed solve the problem; I
>> was able to successfully run square and other applications from hip-samples
>> on both, the manually built dockerfile with everything related to rocBLAS
>> and MIOpen commented, and the pre-built docker image which I believe has
>> rocBLAS and MIOpen installed (based on its size).
>>
>> Many thanks again,
>> Imad
>>
>> On Sep 22 2021, at 6:48 pm, Poremba, Matthew 
>> wrote:
>>
>>
>> [AMD Official Use Only]
>>
>>
>>
>> Hi Imad,
>>
>>
>>
>>
>>
>> Yes, the docker seems to have broken in the past few days.
>>
>>
>>
>> Regarding the benchmark not completing, please change your command to use
>> 3 CPUs:
>>
>>
>>
>>
>>
>> docker run --rm -v $PWD/gem5:/gem5 -v $PWD/gem5-resources:/gem5-resources
>> \
>>
>> -w /gem5 gcr.io/gem5-test/gcn-gpu \
>>
>> build/GCN3_X86/gem5.opt configs/example/apu_se.py -n3 \
>>
>> --benchmark-root=/gem5-resources/src/gpu/square/bin \
>>
>> -c square
>>
>>
>>
>> ROCm 4.0 requires 3 CPUs to run now.  I thought we had updated the
>> README.md and website before gem5 21.1 release to reflect this but looks
>> like they are not up to date.
>>
>>
>>
>>
>>
>> -Matt
>>
>>
>>
>> *From:* Imad Al Assir via gem5-users 
>> *Sent:* Wednesday, September 22, 2021 9:31 AM
>> *To:* Matt Sinclair 
>> *Cc:* gem5 users mailing list ; Kyle Roarty <
>> kroa...@wisc.edu>; Imad Al Assir 
>> *Subject:* [gem5-users] Re: gem5 GCN GPU docker error
>>
>>
>> [CAUTION: External Email]
>>
>> Hello,
>> Thank you for your reply. I was simply following the documentation on the
>> gem5 website:
>> https://www.gem5.org/documentation/general_docs/gpu_models/GCN3
>> 
>> In other words, to build the image, I used:
>>  docker build -t gcn-gpu .
>>
>>
>> This command didn't complete and was interrupted by the error I pasted in
>> the previous mail.
>>
>>
>> I was also using the command in the documentation to compile square:
>> docker run --rm -v $PWD/gem5-resources:$PWD/gem5-resources -w
>> $PWD/gem5-resources/src/gpu/square gcr.io/gem5-test/gcn-gpu make square
>>
>>
>> NOT "make gfx8-apu", as written in the documentation, which caused an
>> error: "no rule to make target 'gfx8-apu' ", and I assumed was a typo.
>>
>>
>> To run it, I also used the command in the doc:
>> docker run --rm -v $PWD/gem5:/gem5 -v $PWD/gem5-resources:/gem5-resources
>> \
>> -w /gem5 gcr.io/gem5-test/gcn-gpu \
>> build/GCN3_X86/gem5.opt configs/example/apu_se.py -n2 \
>> --benchmark-root=/gem5-resources/src/gpu/square/bin \
>> -c square
>>
>>
>> Note that in these commands, I modified the path of square to '
>> gem5-resources/src/gpu/square' instead of 'gem5-resources/src/square',
>> because that's where I found the code for it.
>> Also note that I tried downloading the pre-built binary of square (from
>> the 

[gem5-users] Re: gem5 GCN GPU docker error

2021-09-22 Thread Matt Sinclair via gem5-users
(Resending since bounced the first time)

Hi Imad,

I just built the docker earlier this week and did not have any problems
(e.g., I ran square and it completed in < 2 hours).  How are you trying to
build it?  And how are you running the applications you mentioned?

Thanks,
Matt

On Wed, Sep 22, 2021 at 12:31 AM Imad Al Assir via gem5-users <
gem5-users@gem5.org> wrote:

> Hello,
> Is there a problem with the most recent gcn-gpu docker file?
> I tried building it several times on Ubuntu 20.04 and 18.04 but it kept
> giving me this error:
>
> [...]
> Unpacking rocblas (2.32.0-cc18d25f) ...
> dpkg: dependency problems prevent configuration of rocblas:
>  rocblas depends on rocm-core; however:
>   Package rocm-core is not installed.
>
> dpkg: error processing package rocblas (--install):
>  dependency problems - leaving unconfigured
> dpkg: dependency problems prevent configuration of rocblas-dev:
>  rocblas-dev depends on rocblas (>= 2.32.0); however:
>   Package rocblas is not configured yet.
>
> dpkg: error processing package rocblas-dev (--install):
>  dependency problems - leaving unconfigured
> Errors were encountered while processing:
>  rocblas
>  rocblas-dev
> + check_exit_code 1
> + ((  1 != 0  ))
> + exit 1
> The command '/bin/sh -c ./install.sh -d -a all -i' returned a non-zero
> code: 1
>
> I also tried downloading the pre-built docker image (
> gcr.io/gem5-test/gcn-gpu) and built gem5 supposedly with no errors (but
> with a warning about deprecated namespaces not being supported by the
> compiler). Then when I tried running the 'square' sample application and
> other ones from gem5-resources/src/gpu/hip-samples (e.g. MatrixTranspose,
> dynamic_shared, inline_asm, etc.), they just kept running indefinitely (> 2
> hours), and I had to kill them to stop them.
>
> May you please try building the latest version of the gcn-gpu dockerfile
> and/or running a sample application on the pre-built docker image, and
> inform us if it works, and if not, how to fix the problem?
>
> Thanks in advance,
> Imad Al Assir
> [image: Sent from Mailspring]
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Some problems about GCN3_X86

2021-09-13 Thread Matt Sinclair via gem5-users
(Resending since bounced)

Matt

On Mon, Sep 13, 2021 at 1:22 PM Matt Sinclair  wrote:

> Rodinia is currently not part of the publicly available gem5-resources:
> http://resources.gem5.org/.  You are welcome to add support for them
> though.  It would be fairly straightforward to add them -- you would hipify
> the benchmarks, potentially update the memory management to not use copies
> (since an APU is modeled) and then try to run them.  If you do this, we
> welcome you contributing them back as a commit.
>
> SPEC info here: https://resources.gem5.org/resources/spec-2017
>
> Matt
>
> On Sun, Sep 12, 2021 at 10:52 PM Kevin KU via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> One last question: how can I run benchmarks such as Rodinia or SPEC2006
>> once I have installed all the packages and the compile succeeds?
>> Is there any tutorial I can read or any example command I can follow when
>> I need to run benchmarks?
>>
>> Thank you for the assistance, professor Matt~
>> ___
>> gem5-users mailing list -- gem5-users@gem5.org
>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Some problems about GCN3_X86

2021-09-12 Thread Matt Sinclair via gem5-users
If you don’t use the docker, then you will need to install the ROCm stack,
yes.

The gcn3_x86 build includes both CPUs and a GPU as is.  So that is not a
problem.  And I believe it has private L1s and a shared L2.  It does not by
default partition the way you requested, but you are welcome to add that
support and push it as an additional feature.  You may also consider
reading the Gutierrez HPCA ‘18 paper or the gem5 GPU tutorial slides on the
gem5 website to answer these kinds of questions.

Matt

On Sun, Sep 12, 2021 at 10:15 PM Kevin KU via gem5-users <
gem5-users@gem5.org> wrote:

> Sounds good, I will try it again with the stable branch.
>
> Maybe I need to try the docker; if all the ROCm software is installed
> correctly, it may be a good choice.
> One more question: if I want to simulate a heterogeneous system (CPU+GPU)
> with private L1s and a shared L2 (LLC), which protocol do I need to build,
> and how can I partition the LLC (I mean, if the LLC is 4MB, I want to give
> the CPU 3MB and the GPU 1MB), or how can I make GPU requests bypass the LLC?
>
> Thank you so much for your reply!!!
> Best wishes
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
-- 
Regards,
Matt Sinclair
Assistant Professor
University of Wisconsin-Madison
Computer Sciences Department
cs.wisc.edu/~sinclair
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Some problems about GCN3_X86

2021-09-12 Thread Matt Sinclair via gem5-users
The branch you mentioned is at least 2 years out of date.  I recommend you
use the current stable branch instead:
https://gem5.googlesource.com/public/gem5/+/refs/heads/stable, which has
all of the GPU support from the master-gcn3-staging branch (and many more)
integrated into it.  Moreover, to use the GPU model you need to have the
ROCm stack installed, which it doesn't seem like you do based on your reply.

In terms of docker support, you may consider looking here:
https://www.gem5.org/documentation/general_docs/gpu_models/GCN3.  The
website you listed is mostly about how to run the CPU models.  I will note
that the GCN3 documentation page does need to be updated though -- the
stable branch now has support for GCN3 and more applications than square
have been tested and released with it.  We will work on updating this
soon.  Nevertheless, the instructions for the docker are correct and you
should be able to use them to get an up-to-date version of gem5 that runs
the GPU model running (the docker also has all the ROCm software installed
correctly).

Setting that all aside, I am not sure what version of gcc is used in Ubuntu
18, but there are some recent commits that updated gem5 to require gcc >=
7.  If Ubuntu 18 uses an older gcc, I don't think it will work (using the
docker should get around this though).

Matt

On Sun, Sep 12, 2021 at 9:41 PM Kevin KU via gem5-users 
wrote:

> Hi, I am trying to build GCN3 with WSL2 Ubuntu 18.04. I followed this
> website "https://www.gem5.org/documentation/general_docs/building" to
> install all the packages, and downloaded this branch "
> https://gem5.googlesource.com/amd/gem5/+/refs/heads/agutierr/master-gcn3-staging
> ".
> (I checked the compiler version : gcc-7.5.0 ; python-2.7)
>
> I don't use docker, because I have no idea where I can find the docker
> image you provide.
> Is there any difference between using docker, using WSL2, or installing
> Ubuntu directly on a computer?
>
> Thank you for your reply, hope we can solve this soon.
> Best wishes
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Some problems about GCN3_X86

2021-09-12 Thread Matt Sinclair via gem5-users
Hi,

Can you please supply some additional information.  For example, how are
you trying to compile the GCN3 GPU version?  And what branch/commit are you
using?  And are you using the docker we released that installs the GPU
driver stack correctly, or are you trying to build without the docker? The
screenshot you attached doesn't show enough to see more than what the error
is, but I haven't had this error happen before.  So likely something is
wrong with the setup.

Thanks,
Matt

On Sun, Sep 12, 2021 at 8:35 AM Windows 10 via gem5-users <
gem5-users@gem5.org> wrote:

> Hello, I’m new to gem5-gcn3, and I am trying to build the simulator with
> this command : scons biuld/GCN3_X86/gem5.opt -j 4.
>
>
> But I get this message from my terminal.
>
> I use WSL2 Ubuntu 18.04 on Windows 10 21H1.
>
>
> Please help me to fix this problem, thanks a lot.
> Best regards
>
>
>
> Sent from Mail for Windows
>
>
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 GCN3 GPU model docker build issue

2021-03-11 Thread Matt Sinclair via gem5-users
Follow-up:

Commit that updates documentation here:
https://gem5-review.googlesource.com/c/public/gem5-website/+/42803

Matt

On Thu, Mar 11, 2021 at 11:01 AM Matt Sinclair 
wrote:

> Thanks for pointing this out, we will update the documentation to be more
> explicit.
>
> Matt
>
> On Wed, Mar 10, 2021 at 11:06 PM xpf via gem5-users 
> wrote:
>
>> Hi,
>>
>> I didn't see the instructions say to use the stable branch. I followed the
>> instructions on
>> http://www.gem5.org/documentation/general_docs/gpu_models/GCN3 which
>> don't mention using the stable or develop branch. But now I see 'integrating
>> the Docker into the develop branch' on
>> https://www.gem5.org/2020/05/27/modern-gpu-applications.html.
>>
>> Thank you very much.
>> ___
>> gem5-users mailing list -- gem5-users@gem5.org
>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 GCN3 GPU model docker build issue

2021-03-11 Thread Matt Sinclair via gem5-users
Thanks for pointing this out, we will update the documentation to be more
explicit.

Matt

On Wed, Mar 10, 2021 at 11:06 PM xpf via gem5-users 
wrote:

> Hi,
>
> I didn't see the instructions say to use the stable branch. I followed the
> instructions on
> http://www.gem5.org/documentation/general_docs/gpu_models/GCN3 which
> don't mention using the stable or develop branch. But now I see 'integrating
> the Docker into the develop branch' on
> https://www.gem5.org/2020/05/27/modern-gpu-applications.html.
>
> Thank you very much.
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 GCN3 GPU model docker build issue

2021-03-09 Thread Matt Sinclair via gem5-users
 Right, like Matt said you should be using develop, not stable, for now.

Did you see the instructions say to use stable somewhere?  If so we can
update that.

Matt

On Tue, Mar 9, 2021 at 9:42 AM Poremba, Matthew via gem5-users <
gem5-users@gem5.org> wrote:

> [AMD Public Use]
>
> Hi,
>
>
> Develop branch has the latest Dockerfile. Note that GCN3 won't be
> "officially" part of gem5 until 21.0 release (in a few weeks).
>
>
> -Matt
>
> -Original Message-
> From: xpf via gem5-users 
> Sent: Monday, March 8, 2021 11:21 PM
> To: gem5-users@gem5.org
> Cc: 1045749...@qq.com
> Subject: [gem5-users] Re: gem5 GCN3 GPU model docker build issue
>
> [CAUTION: External Email]
>
> Hi,
>
> Thanks for your reply.
>
> I use the stable branch and the OS is Ubuntu 18.04. I think the Dockerfile
> in /gem5/util/dockerfiles/gcn-gpu/ is not the latest.
>
> Could you tell me which branch includes the latest Dockerfile?
> ___
> gem5-users mailing list -- gem5-users@gem5.org To unsubscribe send an
> email to gem5-users-le...@gem5.org
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 GCN3 GPU model docker build issue

2021-03-08 Thread Matt Sinclair via gem5-users
Hi,

Can you tell us a bit more about your environment?  For example, what
branch are you using (stable or develop)?  And what OS are you using?

Thanks,
Matt

On Sun, Mar 7, 2021 at 11:11 PM xpf via gem5-users 
wrote:

> Hi all,
>
> I follow the instructions on
> http://www.gem5.org/documentation/general_docs/gpu_models/GCN3. But I
> fail to build docker image,
>
> # build docker
> /gem5/util/dockerfiles/gcn-gpu$ docker build -t gem5-gcn .
>
> Then I got the error:
> Step 4/41 : RUN python -m pip install -U pip
>  ---> Running in 7660ece9ef3a
> Collecting pip
>   Downloading
> https://files.pythonhosted.org/packages/b7/2d/ad02de84a4c9fd3b1958dc9fb72764de1aa2605a9d7e943837be6ad82337/pip-21.0.1.tar.gz
> (1.5MB)
> Complete output from command python setup.py egg_info:
> Traceback (most recent call last):
>   File "", line 1, in 
> ImportError: No module named setuptools
>
> 
> Command "python setup.py egg_info" failed with error code 1 in
> /tmp/pip-build-GB9hfh/pip/
> You are using pip version 8.1.1, however version 21.0.1 is available.
> You should consider upgrading via the 'pip install --upgrade pip' command.
> The command '/bin/sh -c python -m pip install -U pip' returned a non-zero
> code: 1
>
>  Then I added `pip install --upgrade pip` in the Dockerfile, and I also
> tried to install setuptools before updating the pip version, but I still
> get the same error. What should I do to fix this problem? Thank you very much.
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: run caffe in Gem5

2021-01-21 Thread Matt Sinclair via gem5-users
Hi Javad,

tl;dr: I don't believe anyone has publicly announced Caffe running
end-to-end in gem5, but I have gotten hipCaffe to run for parts of
applications in the past.

A couple years ago I started working on getting hipCaffe (the HIP version
of Caffe -- HIP is the current GPU programming language used by AMD)
working in gem5.  I used this version in part because NVIDIA GPUs aren't
supported directly in gem5 at the moment (AMD GPUs are), and because I was
interested in running Caffe workloads that used the GPU.  So, long story
short, I don't know if Caffe would "just work" if you wanted to run it in
gem5 only on CPUs or not.  You would need to try this and let us know.

But, in terms of running hipCaffe on the AMD GPU in gem5, I never got it
completely working, but there were certain applications (like CaffeNet) in
it that I could get to run for about 10 GPU kernels or so.  At that point,
I started running into a range of bugs that needed to be fixed -- some
requiring new syscalls to be implemented in SE mode, some requiring various
corner case instructions or sub-cases of instructions to be implemented,
etc.  So, hipCaffe "worked" in the sense that I could run some stuff on
gem5, but it did not work in the sense that I could run an application
end-to-end.  Unfortunately, I never got back to finishing fixing it -- in
part because running hipCaffe in SE mode was an endless series of new bugs
or system features that needed implementing.  My hope is that once the FS
mode support for GPUs is working, it will make it easier to get workloads
like this running on the GPU model.  Practically speaking (ignoring the bugs),
I think this is necessary anyways, because Caffe runs so many kernels,
simulating all of them will take a really long time.  So we'd want some
support for fast-forwarding and checkpointing, which (I think) is better
supported in FS mode.

If you are interested in trying to run hipCaffe anyways though, my
recommendation would be to start from the GPU documentation and especially
the docker file in public gem5, which installs the version of ROCm and most
of the libraries you'll need on top of it.  From there, you would need to
update the docker to install a version of hipCaffe (
https://github.com/ROCmSoftwarePlatform/hipCaffe/) that is compatible with
ROCm 1.6, as well as install any additional libraries hipCaffe needs that
are not already installed as part of the default gem5 GPU docker.  From
there you would likely be able to get to the same point as me -- getting
some of the kernels to run before a failure happens.

Thanks,
Matt

On Thu, Jan 21, 2021 at 9:36 PM Javad Mozaffari via gem5-users <
gem5-users@gem5.org> wrote:

> Hello,
> I want to run Caffe in gem5, but I'm new to gem5. Where can I learn about
> this?
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Magic instructions with GCN3 Model/hipcc return 0

2020-11-09 Thread Matt Sinclair via gem5-users
Hi Gabe,

I don't have the broken build in front of me, and it's possible it is
because I'm running on an Ubuntu 16 machine, but I had to add c++11 per the
error message I got when debugging this.  If c++14 works though, great.

Thanks for the updated info -- I built the tutorial out of the old one, so
next time I'll make sure to update it accordingly.

Thanks,
Matt

On Mon, Nov 9, 2020 at 5:44 PM Gabe Black via gem5-users <
gem5-users@gem5.org> wrote:

> BTW, I do think I need to explicitly set the c++ version in the scons
> file, like in Matt's original email above. I'd probably set it to c++14
> though, to be consistent with gem5 proper. I think that will likely fix a
> build issue Bobby had with an older (7.x I think) version of gcc, where the
> default version is probably different from the compiler I'm using (10.x I
> think).
>
> Gabe
>
> On Mon, Nov 9, 2020 at 1:50 PM Gabe Black  wrote:
>
>> Hi folks. If you're using the magic address based version of the gem5
>> ops, then you should call, for instance, m5_exit_addr and not just m5_exit.
>> The "normal" functions are now always the magic instructions which
>> essentially only gem5 CPU models know how to execute. All call mechanisms
>> are built into the library at once now so you can use the same binary on
>> the KVM CPU, native gem5 CPUs, etc.
>>
>> You also should not change the scons files when you build. The old
>> Makefile based setup required tinkering with things to get the build you
>> wanted, but that is no longer necessary. If you need to, that's a bug and
>> we should look into it. The lines you're commenting out just set the
>> default magic address, and that's only there for legacy reasons. You can
>> set the address to use from the command line if you're using the m5
>> utility, or by setting the m5op_addr variable if using the library. You
>> still have to run map_m5_mem to make the magic physical address visible to
>> userspace for the library to work, or otherwise set up a virtual to
>> physical mapping if you were, for instance, running in the kernel which
>> somebody was doing recently.
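
As a concrete illustration of the address-based flow described above, here is
a hedged sketch (the exact header names and where the *_addr variants are
declared live under util/m5 and may differ in your checkout; the address
value is only an example and must match your configuration):

    // Hedged sketch of calling the address-based gem5 ops from a program.
    #include <gem5/m5ops.h>   // gem5 op declarations (check your tree for the
                              // *_addr variants)
    #include "m5_mmap.h"      // m5op_addr and map_m5_mem()

    int main()
    {
        m5op_addr = 0xFFFF0000;   // example magic physical address
        map_m5_mem();             // mmap the magic range into this process
        m5_reset_stats_addr(0, 0);
        /* ... region of interest ... */
        m5_exit_addr(0);          // address-based variant: also works on KVM
        return 0;
    }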
>>
>> If you try to use a call mechanism that isn't supported by your CPU
>> model, then the behavior will be unpredictable. For x86 on the KVM CPU for
>> example, the special gem5 instructions will do whatever they look like they
>> should do on real hardware. That may be a nop, it may be to generate an
>> undefined instruction exception, etc. If it's a nop, it will just leave
>> whatever is in RAX in RAX.
>>
>> Also, argument values and return values are now handled by a layer which
>> knows and applies the actual ABI rules for a given ISA and for the specific
>> types of the arguments and return value. There should be no reason to
>> change the code which is calling the pseudo instruction to explicitly set
>> RAX, especially if you're using the address based calling mechanism which
>> doesn't go through that path at all.
>>
>> Gabe
>>
>> On Mon, Nov 9, 2020 at 1:06 PM Matt Sinclair via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> Hi Dan,
>>>
>>> My comment was just a general comment on the m5ops -- I thought you were
>>> using the "old" format for building m5ops and that might have been the
>>> problem.  Sounds like it wasn't.
>>>
>>> I think pushing a fix to develop and tagging Gabe and Jason as reviewers
>>> is probably the right strategy.
>>>
>>> Thanks,
>>> Matt
>>>
>>> On Mon, Nov 9, 2020 at 2:33 PM Daniel Gerzhoy 
>>> wrote:
>>>
>>>> I found the issue and fixed it.
>>>>
>>>> The return value wasn't being put into the Rax register in
>>>> src/arch/x86/isa/decoder/two_byte_opcodes.isa
>>>>
>>>> 0x4: BasicOperate::gem5Op({{
>>>>     uint64_t ret;
>>>>     bool recognized = PseudoInst::pseudoInst(
>>>>         xc->tcBase(), IMMEDIATE, ret);
>>>>     if (!recognized)
>>>>         fault = std::make_shared();
>>>>     Rax = ret;  // <<<<<<<<<<<<<<<<<<<<<<<<
>>>> }}, IsNonSpeculative);
>>>>
>>>>   This code was simplified with the new abi stuff and the Rax = ret;
>>>> must have been lost in the shuffle.
>>>>
>>>> I could push the fix to develop, or should I just make an issue on Jira?

[gem5-users] Re: Magic instructions with GCN3 Model/hipcc return 0

2020-11-09 Thread Matt Sinclair via gem5-users
Hi Dan,

My comment was just a general comment on the m5ops -- I thought you were
using the "old" format for building m5ops and that might have been the
problem.  Sounds like it wasn't.

I think pushing a fix to develop and tagging Gabe and Jason as reviewers is
probably the right strategy.

Thanks,
Matt

On Mon, Nov 9, 2020 at 2:33 PM Daniel Gerzhoy 
wrote:

> I found the issue and fixed it.
>
> The return value wasn't being put into the Rax register in
> src/arch/x86/isa/decoder/two_byte_opcodes.isa
>
> 0x4: BasicOperate::gem5Op({{
>     uint64_t ret;
>     bool recognized = PseudoInst::pseudoInst(
>         xc->tcBase(), IMMEDIATE, ret);
>     if (!recognized)
>         fault = std::make_shared();
>     Rax = ret;  // <<<<<<<<
> }}, IsNonSpeculative);
>
>   This code was simplified with the new abi stuff and the Rax = ret; must
> have been lost in the shuffle.
>
> I could push the fix to develop, or should I just make an issue on Jira?
>
> Best,
>
> Dan
>
> On Mon, Nov 9, 2020 at 2:50 PM Daniel Gerzhoy 
> wrote:
>
>> Let me further say that I know that the magic instructions are being
>> called. I am just getting bogus return values.
>>
>> On Mon, Nov 9, 2020 at 2:18 PM Daniel Gerzhoy 
>> wrote:
>>
>>> Hi Matt,
>>>
>>> Thanks for this, it's very helpful. However after following the
>>> instructions (I had to extrapolate a little because of the directory
>>> structure changes you mentioned) I get the same result: nil returns from
>>> the magic instructions.
>>> Actually, it isn't nil but a constant no matter what. If I compile my
>>> program with -O0 it's nil; with -O2 it's 4198192, which is suspicious.
>>>
>>> To clarify, are these updated instructions specifically meant to fix
>>> this issue I am running into? Or just general instructions to build m5op.o
>>>
>>> Here are the specific changes I made according to the link you provided,
>>> the supplemental instructions, and extrapolating based on the directory
>>> structure change.
>>>
>>> 1. In SConsopts I commented both:
>>>
>>> --- a/util/m5/src/abi/x86/SConsopts
>>> +++ b/util/m5/src/abi/x86/SConsopts
>>> @@ -27,8 +27,8 @@ Import('*')
>>>
>>>  env['ABI'] = 'x86'
>>>  get_abi_opt('CROSS_COMPILE', '')
>>> -env.Append(CXXFLAGS='-DM5OP_ADDR=0x')
>>> -env.Append(CCFLAGS='-DM5OP_ADDR=0x')
>>> +#env.Append(CXXFLAGS='-DM5OP_ADDR=0x')
>>> +#env.Append(CCFLAGS='-DM5OP_ADDR=0x')
>>>
>>>  env['CALL_TYPE']['inst'].impl('m5op.S', 'verify_inst.cc')
>>>  env['CALL_TYPE']['addr'].impl('m5op_addr.S', default=True)
>>>
>>> 2. In SConstruct I added:
>>>
>>> --- a/util/m5/SConstruct
>>> +++ b/util/m5/SConstruct
>>> @@ -44,7 +44,9 @@ def abspath(d):
>>>
>>>  # Universal settings.
>>>  main.Append(CXXFLAGS=[ '-O2' ])
>>> +main.Append(CXXFLAGS=[ '-std=c++11' ])
>>>  main.Append(CCFLAGS=[ '-O2' ])
>>>  main.Append(CPPPATH=[ common_include ])
>>>
>>> The compilation process compiles m5op.S with gcc though, so c++11
>>> doesn't have any effect on it. Not sure if that matters.
>>>
>>> 3. Finally I linked both m5_mmap.o and m5op.o as per the instructions
>>> but I am a little wary of m5_mmap
>>>
>>> What does m5_mmap actually do if I don't have M5OP_ADDR defined. It
>>> looks like nothing? Do I need to link it?
>>>
>>> *Is there something inside the program I need to do before calling magic
>>> instructions that has to do with m5_mmap?*
>>>
>>> Thanks for your help,
>>>
>>> Dan
>>>
>>> On Mon, Nov 9, 2020 at 12:12 PM Matt Sinclair 
>>> wrote:
>>>
 Hi Dan,

 In recent weeks, Gabe (if I recall correctly) updated how the m5ops are
 created.  I had created a homework assignment for my course about it:
 https://pages.cs.wisc.edu/~sinclair/courses/cs752/fall2020/handouts/hw3.html
 (see #2), but this is now already out of date as the location of some files
 changed.  The updated instructions are:

 1.  Update $GEM5_ROOT/util/m5/SConstruct, add a new line between the
 current lines 46 and 47:

 main.Append(CXXFLAGS=[ '-O2' ])
 *+main.Append(CXXFLAGS=[ '-std=c++11' ])*

 main.Append(CCFLAGS=[ '-O2' ])

 2.  Now run the same command you ran in step 2 of the above link:

 scons build/x86/out/m5

 3.  This will create the same two .o files in step 2 of the above link,
 in the same places (although the location of m5op.o may have changed
 to include/gem5 util/m5/build/x86/abi/x86/ according to some of the
 students in my course).
 Matt

 On Mon, Nov 9, 2020 at 9:25 AM Daniel Gerzhoy via gem5-users <
 gem5-users@gem5.org> wrote:

> Hey all,
>
> I've recently updated to using the dev branch for my GCN3 simulations.
> I've noticed that I am now getting return values of 0 for every magic
> instruction (m5_rpns for instance).
>
> Is there a special way I need to be compiling/linking m5ops.S to get the
> return values to show up correctly? Or might this be a bug?

[gem5-users] Re: Magic instructions with GCN3 Model/hipcc return 0

2020-11-09 Thread Matt Sinclair via gem5-users
Hi Dan,

In recent weeks, Gabe (if I recall correctly) updated how the m5ops are
created.  I had created a homework assignment for my course about it:
https://pages.cs.wisc.edu/~sinclair/courses/cs752/fall2020/handouts/hw3.html
(see #2), but this is now already out of date as the location of some files
changed.  The updated instructions are:

1.  Update $GEM5_ROOT/util/m5/SConstruct, add a new line between the
current lines 46 and 47:

main.Append(CXXFLAGS=[ '-O2' ])
*+main.Append(CXXFLAGS=[ '-std=c++11' ])*

main.Append(CCFLAGS=[ '-O2' ])

2.  Now run the same command you ran in step 2 of the above link:

scons build/x86/out/m5

3.  This will create the same two .o files in step 2 of the above link, in
the same places (although the location of m5op.o may have changed to
include/gem5 util/m5/build/x86/abi/x86/ according to some of the students
in my course).
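
As a quick sanity check that the resulting m5op.o works, here is a hedged
sketch of a small program that links against it (the include path and object
location follow the notes above but may differ in your tree, and the magic
instructions only do something meaningful when the binary runs inside gem5):

    // sketch.cpp -- illustrative only.  Build with something like:
    //   g++ -std=c++11 -I$GEM5_ROOT/include sketch.cpp \
    //       $GEM5_ROOT/util/m5/build/x86/abi/x86/m5op.o -o sketch
    #include <gem5/m5ops.h>
    #include <cstdint>
    #include <cstdio>

    int main()
    {
        m5_reset_stats(0, 0);            // start a fresh stats region
        volatile long sum = 0;           // a little work to measure
        for (long i = 0; i < 1000000; ++i)
            sum += i;
        uint64_t ns = m5_rpns();         // read gem5's pseudo "nanoseconds"
        m5_dump_stats(0, 0);             // dump stats for the region
        std::printf("sum=%ld rpns=%llu\n", (long)sum,
                    (unsigned long long)ns);
        return 0;
    }
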
Matt

On Mon, Nov 9, 2020 at 9:25 AM Daniel Gerzhoy via gem5-users <
gem5-users@gem5.org> wrote:

> Hey all,
>
> I've recently updated to using the dev branch for my GCN3 simulations.
> I've noticed that I am now getting return values of 0 for every magic
> instruction (m5_rpns for instance).
>
> Is there a special way I need to be compiling/linking m5ops.S to get the
> return values to show up correctly? Or might this be a bug?
>
> Thanks,
>
> Dan
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 GCN3 GPU model running issues

2020-11-06 Thread Matt Sinclair via gem5-users
Thanks Kyle!  Another good reason for us to get the GCN3 tests up and
running as part of kokoro soon :)

Matt

On Fri, Nov 6, 2020 at 6:35 PM Kyle Roarty  wrote:

> Hi all,
>
> Found the root cause.
> https://gem5-review.googlesource.com/c/public/gem5/+/34160 moved all the
> syscall tables to their own files, but didn't include  in
> those files, so those files couldn't find the SYS_getdents definition. I'll
> post a patch soon.
>
> Re python in the Dockerfile: Like Matt P said, 3.5 is the last python
> version available in Ubuntu 16.04. I'd bet the SConstruct file checks for
> python 3.6 as python 3.5 is end of life. It looks like Python 3.6 is
> available in Ubuntu 16.10, so we can try basing the Dockerfile off that
> instead if it still has our other requirements. The other alternative would
> be to install 3.6+ from another ppa.
>
> Kyle
> --
> *From:* yang...@umich.edu 
> *Sent:* Friday, November 6, 2020 6:26 PM
> *To:* Daniel Gerzhoy 
> *Cc:* Matt Sinclair ; Kyle Roarty <
> kroa...@wisc.edu>; Poremba, Matthew ; gem5 users
> mailing list 
> *Subject:* Re: [gem5-users] Re: gem5 GCN3 GPU model running issues
>
> Hi Daniel,
>
> I tried with your building command and at least it is working fine for me.
>
> Thanks!
> --Yichen
>
> On Fri, Nov 6, 2020 at 6:41 PM Yichen Yang  wrote:
>
> Hi all,
>
> I am using the gcn3-gpu docker running on an Ubuntu 16.04 host machine, and
> I follow the gem5-resources/readme to build the application.
>
> Thanks!
> --Yichen
>
> On Fri, Nov 6, 2020 at 6:21 PM Poremba, Matthew 
> wrote:
>
> [AMD Public Use]
>
>
>
> Hi Matt,
>
>
>
>
>
> I also see the getdents error with square building gem5 with gcc 7.5.0.
>
>
>
> I would hope we wouldn’t have to define a variable in scons to get a
> syscall to work. I am not sure where SYS_getdents / SYS_getdents64 are
> supposed to be defined, but they are not anywhere in gem5. Different compiler
> maybe? The change that broke this for me literally just moves files around,
> so I have no idea how that caused it to break.
>
>
>
>
>
> -Matt
>
>
>
> *From:* Matt Sinclair 
> *Sent:* Friday, November 6, 2020 2:55 PM
> *To:* Daniel Gerzhoy 
> *Cc:* Kyle Roarty ; Poremba, Matthew <
> matthew.pore...@amd.com>; Yichen Yang ; gem5 users
> mailing list 
> *Subject:* Re: [gem5-users] Re: gem5 GCN3 GPU model running issues
>
>
>
> [CAUTION: External Email]
>
> Ok, we’re using the same, but haven’t gotten the second error ...
> strange.  Are you using different apps?
>
>
>
> Matt
>
>
>
> On Fri, Nov 6, 2020 at 4:45 PM Daniel Gerzhoy 
> wrote:
>
> I'm using the gcn3 docker, so Ubuntu 16.04 I believe
>
>
>
> On Fri, Nov 6, 2020 at 5:44 PM Matt Sinclair 
> wrote:
>
> Hi Daniel & Yichen,
>
>
>
> What OS are you using?  We have not encountered either of these problems
> thus far ... something must be different about your setup and ours.
>
>
>
> Thanks,
>
> Matt
>
>
>
> On Fri, Nov 6, 2020 at 4:35 PM Daniel Gerzhoy via gem5-users <
> gem5-users@gem5.org> wrote:
>
> For some reason that syscall is only built if you set a flag. Recompile
> the simulator like so:
>
>
>
> scons -j$(nproc) build/GCN3_X86/gem5.opt --ignore-style SLICC_HTML=True
> CCFLAGS_EXTRA="-DSYS_getdents -DSYS_getdents64"
>
>
>
> Cheers,
>
>
>
> Dan
>
>
>
> On Fri, Nov 6, 2020 at 5:25 PM Poremba, Matthew via gem5-users <
> gem5-users@gem5.org> wrote:
>
> [AMD Public Use]
>
>
>
> Looking into that syscall error now.
>
>
>
> I’m not quite sure yet how to fix the docker image since python 3.5 is the
> latest version available for the distro needed.  For now I disabled the
> check for 3.6 since it seems unnecessarily strict and doesn’t break
> anything related to this build.
>
>
>
>
>
> -Matt
>
>
>
> *From:* Yichen Yang 
> *Sent:* Friday, November 6, 2020 1:30 PM
> *To:* Poremba, Matthew 
> *Cc:* gem5 users mailing list 
> *Subject:* Re: [gem5-users] gem5 GCN3 GPU model running issues
>
>
>
> [CAUTION: External Email]
>
> Thanks!
>
>
>
> I tried the develop branch. But running into new problems
>
> warn: ignoring syscall set_robust_list(...)
> warn: ignoring syscall rt_sigaction(...)
>   (further warnings will be suppressed)
> warn: ignoring syscall rt_sigprocmask(...)
>   (further warnings will be suppressed)
> warn: ignoring syscall mprotect(...)
> warn: ignoring syscall mprotect(...)
> fatal: syscall getdents (#78) unimplemented.
> Memory Usage: 1562768 KBytes
>
>
>
> And I think the dockerfile needs some updates. scons requires python3.6
> to compile gem5; to be specific, `python3-config` needs python3.6, but the
> default version installed with the docker is 3.5.
>
>
>
> Best, Yichen
>
>
>
>
>
>
>
> On Fri, Nov 6, 2020 at 2:58 PM Poremba, Matthew 
> wrote:
>
> [AMD Public Use]
>
>
>
> Hi Yichen,
>
>
>
>
>
> Based on the changes I see you’ve made, it seems like you are using an
> older version of gem5.  These should all be fixed, including the error you
> are seeing, on the tip of develop.
>
>
>
> Keep in mind GCN3 was not officially part of the gem5 20.1 release, so the
> most up to date version is on the develop branch until the next gem5
> release.

[gem5-users] Re: gem5 GCN3 GPU model running issues

2020-11-06 Thread Matt Sinclair via gem5-users
Ok, we’re using the same, but haven’t gotten the second error ... strange.
Are you using different apps?

Matt

On Fri, Nov 6, 2020 at 4:45 PM Daniel Gerzhoy 
wrote:

> I'm using the gcn3 docker, so Ubuntu 16.04 I believe
>
> On Fri, Nov 6, 2020 at 5:44 PM Matt Sinclair 
> wrote:
>
>> Hi Daniel & Yichen,
>>
>> What OS are you using?  We have not encountered either of these problems
>> thus far ... something must be different about your setup and ours.
>>
>> Thanks,
>> Matt
>>
>> On Fri, Nov 6, 2020 at 4:35 PM Daniel Gerzhoy via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> For some reason that syscall is only built if you set a flag. Recompile
>>> the simulator like so:
>>>
>>> scons -j$(nproc) build/GCN3_X86/gem5.opt --ignore-style SLICC_HTML=True
>>> CCFLAGS_EXTRA="-DSYS_getdents -DSYS_getdents64"
>>>
>>> Cheers,
>>>
>>> Dan
>>>
>>> On Fri, Nov 6, 2020 at 5:25 PM Poremba, Matthew via gem5-users <
>>> gem5-users@gem5.org> wrote:
>>>
 [AMD Public Use]



 Looking into that syscall error now.



 I’m not quite sure yet how to fix the docker image since python 3.5 is
 the latest version available for the distro needed.  For now I disabled the
 check for 3.6 since it seems unnecessarily strict and doesn’t break
 anything related to this build.





 -Matt



 *From:* Yichen Yang 
 *Sent:* Friday, November 6, 2020 1:30 PM
 *To:* Poremba, Matthew 
 *Cc:* gem5 users mailing list 
 *Subject:* Re: [gem5-users] gem5 GCN3 GPU model running issues



 [CAUTION: External Email]

 Thanks!



 I tried the develop branch. But running into new problems

 warn: ignoring syscall set_robust_list(...)
 warn: ignoring syscall rt_sigaction(...)
   (further warnings will be suppressed)
 warn: ignoring syscall rt_sigprocmask(...)
   (further warnings will be suppressed)
 warn: ignoring syscall mprotect(...)
 warn: ignoring syscall mprotect(...)
 fatal: syscall getdents (#78) unimplemented.
 Memory Usage: 1562768 KBytes



 And I think the dockerfile needs some update. The scons requires
 python3.6 to compile gem5, to be specific, `python3-config` need python3.6,
 but the default version installed with the docker is 3.5.



 Best, Yichen







 On Fri, Nov 6, 2020 at 2:58 PM Poremba, Matthew <
 matthew.pore...@amd.com> wrote:

 [AMD Public Use]



 Hi Yichen,





 Based on the changes I see you’ve made, it seems like you are using an
 older version of gem5.  These should all be fixed, including the error you
 are seeing, on the tip of develop.



 Keep in mind GCN3 was not officially part of the gem5 20.1 release, so
 the most up to date version is on the develop branch until the next gem5
 release.





 -Matt



 *From:* Yichen Yang via gem5-users 
 *Sent:* Friday, November 6, 2020 11:34 AM
 *To:* gem5-users@gem5.org
 *Cc:* Yichen Yang 
 *Subject:* [gem5-users] gem5 GCN3 GPU model running issues



 [CAUTION: External Email]

 Hi,



 I was trying to run gem5 with its GCN3 GPU model following the
 instructions on
 https://www.gem5.org/documentation/general_docs/gpu_models/GCN3 .



 I fixed some bugs in the code but still cannot run the example. I
 attached commands and bugs I fixed below.



 The simulator launched and running into this problem:

 Program Started!
 info: running on device
 info: architecture on AMD GPU device is: 801
 info: allocate host mem (  7.63 MB)
 info: launch 'vector_square' kernel
 panic: panic condition availableTokens > maxTokens occurred: More
 tokens available than the maximum after recvTokens!
 Memory Usage: 1737788 KBytes
 Program aborted at tick 137231963000



 Is there anything I did incorrectly?



 Thanks!

 Best, Yichen



 To be specific, I use the following command:

 ## build docker
 docker build -t gcn3-test gem5/util/dockerfiles/gcn-gpu
 ## make gem5
 docker run --rm -v $PWD/gem5:/gem5 -w /gem5 gcn3-test scons -sQ
 -j$(nproc) build/GCN3_X86/gem5.opt
 ## make application
 docker run --rm -v $PWD/gem5-resources:/gem5-resources -w
 /gem5-resources -u $UID:$GID \
 gcr.io/gem5-test/gcn-gpu make gfx8-apu -C /gem5-resources/src/square

[gem5-users] Re: gem5 GCN3 GPU model running issues

2020-11-06 Thread Matt Sinclair via gem5-users
Hi Daniel & Yichen,

What OS are you using?  We have not encountered either of these problems
thus far ... something must be different about your setup and ours.

Thanks,
Matt

On Fri, Nov 6, 2020 at 4:35 PM Daniel Gerzhoy via gem5-users <
gem5-users@gem5.org> wrote:

> For some reason that syscall is only built if you set a flag. Recompile
> the simulator like so:
>
> scons -j$(nproc) build/GCN3_X86/gem5.opt --ignore-style SLICC_HTML=True
> CCFLAGS_EXTRA="-DSYS_getdents -DSYS_getdents64"
>
> Cheers,
>
> Dan
>
> On Fri, Nov 6, 2020 at 5:25 PM Poremba, Matthew via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> [AMD Public Use]
>>
>>
>>
>> Looking into that syscall error now.
>>
>>
>>
>> I’m not quite sure yet how to fix the docker image since python 3.5 is
>> the latest version available for the distro needed.  For now I disabled the
>> check for 3.6 since it seems unnecessarily strict and doesn’t break
>> anything related to this build.
>>
>>
>>
>>
>>
>> -Matt
>>
>>
>>
>> *From:* Yichen Yang 
>> *Sent:* Friday, November 6, 2020 1:30 PM
>> *To:* Poremba, Matthew 
>> *Cc:* gem5 users mailing list 
>> *Subject:* Re: [gem5-users] gem5 GCN3 GPU model running issues
>>
>>
>>
>> [CAUTION: External Email]
>>
>> Thanks!
>>
>>
>>
>> I tried the develop branch. But running into new problems
>>
>> warn: ignoring syscall set_robust_list(...)
>> warn: ignoring syscall rt_sigaction(...)
>>   (further warnings will be suppressed)
>> warn: ignoring syscall rt_sigprocmask(...)
>>   (further warnings will be suppressed)
>> warn: ignoring syscall mprotect(...)
>> warn: ignoring syscall mprotect(...)
>> fatal: syscall getdents (#78) unimplemented.
>> Memory Usage: 1562768 KBytes
>>
>>
>>
>> And I think the dockerfile needs some update. The scons requires
>> python3.6 to compile gem5, to be specific, `python3-config` need python3.6,
>> but the default version installed with the docker is 3.5.
>>
>>
>>
>> Best, Yichen
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Nov 6, 2020 at 2:58 PM Poremba, Matthew 
>> wrote:
>>
>> [AMD Public Use]
>>
>>
>>
>> Hi Yichen,
>>
>>
>>
>>
>>
>> Based on the changes I see you’ve made, it seems like you are using an
>> older version of gem5.  These should all be fixed, including the error you
>> are seeing, on the tip of develop.
>>
>>
>>
>> Keep in mind GCN3 was not officially part of the gem5 20.1 release, so
>> the most up to date version is on the develop branch until the next gem5
>> release.
>>
>>
>>
>>
>>
>> -Matt
>>
>>
>>
>> *From:* Yichen Yang via gem5-users 
>> *Sent:* Friday, November 6, 2020 11:34 AM
>> *To:* gem5-users@gem5.org
>> *Cc:* Yichen Yang 
>> *Subject:* [gem5-users] gem5 GCN3 GPU model running issues
>>
>>
>>
>> [CAUTION: External Email]
>>
>> Hi,
>>
>>
>>
>> I was trying to run gem5 with its GCN3 GPU model following the
>> instructions on
>> https://www.gem5.org/documentation/general_docs/gpu_models/GCN3 .
>>
>>
>>
>> I fixed some bugs in the code but still cannot run the example. I
>> attached commands and bugs I fixed below.
>>
>>
>>
>> The simulator launched and running into this problem:
>>
>> Program Started!
>> info: running on device
>> info: architecture on AMD GPU device is: 801
>> info: allocate host mem (  7.63 MB)
>> info: launch 'vector_square' kernel
>> panic: panic condition availableTokens > maxTokens occurred: More tokens
>> available than the maximum after recvTokens!
>> Memory Usage: 1737788 KBytes
>> Program aborted at tick 137231963000
>>
>>
>>
>> Is there anything I did incorrectly?
>>
>>
>>
>> Thanks!
>>
>> Best, Yichen
>>
>>
>>
>> To be specific, I use the following command:
>>
>> ## build docker
>> docker build -t gcn3-test gem5/util/dockerfiles/gcn-gpu
>> ## make gem5
>> docker run --rm -v $PWD/gem5:/gem5 -w /gem5 gcn3-test scons -sQ
>> -j$(nproc) build/GCN3_X86/gem5.opt
>> ## make application
>> docker run --rm -v $PWD/gem5-resources:/gem5-resources -w /gem5-resources
>> -u $UID:$GID \
>> gcr.io/gem5-test/gcn-gpu make
>> gfx8-apu -C /gem5-resources/src/square
>> ## run gem5
>> docker run --rm -v $PWD/gem5:/gem5 -v $PWD/gem5-resources:/gem5-resources
>> \
>> -w /gem5 gcn3-test \
>> build/GCN3_X86/gem5.opt 

[gem5-users] Re: track the write syscall in the kernel

2020-10-24 Thread Matt Sinclair via gem5-users
Assuming you are asking about SE mode, I think this is what you are looking
for:
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/src/sim/syscall_emul.hh#2412
?
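
If it helps, the "calculations on the written bytes" usually boil down to
accumulating the return value of the host-side write before handing it back
to the simulated program.  A standalone, hedged sketch of that pattern
(illustrative names only, not the actual gem5 code at that link):

    // The same accumulation would go inside writeFunc in syscall_emul.hh.
    #include <unistd.h>
    #include <cstdint>
    #include <cstdio>

    static uint64_t totalBytesWritten = 0;   // running total of written bytes

    ssize_t countingWrite(int fd, const void *buf, size_t nbytes)
    {
        ssize_t ret = write(fd, buf, nbytes);   // the real (host) write
        if (ret > 0)
            totalBytesWritten += ret;           // count bytes actually written
        return ret;                             // hand the result back unchanged
    }

    int main()
    {
        const char msg[] = "hello from the sketch\n";
        countingWrite(STDOUT_FILENO, msg, sizeof(msg) - 1);
        std::fprintf(stderr, "total bytes written: %llu\n",
                     (unsigned long long)totalBytesWritten);
        return 0;
    }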

Matt

On Sat, Oct 24, 2020 at 4:00 PM ABD ALRHMAN ABO ALKHEEL via gem5-users <
gem5-users@gem5.org> wrote:

> Hi All;
>
> I want to track the write syscall in order to do some calculations on the
> written bytes. I just want to know which function implements the write
> syscall and how I can get the written bytes from it.
>
> Any help would be appreciated.
>
> Thanks
>
> https://github.com/torvalds/linux/blob/master/fs/read_write.c
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Out of Memory while running GPU Benchmark

2020-09-14 Thread Matt Sinclair via gem5-users
I believe this error is happening because your simulated memory space
(i.e., in the simulator) is not big enough for the application you are
running.  You mentioned that you were passing in 2GB.  My guess is that you
want mem_size here:
https://gem5.googlesource.com/amd/gem5/+/refs/heads/agutierr/master-gcn3-staging/configs/common/Options.py#87
(which is subsequently used here:
https://gem5.googlesource.com/amd/gem5/+/refs/heads/agutierr/master-gcn3-staging/configs/example/apu_se.py#406)
to be larger, large enough to fit whatever the benchmark needs (it sounds
like 4 GB should be enough).

I'm not sure where you were seeing 29 GB from (perhaps with top?), but I
suspect that is how much memory the simulator is consuming when running.

Tony, CC'd, knows more about where the GPU memory is being utilized though,
so Tony please correct me if I missed something.

Matt

On Mon, Sep 14, 2020 at 2:56 PM Muhammet Abdullah Soytürk via gem5-users <
gem5-users@gem5.org> wrote:

> Not sure actually. I ran into the same problem while trying a cpu
> benchmark a while back. Maybe others can explain the reason.
>
> Sampad Mohapatra  wrote the following on Mon, 14 Sep 2020 at 22:50:
>
>> Hi Muhammet,
>>
>> Yes, the gpu benchmark itself mallocs around 3GB.
>> But then why does the memory usage show 30305260 KBytes ~ 29 GB ?
>> What does this value indicate ?
>>
>> Thank you,
>> Sampad
>>
>> On Mon, Sep 14, 2020 at 3:30 PM Muhammet Abdullah Soytürk <
>> muhammetabdullahsoyt...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is there any chance that the input you provide is bigger than 2GB? If
>>> the input size is bigger than the memory size, you cannot simulate it in SE
>>> mode (since there is no paging support for SE mode). As you can understand
>>> from the error message, you need to increase the size of the memory.
>>>
>>> Best,
>>> Muhammet
>>>
>>> Sampad Mohapatra via gem5-users  wrote the following on Mon, 14 Sep 2020
>>> at 21:49:
>>>
 Hi All,

 I am running 2DConvolution (polybench-gpu) and leela (SPEC 17) using
 the AMD GCN3 model on a research cluster with around 4 TB of memory. But
 the simulation ended with the following message:

 fatal: Out of memory, please increase size of physical memory.
 Memory Usage: 30305260 KBytes

 I had passed 2GB as mem-size. What could be the problem and how can I
 mitigate it?

 Thank you,
 Sampad Mohapatra
 ___
 gem5-users mailing list -- gem5-users@gem5.org
 To unsubscribe send an email to gem5-users-le...@gem5.org
>>>
>>> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: AMD GCN3 - X86KvmCPU usage - Segfault encountered

2020-09-07 Thread Matt Sinclair via gem5-users
Matt P (CC'd) will likely know better than me, but I don't believe
KVM/fast-forwarding works with GCN3 yet.

Matt

On Mon, Sep 7, 2020 at 9:36 AM Sampad Mohapatra via gem5-users <
gem5-users@gem5.org> wrote:

> Hi All,
>
> I am using the staging branch GCN3. While using the KvmCPU to fast forward
> execution till my
> GPU kernel launches using m5_switch_cpu(), I am encountering a segfault at
> the following location:
>
> src/cpu/kvm/vm.cc:562 : long KvmVM::allocVCPUID() { return nextVCPUID++; }
>
> For some reason the memory location for nextVCPUID is inaccessible.
>
> Does the fast forwarding functionality work with the GCN3 model ?
> If yes, then what could be wrong ?
>
>
> Thanks and regards,
> Sampad Mohapatra
>
>
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: GCN3 docker file missing

2020-09-01 Thread Matt Sinclair via gem5-users
Hi Samaksh,

The warnings you mentioned can be ignored.  They are highlighting that the
sched_yield syscall is not implemented in SE mode.  Which is fine, it's not
needed for correctly simulating this program on the GPU.

I'm not quite sure what the other issue is though.  It sounds like you are
saying you just needed to run "make", not "make square" in addition to it?

Thanks,
Matt

On Tue, Sep 1, 2020 at 4:23 AM Samaksh Sethi via gem5-users <
gem5-users@gem5.org> wrote:

> That error was arising because, when the makefile wasn't running properly,
> I was trying out alternate methods on my own. Now I'm using the makefile
> again since the patch was pushed, but it's still not working.
> I don't know why I have to run both make and make square; the only way I
> can explain it is that the "make" command builds the square.o file and
> "make square" does something different entirely.
>
> I was following this (https://youtu.be/HhLiMrjqCvA), which needs me to do
> the "make square" step.
> I was also referring to
> http://www.gem5.org/documentation/general_docs/gpu_models/GCN3 and
> http://www.gem5.org/2020/05/27/modern-gpu-applications.html regularly.
>
> I think I got it working: I used the "make" command to build the square.o
> file and then I ran this command (taken from
> http://www.gem5.org/documentation/general_docs/gpu_models/GCN3 , with
> edits) "$build/GCN3_X86/gem5.opt configs/example/apu_se.py -n2
> --benchmark-root=/gem5-resources/src/square/bin -c square.o" inside the
> bash of the docker, and it gave the passed signal and the stats.txt is
> filled now (although there were many warnings for "ignoring syscall
> sched_yield(...)", and I'm unsure what they mean).
> I'm still unsure what this command did differently from the one in the
> makefile that uses hip_runtime.h, but I would suggest updating the
> makefile with this command.
>
> Thanks,
> Samaksh
>
>
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: GCN3 docker file missing

2020-09-01 Thread Matt Sinclair via gem5-users
This appears to be the same error Dan mentioned previously, where you are
using gcc instead of hipcc.  Did you try applying the fix he suggested
there?

Having said that, the first few lines appear to be making square already,
so I'm not sure why you are trying to make it again?
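
For anyone hitting this later: square includes hip/hip_runtime.h, which only
the HIP/ROCm toolchain inside the docker provides, so the compile has to go
through hipcc.  A hedged, minimal sketch of what the Makefile effectively does
(the hipcc path and the gfx801 target are the ones quoted in this thread and
may differ in your image):

    // hello_hip.cpp -- compile inside the gcn-gpu docker with something like:
    //   /opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801 hello_hip.cpp -o hello_hip
    // (g++ alone cannot find hip/hip_runtime.h or emit the GPU code object)
    #include <hip/hip_runtime.h>
    #include <cstdio>

    __global__ void hello(int unused)
    {
        (void)unused;   // empty kernel, just to exercise the toolchain
    }

    int main()
    {
        hipLaunchKernelGGL(hello, dim3(1), dim3(64), 0, 0, 0);
        hipDeviceSynchronize();
        std::printf("kernel launched\n");
        return 0;
    }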

Matt

On Tue, Sep 1, 2020 at 3:13 AM Samaksh Sethi via gem5-users <
gem5-users@gem5.org> wrote:

> Hi Matt,
> I had originally sent that message before the changes were implemented.
> Now the "make" command seems to work (I'm not completely sure from the
> output), but "make square" is still stuck where it was before (as
> previously posted at
> https://www.mail-archive.com/gem5-users@gem5.org/msg18296.html)
>
> $ docker run -it -v $PWD/gem5-resources:/gem5-resources -v $PWD/gem5:/gem5
> -w /gem5-resources/src/square gcn-gpu bash
> root@8ea50cb89263:/gem5-resources/src/square# make
> mkdir -p ./bin
> /opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801  square.cpp -o
> ./bin/square.o
> objdump: './bin/square.o': No such file
> root@8ea50cb89263:/gem5-resources/src/square# make
> make: Nothing to be done for 'gfx8-apu'.
> root@8ea50cb89263:/gem5-resources/src/square# make square
> g++ square.cpp   -o square
> square.cpp:24:29: fatal error: hip/hip_runtime.h: No such file or directory
> compilation terminated.
> : recipe for target 'square' failed
> make: *** [square] Error 1
>
> I can't find the hip_runtime.h file manually either, so I think it might be
> missing from the repository.
>
> Thanks,
> Samaksh
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: GCN3 docker file missing

2020-08-31 Thread Matt Sinclair via gem5-users
Hi Samaksh,

Is this stuff you tried before or after the message Bobby sent?

Thanks,
Matt

On Mon, Aug 31, 2020 at 2:09 PM Samaksh Sethi via gem5-users <
gem5-users@gem5.org> wrote:

> Ok so, doing this, on the first run square.o is not created properly, but
> that error goes away on the 2nd run. Now there's a new error, which I
> have no clue how to fix ("make gfx8-apu" gives the same error too), as
> it's probably something in the docker file. Hope I'm not missing
> anything here.
>
> $ docker run -it -v $PWD/gem5-resources:/gem5-resources -v $PWD/gem5:/gem5
> -w /gem5-resources/src/square gcn-docker bash
>
> root@36f387b74815:/gem5-resources/src/square# make
> mkdir -p ./bin
> /opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801  square.cpp -o
> ./bin/square.o
> objdump: './bin/square.o': No such file
>
> root@36f387b74815:/gem5-resources/src/square# make
> /opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801  square.cpp -o
> ./bin/square.o
> /opt/rocm/hcc-1.0/compiler/bin/llvm-link:
> /tmp/tmp.yjaznAvFJZ/square.kernel.bc:1:1: error: expected top-level entity
>
> __CLANG_OFFLOAD_BUNDLE__ host-x86_64-unknown-linux / hcc-amdgcn--amdhsa-gfx801
> [raw binary ELF bytes from square.kernel.bc omitted]
> /opt/rocm/hcc-1.0/compiler/bin/llvm-link: error loading file
> '/tmp/tmp.yjaznAvFJZ/square.kernel.bc'
> ld: warning: Cannot create .eh_frame_hdr section, --eh-frame-hdr ignored.
> ld: error in /tmp/tmp.yjaznAvFJZ/square.host.o(.eh_frame); no
> .eh_frame_hdr table will be created.
>
> Also, just a doubt, about the --rm docker option, what exactly does that
> remove, the docker image I created/compiled with docker build -t
>  (approx 3.1GB) or just this instance of
> the image? Because if it deletes the compiled image, then it would be
> really time consuming to recompile it every time I need to test a change
> (e.g., running make square multiple times).
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
> %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: GCN3 docker file missing

2020-08-31 Thread Matt Sinclair via gem5-users
Kyle, can you please fix this (the Makefile) for square?  Or update the
instructions in the way Dan described above?

Matt

On Mon, Aug 31, 2020 at 11:51 AM Daniel Gerzhoy via gem5-users <
gem5-users@gem5.org> wrote:

> Looks like that command needs to be updated, or the makefile.
>
> Try the command without "square" at the end (" docker run --rm -v
> $PWD/gem5-resources:/gem5-resources -w /gem5-resources/src/square 
> make ")
> If you look in the makefile the first rule is "gfx8-apu" not "square" so
> you could also change the command to be "make gfx8-apu"
>
> You are failing to compile with g++ because you need to use hipcc (the hip
> compiler), which, if you look at the makefile, it is doing for you (along
> with the important --amdgpu-target=gfx801 flag).
>
> Hope this helps,
>
> Dan
>
> On Mon, Aug 31, 2020 at 11:46 AM Samaksh Sethi via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> Ok so I forgot to mention that the gem5-resources repository (
>> https://gem5.googlesource.com/public/gem5-resources/) doesn't have the
>> makefile in the main folder, it's actually in gem5-resources>>src>>square,
>> so I can't just run the command (docker run --rm -v
>> $PWD/gem5-resources:/gem5-resources -w /gem5-resources/src/square 
>> 
>> make square) directly or it just says ""make: *** No rule to make target
>> 'square'.  Stop.""
>>
>> About square.out, I'm unable to make that too directly with gcc/g++
>> because of the error that I'm getting which is the same as when I run docker
>> run --rm -v $PWD/gem5-resources:/gem5-resources -w
>> /gem5-resources/src/square  make square ""square.cpp:24:29:
>> fatal error: hip/hip_runtime.h: No such file or directory compilation
>> terminated."" I can't find the file myself too, I did look in the makefile
>> folder (gem5-resources>>src>>square), I couldn't find the sources.mk
>> file and so I couldn't find any $INCLUDE tag, so I couldn't understand
>> where to look.
>>
>> Lastly, about docker files, I tried using the method you suggested with
>> docker run -it  bash, let me know if there's a better method or
>> any other options I should also be using. Also, about the --rm docker
>> option, what exactly does that remove, the docker image I created/compiled
>> with docker build -t  (approx 3.1GB) or just this instance of
>> the image? Because if it deletes the compiled image, then it would be
>> really time consuming to recompile it every time I need to test a change
>> (e.g., running make square multiple times). Also let me know if I'm using the
>> terms image/container/instance correctly or not.
>> ___
>> gem5-users mailing list -- gem5-users@gem5.org
>> To unsubscribe send an email to gem5-users-le...@gem5.org
>
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: AMD GCN3 - Virtual network type correctness in MOESI_AMD_Base-dir.sm

2020-08-30 Thread Matt Sinclair via gem5-users
Hi Sampad,

If possible, can you please submit a patch for this?  That way Srikant and
the others who are experts with Garnet can review and validate.

Thanks,
Matt

On Sun, Aug 30, 2020 at 10:37 PM Sampad Mohapatra via gem5-users <
gem5-users@gem5.org> wrote:

> Hi Srikant,
>
> It is used to send both data and acks.
> For now I am changing it to response type till a counter argument is
> presented.
>
> Thanks,
> Sampad
>
>
>
> On Sun, Aug 30, 2020 at 10:37 PM Srikant Bharadwaj via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> Hi Sampad,
>> The vnet_type 'request' and 'response' in vnet is consumed by Garnet for
>> setting the message size. In general, if a message has data that will be
>> transmitted it should be marked as a 'response' type. I am not sure about
>> the GPU_VIPER protocol, but if both the message buffers in question carry a
>> data type they should be marked as a 'response' type.
>>
>> Thanks,
>> Srikant
>>
>> On Sun, Aug 30, 2020 at 6:49 PM Sampad Mohapatra via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> Hi All,
>>>
>>> The *vnet_type* of the MessageBuffer *responseToDMA* is set as *request*
>>> and
>>> the virtual network number is set as 3.
>>>
>>> MessageBuffer * responseToDMA, network="To", virtual_network="3",
>>> vnet_type="request";
>>>
>>> But in other slicc files such as *GPU_VIPER_TCC.sm* the vnet_type of vn
>>> number 3 is set as response.
>>>
>>> MessageBuffer * responseToCore, network="To", virtual_network="3",
>>> vnet_type="response";
>>>
>>> Shouldn't the vnet_type of responseToDMA be "response" ?
>>>
>>> Thanks and regards,
>>> Sampad Mohapatra
>>>
>>>
>>> ___
>>> gem5-users mailing list -- gem5-users@gem5.org
>>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>
>> ___
>> gem5-users mailing list -- gem5-users@gem5.org
>> To unsubscribe send an email to gem5-users-le...@gem5.org
>
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: GCN3 docker file missing

2020-08-30 Thread Matt Sinclair via gem5-users
Dan or Kyle can confirm, but yes I believe that is what others are doing.

If you look through the posted text from running square, you have a fatal
error because of it not being able to access gem5-resources.  Kyle, have you
seen this before?

Matt

On Sun, Aug 30, 2020 at 5:43 PM Samaksh Sethi via gem5-users <
gem5-users@gem5.org> wrote:

> Okay, yeah that does make sense to me.
> So what I understand is: I used the develop branch for the dockerfile and
> then I run everything as usual from the staging branch.
>
> Ok, so that works, Thanks a lot!!
> I was able to get the build command running,
>
> Just one last thing, how do I confirm everything worked ok?
> I ran this command for square (changed a bit, from
> http://www.gem5.org/documentation/general_docs/gpu_models/GCN3)
> *docker run --rm -v $PWD/gem5:/gem5 -v $PWD/gem5-resources:/gem5-resources
> -w /gem5 gcn-docker build/GCN3_X86/gem5.opt configs/example/apu_se.py -n2
> --benchmark-root=/gem5-resources/src/square -c square.cpp*
> warn: system.ruby.network adopting orphan SimObject param 'int_links'
> warn: system.ruby.network adopting orphan SimObject param 'ext_links'
> warn: DRAM device capacity (8192 Mbytes) does not match the address range
> assigned (512 Mbytes)
> gem5 Simulator System.  http://gem5.org
> gem5 is copyrighted software; use the --copyright option for details.
>
> gem5 compiled Aug 30 2020 22:11:25
> gem5 started Aug 30 2020 22:27:07
> gem5 executing on d154d77a3679, pid 1
> command line: build/GCN3_X86/gem5.opt configs/example/apu_se.py -n2
> --benchmark-root=/gem5-resources/src/square -c square.cpp
>
> info: Standard input is not a terminal, disabling listeners.
> Num SQC =  1 Num scalar caches =  1 Num CU =  4
> Global frequency set at 1 ticks per second
> fatal: Can't load object file /gem5-resources/src/square/square.cpp
> Memory Usage: 1182408 KBytes
>
> Now what? There's nothing in the stats.txt, is there any other command I
> need to run too?
>
>
>
>
> ___
>
> gem5-users mailing list -- gem5-users@gem5.org
>
> To unsubscribe send an email to gem5-users-le...@gem5.org
>

-- 
Regards,
Matt Sinclair
Assistant Professor
University of Wisconsin-Madison
Computer Sciences Department
cs.wisc.edu/~sinclair
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: GCN3 docker file missing

2020-08-30 Thread Matt Sinclair via gem5-users
Ok, we can try to make it clear that you should be looking at the develop
branch.

I thought the docker was pointing to the GCN3 staging branch, despite being
on the develop branch, but that error is likely what this pending patch is
fixing: https://gem5-review.googlesource.com/c/public/gem5/+/33655.  Since
the GCN3 staging branch is in the process of being merged into develop, not
everything is working yet.  Thus, like Dan G. mentioned to you previously,
I recommend pointing the Docker to the staging branch for now.

Matt

On Sun, Aug 30, 2020 at 1:27 PM Samaksh Sethi via gem5-users <
gem5-users@gem5.org> wrote:

> Thanks!
> That itself was my issue, I didn't understand from the documentation that
> I had to clone the develop branch, I was just using the master branch!
>
> But I'm still getting errors just running build commands directly from the
> documentation
> https://youtu.be/HhLiMrjqCvA - This is the guide I'm following
>
> *1st run - *
> *docker run --rm -v $PWD/gem5:/gem5 -w /gem5 gcn-docker scons -sQ -j4
> build/GCN3_X86/gem5.opt*
> Warning: Your compiler doesn't support incremental linking and lto at the
> same
>  time, so lto is being disabled. To force lto on anyway, use the
>  --force-lto option. That will disable partial linking.
> Info: Using Python config: python2.7-config
> Checking for hdf5-serial using pkg-config... no
> Checking for hdf5 using pkg-config...Package hdf5-serial was not found in
> the pkg-config search path.
> Perhaps you should add the directory containing `hdf5-serial.pc'
> to the PKG_CONFIG_PATH environment variable
> No package 'hdf5-serial' found
>  no
> Package hdf5 was not found in the pkg-config search path.
> Perhaps you should add the directory containing `hdf5.pc'
> to the PKG_CONFIG_PATH environment variable
> No package 'hdf5' found
> Warning: Couldn't find any HDF5 C++ libraries. Disabling
>  HDF5 support.
> Variables file /gem5/build/variables/GCN3_X86 not found,
>   using defaults in /gem5/build_opts/GCN3_X86
> MOESI_AMD_Base-dir.sm:220: Warning: Non-void return ignored, return type
> is 'bool'
> MOESI_AMD_Base-dir.sm:1034: Warning: Non-void return ignored, return type
> is 'Tick'
> MOESI_AMD_Base-dir.sm:1038: Warning: Non-void return ignored, return type
> is 'Tick'
> MOESI_AMD_Base-dir.sm:1042: Warning: Non-void return ignored, return type
> is 'Tick'
> MOESI_AMD_Base-dir.sm:1046: Warning: Non-void return ignored, return type
> is 'Tick'
> MOESI_AMD_Base-dir.sm:1050: Warning: Non-void return ignored, return type
> is 'Tick'
> MOESI_AMD_Base-dir.sm:1054: Warning: Non-void return ignored, return type
> is 'Tick'
> MOESI_AMD_Base-dir.sm:1058: Warning: Non-void return ignored, return type
> is 'Tick'
> MOESI_AMD_Base-dir.sm:586: Warning: Unused action: l_queueMemWBReq, Write
> WB data to memory
> MOESI_AMD_Base-dir.sm:941: Warning: Unused action:
> mwc_markSinkWriteCancel, Mark to sink impending VicDirty
> MOESI_AMD_Base-dir.sm:1033: Warning: Unused action: dl_deallocateL3,
> deallocate the L3 block
> MOESI_AMD_Base-dir.sm:1069: Warning: Unused action:
> yy_recycleResponseQueue, recycle response queue
> MOESI_AMD_Base-dma.sm:187: Warning: Non-void return ignored, return type
> is 'Tick'
> MOESI_AMD_Base-dma.sm:191: Warning: Non-void return ignored, return type
> is 'Tick'
> MOESI_AMD_Base-CorePair.sm:325: Warning: Non-void return ignored, return
> type is 'bool'
> MOESI_AMD_Base-CorePair.sm:802: Warning: Non-void return ignored, return
> type is 'Tick'
> MOESI_AMD_Base-CorePair.sm:806: Warning: Non-void return ignored, return
> type is 'Tick'
> MOESI_AMD_Base-CorePair.sm:810: Warning: Non-void return ignored, return
> type is 'Tick'
> MOESI_AMD_Base-CorePair.sm:814: Warning: Non-void return ignored, return
> type is 'Tick'
> MOESI_AMD_Base-CorePair.sm:1270: Warning: Non-void return ignored, return
> type is 'Scalar'
> MOESI_AMD_Base-CorePair.sm:1274: Warning: Non-void return ignored, return
> type is 'Scalar'
> MOESI_AMD_Base-CorePair.sm:1278: Warning: Non-void return ignored, return
> type is 'Scalar'
> MOESI_AMD_Base-CorePair.sm:1282: Warning: Non-void return ignored, return
> type is 'Scalar'
> GPU_VIPER-TCP.sm:166: Warning: Non-void return ignored, return type is
> 'bool'
> GPU_VIPER-TCP.sm:451: Warning: Non-void return ignored, return type is
> 'Tick'
> GPU_VIPER-TCP.sm:455: Warning: Non-void return ignored, return type is
> 'Tick'
> GPU_VIPER-TCP.sm:532: Warning: Non-void return ignored, return type is
> 'Scalar'
> GPU_VIPER-TCP.sm:536: Warning: Non-void return ignored, return type is
> 'Scalar'
> GPU_VIPER-TCP.sm:385: Warning: Unused action: norl_issueRdBlkOrloadDone,
> local load done
> GPU_VIPER-SQC.sm:143: Warning: Non-void return ignored, return type is
> 'bool'
> GPU_VIPER-SQC.sm:275: Warning: Non-void return ignored, return type is
> 'Tick'
> GPU_VIPER-SQC.sm:279: Warning: Non-void return ignored, return type is
> 'Tick'
> GPU_VIPER-TCC.sm:168: Warning: Non-void return ignored, return type is
> 'bool'
> 

[gem5-users] Re: GCN3 docker file missing

2020-08-30 Thread Matt Sinclair via gem5-users
Can you please provide us with some additional information about how you
are attempting to run it?  For example, what branch are you using?

Looks like there is an extra 'o' in that link, thanks -- Kyle can you
please fix this?

To the best of our knowledge, the Docker is working, so I suspect there is
a problem with how you are using it, but will need more information before
we can help.  My guess is you are looking at an incorrect branch, because I
can find the Docker, e.g., here:
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/util/dockerfiles/gcn-gpu/.
I believe only the develop branch has the docker right now, and yes the AMD
GCN3 branch does not have it because we checked it into develop instead
since the AMD GCN3 branch is currently being merged into develop.

Matt

On Sun, Aug 30, 2020 at 7:25 AM samakshsethi.ss--- via gem5-users <
gem5-users@gem5.org> wrote:

> I've been trying to run GCN3 for the past few days, but there's some
> errors rising up.
> I've tried multiple methods
> 1. Docker Method
> http://www.gem5.org/documentation/general_docs/gpu_models/GCN3
> Here, the repository link is broken, but if I use the working link (
> https://gem5.googlesource.com/public/gem5/) The repository doesn't have
> the util/dockerfiles/gcn-gpu/ subdirectory. Even the amd/gem5 branch of the
> gem5 repo doesn't have the docker files.
> 2. Direct Method
> http://www.m5sim.org/GPU_Models
> Here the first command does not work, scons -sQ -jN
> ./build/GCN3_X86/gem5.opt
> I kept N as 4, but there is this error
>
> *** Error loading site_init file './site_scons/site_init.py':
>   File "./site_scons/site_init.py", line 52
>
> except SystemExit, e:
>
>  ^
>
> SyntaxError: invalid syntax
>
> There shouldn't be an error in scons because I have used gem5 successfully
> before.
>
> I'm a real beginner in gem5 so I might be missing something.
> Any help would be appreciated.
> Thanks!!
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: GCN3 - SLICC - GPU_VIPER-TCC.sm and GPU_TCP-TCP.sm Correctness

2020-08-25 Thread Matt Sinclair via gem5-users
Hi Sampad,

I believe this relates to the fact that the coherence protocol is not
actually sending out the data, but instead gem5 uses the backing store to
functionally read/write data.  Essentially, in a real system, yes we would
need to send the data, which is why the message size accounts for this.
But in a simulator we don't pass the data along to simplify things (i.e.,
we don't send the data, but we do account for it).

However, Alex and Brad (CC'd) know a lot more about the state of the
backing store than I do, so they should comment to confirm.

Matt

On Tue, Aug 25, 2020 at 10:23 AM Sampad Mohapatra via gem5-users <
gem5-users@gem5.org> wrote:

> Hello,
>
> In GPU_VIPER-TCC.sm and GPU_TCP-TCP.sm, in action *at_atomicThrough*
> present in both files,
> the out message Type is *CoherenceRequestType:Atomic* and the message
> size is *MessageSizeType:Data*,
> but there is *no data* being sent. Is this correct behaviour ?
>
> Either data is not being sent or the message size should be some other
> type.
> Please advise.
>
> Thank you,
> Sampad Mohapatra
>
>
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: GCN3/hip constant memory

2020-08-18 Thread Matt Sinclair via gem5-users
Hi Dan,

Attempting to answers your questions in order:

- Yes, by data cache I meant a global memory array.  You've highlighted the
issue exactly though, by making the array a global memory array instead, it
will now be subject to thrashing with other global memory data.  You could
try experimenting on a real GPU though to see what the performance hit is.
That might help you decide if it's worth adding the support.

- I am not an expert in the SQC (Tony and Brad, CC'd, are) but I believe it
only is used for a) instructions and b) scalars.  I am not aware of a way
to put non-scalar data in it.

- I'm not sure if shared memory here is referring to the traditional shared
memory like in CPUs (e.g., the global memory) or what NVIDIA refers to as
shared memory (e.g., the per-CU scratchpads)?  If it's the traditional
definition of shared memory, then it resides in the standard main memory
place, and flows through the caches to the cores.  If you meant the
scratchpads, AMD refers to those as local data stores (LDSs), and they are
co-located with the CUs (e.g., Figure 2 in
https://ieeexplore.ieee.org/document/8327041).
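If you did mean the scratchpads: in HIP that memory is just __shared__ inside a
kernel, and it maps onto the CU's LDS in the GCN3 model.  A tiny sketch (the
kernel and buffer names are made up for illustration, and it assumes a
64-work-item work-group):

__global__ void reverse64(float *d)
{
    __shared__ float lds_buf[64];   // allocated in the CU's LDS (scratchpad)
    int t = threadIdx.x;
    lds_buf[t] = d[t];
    __syncthreads();                // synchronize the work-group before reuse
    d[t] = lds_buf[63 - t];
}
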

Hope this helps,
Matt

On Tue, Aug 18, 2020 at 12:32 PM Daniel Gerzhoy 
wrote:

> Matt,
>
> Thanks for the detailed response. Yeah that sounds pretty involved, I
> probably won't go down that path unless I see no other way.
>
> When you say the data cache do you mean make it a global memory array?
> This is actually what I already have, and I wanted to keep the "constant"
> data from getting evicted by other global memory data.
>
> How does the SQC work in terms of data rather than instructions? Could I
> have data go in the SQC?
>
> On that note, where does "Shared" memory reside?
>
> Thanks,
>
> Dan
>
> On Tue, Aug 18, 2020 at 12:41 PM Matt Sinclair via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> Hi Dan,
>>
>> Tony will have to confirm, but I believe AMD didn’t add support for
>> constant memory because none of the applications they looked at used it.
>> The mincore error is kind of a catch all, saying that something bad
>> happened and you went down a failure path.
>>
>> Assuming the above is correct, if you wanted to add support for constant
>> memory, you’d need to start by adding the appropriate syscall support.  I
>> suspect the reason you are hitting the mincore error is because your
>> program attempted to run an unimplemented syscall and didn’t know what to
>> do.  If you want to go down this route, I would suggest running with a
>> debug build of gem5 and using gdb to try and trace back where the mincore
>> failure is coming from, but from personal experience I can tell you this is
>> not always 100% effective.  Another option would be to use gdb in the
>> application itself and step through it, seeing what ioctls the
>> hipMemcpyToSymbol is using under the hood.  Anyways, in gem5 you would also
>> need to instantiate a separate constant cache and connect that to the
>> existing memory hierarchy in the appropriate places.  So, as you can
>> probably tell, this will likely be a fairly intensive process to get
>> working though.
>>
>> The alternative would be to change your program to use the data cache for
>> the array instead of using the constant cache.  This would potentially hurt
>> the performance of the application, but wouldn’t require adding any new
>> features to the simulator.
>>
>> To answer your other questions more directly:
>>
>> - the constant memory allocations shouldn’t go to the scalar cache or
>> data cache.  It uses a separate cache, the constant cache.  If you look at
>> slides on GCN3 (e.g., slide 23 of:
>> https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf
>> or Figure 1.1 in
>> http://developer.amd.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf),
>> you’ll see a separate cache from the I$, D$, and scalar cache for constants.
>> - See slide 60:
>> http://www.m5sim.org/wiki/images/1/19/AMD_gem5_APU_simulator_isca_2018_gem5_wiki.pdf
>> for the SQC explanation.
>>
>> Thanks,
>> Matt
>>
>> On Tue, Aug 18, 2020 at 8:37 AM Daniel Gerzhoy via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> Hey all,
>>>
>>> Is there a way to use constant memory in the GPU Model right now?
>>>
>>> Using the
>>>
>>> *__constant__ float variable[SIZE];*
>>>
>>> and
>>>
>>> *hipMemcpyToSymbol(...)*
>>>
>>> results in a
>>>
>>> *fatal: syscall mincore (#27) unimplemented.*
>>>

[gem5-users] Re: Missing L1 and L2 Hit stats/actions in MOESI AMD Base - CorePair.sm

2020-08-18 Thread Matt Sinclair via gem5-users
Hi Daniel,

If you don't mind, can you please post the patch(es) to develop?  It would
be great to have these included in the publicly available code.

Thanks,
Matt

On Tue, Aug 18, 2020 at 11:22 AM Sampad Mohapatra  wrote:

> Hey Daniel,
>
> Thanks for the patch. I am using the staging branch.
> If you have added any stats to the L3, can you please provide a patch for
> that as well ?
>
> Thanks again,
> Sampad
>
> On Tue, Aug 18, 2020 at 12:06 PM Daniel Gerzhoy 
> wrote:
>
>> Sampad,
>>
>> I thought the L3 would be inclusive too, but the code reads otherwise, it
>> only caches entries on a writeback (always) or writethrough (if enabled).
>>
>> I am considering changing it to be inclusive for my own purposes (I don't
>> know if that is something the community would want or not). Which I guess
>> would just entail caching on reads as well as WB/WT.
>>
>> I've attached the patch. I've run it on the gcn3-staging branch (+some of
>> my own changes) but I haven't run it on the develop branch (this patch is
>> on top of the develop branch for your convenience).
>>
>> You will notice some comments about certain hitsProfiling calls for
>> stores not being right.
>> From what I can tell when the corepair has ownership (M/O/E states) of a
>> block, when one of the cores stores to that block even if the block isn't
>> in that corepair's L1, it instantaneously puts that cache block into
>> Modified state for that core's L1.
>> I didn't want to fix the instantaneousness when I was working on this, so
>> since performance-wise it's a hit, I just profiled it as such.
>> You could split those transitions up for a storeMiss and a storeHit (they
>> could be the same save for the profiling action call)
>>
>> Let me know if this works for you.
>>
>> Cheers,
>>
>> Dan
>>
>> On Tue, Aug 18, 2020 at 11:09 AM Sampad Mohapatra  wrote:
>>
>>> Hi Daniel,
>>>
>>> I am just starting out so it would be really helpful if you could kindly
>>> provide your patches.
>>> Have you verified the changes, otherwise I will try to verify them.
>>>
>>> Also, isn't the L3 inclusive ?
>>>
>>> Thank you,
>>> Sampad
>>>
>>> On Tue, Aug 18, 2020 at 9:57 AM Daniel Gerzhoy 
>>> wrote:
>>>
>>>> Hi Sampad,
>>>>
>>>> I've added corepair profiling to MOESI_AMD_BASE-CorePair.sm if you
>>>> haven't already done so. I can create a patch for you (or I'd be happy to
>>>> review if you end up submitting one).
>>>>
>>>> I was confused about the L3Cache in the <...>-dir.sm file as well.
>>>> The MOESI_AMD_Base-L3cache.sm file doesn't actually do anything. The L3
>>>> Cache file is implemented alongside the directory.
>>>> It acts as a victim cache for the L2s.
>>>>
>>>> I also noticed that the MRU isn't being updated for the L3, so I added
>>>> thatexcept it's all commented out in my code right now and I'm not sure
>>>> why, so I am going to investigate that.
>>>>
>>>> Cheers,
>>>>
>>>> Dan Gerzhoy
>>>>
>>>> On Tue, Aug 18, 2020 at 12:04 AM Matt Sinclair via gem5-users <
>>>> gem5-users@gem5.org> wrote:
>>>>
>>>>> Hi Sampad,
>>>>>
>>>>> The AMD folks that are CC'd are better people to comment than me, but
>>>>> I believe the L3 cache is a "memory side" cache, and thus doesn't need to
>>>>> maintain coherence.
>>>>>
>>>>> If you have a fix to add hits/misses for any of these files, and are
>>>>> willing to contribute it back, it would be great if you could submit a
>>>>> patch for it.
>>>>>
>>>>> Matt
>>>>>
>>>>> On Mon, Aug 17, 2020 at 11:00 PM Sampad Mohapatra 
>>>>> wrote:
>>>>>
>>>>>> Hi Matt,
>>>>>>
>>>>>> I am currently trying to add the stats by myself. I also noticed that
>>>>>> neither hit nor miss stats are updated for L3 Cache.
>>>>>>
>>>>>> But, I am facing a different issue now. The directory ctrl (DirCntrl)
>>>>>> in GPU_VIPER.py has a L3 cache (L3Cache).
>>>>>> But, there is also a separate L3Cntrl class which inherits from
>>>>>> L3Cache_Controller, but it is unused.
>>>>>> The L3Cache_Controller is generated from MOE

[gem5-users] Re: GCN3/hip constant memory

2020-08-18 Thread Matt Sinclair via gem5-users
Hi Dan,

Tony will have to confirm, but I believe AMD didn’t add support for
constant memory because none of the applications they looked at used it.
The mincore error is kind of a catch all, saying that something bad
happened and you went down a failure path.

Assuming the above is correct, if you wanted to add support for constant
memory, you’d need to start by adding the appropriate syscall support.  I
suspect the reason you are hitting the mincore error is because your
program attempted to run an unimplemented syscall and didn’t know what to
do.  If you want to go down this route, I would suggest running with a
debug build of gem5 and using gdb to try and trace back where the mincore
failure is coming from, but from personal experience I can tell you this is
not always 100% effective.  Another option would be to use gdb in the
application itself and step through it, seeing what ioctls the
hipMemcpyToSymbol is using under the hood.  Anyways, in gem5 you would also
need to instantiate a separate constant cache and connect that to the
existing memory hierarchy in the appropriate places.  So, as you can
probably tell, this will likely be a fairly intensive process to get
working though.

The alternative would be to change your program to use the data cache for
the array instead of using the constant cache.  This would potentially hurt
the performance of the application, but wouldn’t require adding any new
features to the simulator.
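
Concretely, the sketch below is what I mean (everything except SIZE is a
placeholder name, and this is only meant to illustrate the approach -- I have
not run this exact code in gem5):

#include <hip/hip_runtime.h>

#define SIZE 1024

__global__ void myKernel(const float *table, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = table[i] * 2.0f;   // reads now go through the regular data cache
}

int main()
{
    float host_table[SIZE], host_out[SIZE];
    for (int i = 0; i < SIZE; ++i)
        host_table[i] = (float)i;

    float *d_table, *d_out;
    hipMalloc(&d_table, SIZE * sizeof(float));
    hipMalloc(&d_out, SIZE * sizeof(float));

    // Replaces __constant__ float variable[SIZE] + hipMemcpyToSymbol(...):
    hipMemcpy(d_table, host_table, SIZE * sizeof(float), hipMemcpyHostToDevice);

    hipLaunchKernelGGL(myKernel, dim3(SIZE / 64), dim3(64), 0, 0,
                       d_table, d_out, SIZE);
    hipDeviceSynchronize();

    hipMemcpy(host_out, d_out, SIZE * sizeof(float), hipMemcpyDeviceToHost);
    hipFree(d_table);
    hipFree(d_out);
    return 0;
}

The trade-off is that the table now competes with all the other global-memory
data for space in the caches, which is where the performance hit would come
from.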

To answer your other questions more directly:

- the constant memory allocations shouldn’t go to the scalar cache or data
cache.  It uses a separate cache, the constant cache.  If you look at
slides on GCN3 (e.g., slide 23 of:
https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf
or Figure 1.1 in
http://developer.amd.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf),
you’ll see a separate cache from the I$, D$, and scalar cache for constants.
- See slide 60:
http://www.m5sim.org/wiki/images/1/19/AMD_gem5_APU_simulator_isca_2018_gem5_wiki.pdf
for the SQC explanation.

Thanks,
Matt

On Tue, Aug 18, 2020 at 8:37 AM Daniel Gerzhoy via gem5-users <
gem5-users@gem5.org> wrote:

> Hey all,
>
> Is there a way to use constant memory in the GPU Model right now?
>
> Using the
>
> *__constant__ float variable[SIZE];*
>
> and
>
> *hipMemcpyToSymbol(...)*
>
> results in a
>
> *fatal: syscall mincore (#27) unimplemented.*
>
> I've been looking through the code to find a way, but I haven't yet.
> I guess a clarifying question might be: which cache does constant memory
> go to? the SQC? Scalar Cache? (Those two actually seem to have the same
> controller)
>
> Thanks,
>
> Dan Gerzhoy
>
>
> ___
>
> gem5-users mailing list -- gem5-users@gem5.org
>
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Missing L1 and L2 Hit stats/actions in MOESI AMD Base - CorePair.sm

2020-08-17 Thread Matt Sinclair via gem5-users
Hi Sampad,

The AMD folks that are CC'd are better people to comment than me, but I
believe the L3 cache is a "memory side" cache, and thus doesn't need to
maintain coherence.

If you have a fix to add hits/misses for any of these files, and are
willing to contribute it back, it would be great if you could submit a
patch for it.

Matt

On Mon, Aug 17, 2020 at 11:00 PM Sampad Mohapatra  wrote:

> Hi Matt,
>
> I am currently trying to add the stats by myself. I also noticed that
> neither hit nor miss stats are updated for L3 Cache.
>
> But, I am facing a different issue now. The directory ctrl (DirCntrl) in
> GPU_VIPER.py has a L3 cache (L3Cache).
> But, there is also a separate L3Cntrl class which inherits from
> L3Cache_Controller, but it is unused.
> The L3Cache_Controller is generated from MOESI_AMD_Base-L3cache.sm, which
> is a part of the Viper protocol.
>
> If the L3Cache_Controller isn't used, then why is it a part of the Viper
> protocol ?
> Does the L3 Cache not maintain any coherency ?
> Is this the intended behaviour of the default configuration ?
>
> Thanks and Regards,
> Sampad
>
> On Thu, Aug 13, 2020 at 5:40 PM Matt Sinclair via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> Hi Sampad,
>>
>> I'm not aware of a patch for this.  There was recently a patch to add
>> similar support for the VIPER protocol:
>> https://gem5-review.googlesource.com/c/public/gem5/+/30174.  If the AMD
>> folks (CC'd) don't have a patch, then the next best thing would be to do
>> something similar to the VIPER patch (although yes, the Core Pair changes
>> would be a little more complicated).
>>
>> Thanks,
>> Matt
>>
>> On Tue, Aug 4, 2020 at 4:53 PM Sampad Mohapatra via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> Hello All,
>>>
>>> MOESI AMD Base - CorePair state machine is missing the actions for L1
>>> and L2 hit statistics.
>>> The stats are present, but since no "action" is created nor used
>>> (actions to update misses are present for both L1 and L2), the stats stay
>>> at 0.
>>>
>>> I am not clear as to which state transitions should
>>> update the hit stats. Is there a patch for this ?
>>>
>>> Thank You,
>>> Sampad Mohapatra
>>>
>>>
>>> ___
>>> gem5-users mailing list -- gem5-users@gem5.org
>>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>
>> ___
>> gem5-users mailing list -- gem5-users@gem5.org
>> To unsubscribe send an email to gem5-users-le...@gem5.org
>
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Missing L1 and L2 Hit stats/actions in MOESI AMD Base - CorePair.sm

2020-08-13 Thread Matt Sinclair via gem5-users
Hi Sampad,

I'm not aware of a patch for this.  There was recently a patch to add
similar support for the VIPER protocol:
https://gem5-review.googlesource.com/c/public/gem5/+/30174.  If the AMD
folks (CC'd) don't have a patch, then the next best thing would be to do
something similar to the VIPER patch (although yes, the Core Pair changes
would be a little more complicated).

Thanks,
Matt

On Tue, Aug 4, 2020 at 4:53 PM Sampad Mohapatra via gem5-users <
gem5-users@gem5.org> wrote:

> Hello All,
>
> MOESI AMD Base - CorePair state machine is missing the actions for L1 and
> L2 hit statistics.
> The stats are present, but since no "action" is created nor used (actions
> to update misses are present for both L1 and L2), the stats stay at 0.
>
> I am not clear as to which state transitions should
> update the hit stats. Is there a patch for this ?
>
> Thank You,
> Sampad Mohapatra
>
>
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: AMD GCN3 - Can't use single CPU - fatal no spare thread context

2020-08-08 Thread Matt Sinclair via gem5-users
I don't know the answer to this, and would need to look just the same as
you.

My guess is that a) the thread contexts do not necessarily require running
on different CPUs and b) after the check I described, the one thread won't
be doing anything more after that point, so its effect on subsequent
coherence, memory traffic, etc. would be 0.  In the experiments I was
running at the time, I believe I was only using a single CPU and a single
GPU, and was able to run multiple thread contexts, so I don't think
multiple CPUs is a problem.  You could probably even use the m5stats to
reset stats after this startup stuff is done, since I think the check that
requires the additional thread context happens at the very beginning when
ROCm is initialized.
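If you want to try that, a minimal sketch of the reset from the benchmark side
(assuming you build and link against gem5's m5ops library; the header name and
path can differ between gem5 versions) would be:

#include <gem5/m5ops.h>

int main()
{
    // ... HIP/ROCm initialization and any other start-up work happens here ...

    m5_reset_stats(0, 0);   // zero the stats; the arguments are delay and period

    // ... region of interest: kernel launches, copies, etc. ...

    m5_dump_stats(0, 0);    // optionally dump the stats at the end of the region
    return 0;
}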

These are all guesses though, and if you wanted absolute answers, you'd
have to use gdb as I explained previously.

Matt

On Sat, Aug 8, 2020 at 12:47 AM Sampad Mohapatra  wrote:

> Hi Matt,
>
> Thanks for the clarification.
>
> My issue is I need more than 1 cpu. In this scenario what will be the
> effect of this extra cpu
> on the coherence traffic, i.e. does it become part of a Core Pair and take
> part in coherence exchanges ?
> When I am placing cpus in a garnet topology, how do I ignore this
> particular cpu ?
>
> Regards,
> Sampad
>
>
>
> On Sat, Aug 8, 2020 at 1:08 AM Matt Sinclair via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> Hi Sampad,
>>
>> To literally answer the clone error part: this happens when your
>> application needs multiple thread contexts to run.  The failure happens
>> when -n 1 is used because the simulator doesn't have enough thread contexts
>> to fulfill what the application needs.
>>
>> Of course, the next logical question is why ROCm needs 2 thread
>> contexts.  I haven't looked at this specific behavior in several years, but
>> when I dug into this in ~2018, I remember this happening because the ROCm
>> stack was spawning a thread to check on some details about the system
>> (e.g., it was checking if the HCC version was at least version X, because
>> starting with that version, the HCC behavior was different).  If you are
>> interested in finding the exact call that does this, you can build a debug
>> version of the ROCm stack and step through the ROCm stack with gdb while
>> the simulator is running.  Eventually you'll get to the instruction in the
>> ROCm stack that is doing checks like the one I described above, and you
>> could potentially remove that call and return true/false instead as
>> appropriate for the check it's doing.  This is what I did previously,
>> although I don't think that ROCm patch has been merged into develop or the
>> AMD staging branch yet (although like some of the other ROCm patches, it
>> would actually need to be placed elsewhere like gem5-resources, not
>> directly in the gem5 repo, since it doesn't affect gem5 code).
>>
>> Alternatively, you can just run with -n 2, as you've found already.  It
>> should have very minimal impact on running the application.
>>
>> Matt
>>
>> On Fri, Aug 7, 2020 at 11:46 PM Sampad Mohapatra via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> Hi All,
>>>
>>> Why does the GCN3 model require at least 2 CPUs ?
>>> Every time I use a single CPU, gem5 crashes with the following error:
>>> *fatal: clone: no spare thread context in system*
>>>
>>> In contrast, I was able to run the HSAIL model with a single CPU.
>>>
>>> Thank You,
>>> Sampad Mohapatra
>>>
>>>
>>> ___
>>> gem5-users mailing list -- gem5-users@gem5.org
>>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>
>> ___
>> gem5-users mailing list -- gem5-users@gem5.org
>> To unsubscribe send an email to gem5-users-le...@gem5.org
>
>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: AMD GCN3 - Can't use single CPU - fatal no spare thread context

2020-08-07 Thread Matt Sinclair via gem5-users
Hi Sampad,

To literally answer the clone error part: this happens when your
application needs multiple thread contexts to run.  The failure happens
when -n 1 is used because the simulator doesn't have enough thread contexts
to fulfill what the application needs.

Of course, the next logical question is why ROCm needs 2 thread contexts.
I haven't looked at this specific behavior in several years, but when I dug
into this in ~2018, I remember this happening because the ROCm stack was
spawning a thread to check on some details about the system (e.g., it was
checking if the HCC version was at least version X, because starting with
that version, the HCC behavior was different).  If you are interested in
finding the exact call that does this, you can build a debug version of the
ROCm stack and step through the ROCm stack with gdb while the simulator is
running.  Eventually you'll get to the instruction in the ROCm stack that
is doing checks like the one I described above, and you could potentially
remove that call and return true/false instead as appropriate for the check
it's doing.  This is what I did previously, although I don't think that
ROCm patch has been merged into develop or the AMD staging branch yet
(although like some of the other ROCm patches, it would actually need to be
placed elsewhere like gem5-resources, not directly in the gem5 repo, since
it doesn't affect gem5 code).

Alternatively, you can just run with -n 2, as you've found already.  It
should have very minimal impact on running the application.

Matt

On Fri, Aug 7, 2020 at 11:46 PM Sampad Mohapatra via gem5-users <
gem5-users@gem5.org> wrote:

> Hi All,
>
> Why does the GCN3 model require at least 2 CPUs ?
> Every time I use a single CPU, gem5 crashes with the following error:
> *fatal: clone: no spare thread context in system*
>
> In contrast, I was able to run the HSAIL model with a single CPU.
>
> Thank You,
> Sampad Mohapatra
>
>
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: AMD GCN3 - Garnet Support

2020-07-29 Thread Matt Sinclair via gem5-users
Hi Sampad,

Srikant is the expert on Garnet, but his post to gem5-users is not working,
so I'm forwarding his reply.

Hope this helps,
Matt

--

GCN3 APU model supports Garnet topologies. You will have to enable garnet
using --network=garnet2.0 and use an appropriate topology file using the
--topology option.
The hsaTopology creates the required files for ROCm driver and is not
related to the network topology specified for garnet in gem5.

Thanks,
Srikant

On Tue, Jul 28, 2020 at 7:47 PM Sampad Mohapatra via gem5-users <
gem5-users@gem5.org> wrote:

> Hi All,
>
> Does the GCN3 APU model support garnet network and topologies ?
> Also, what is the hsaTopology ? Are garnet topologies and hsaTopology
> related in some way ?
>
> Thank you,
> Sampad Mohapatra
>
>
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: GCN3 GPU Simulation Start-Up Time

2020-06-22 Thread Matt Sinclair via gem5-users
In my opinion, adding support for HSA_PACKET_TYPE_AGENT_DISPATCH,
irrelevant of the current issues, is worthwhile and helpful to push.

If you have a minimum working example of how you change the benchmark, that
would be helpful too.

Kyle R. has spent a bunch of time trying to identify the source of the
problem within the synchronize call, but thus far we haven't found anything
concrete.  So for now, having this workaround would definitely be helpful
for the community.
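
For anyone else following along, the benchmark-side wait Dan describes below
boils down to something like this (just a sketch; get_kernel_completion_signal()
is a hypothetical stand-in for the agent-dispatch command that steals the
signal, and the hsa.h include path depends on your ROCm install):

#include <hsa/hsa.h>
#include <cstdint>

// Hypothetical helper wrapping the agent-dispatch "steal the completion
// signal" command described below.
extern hsa_signal_t get_kernel_completion_signal(uint64_t kernel_id);

void wait_for_kernel(uint64_t kernel_id)
{
    hsa_signal_t sig = get_kernel_completion_signal(kernel_id);

    // The completion signal starts at 1 and drops to 0 when the kernel
    // finishes, so wait until its value is below 1.  The wait can return
    // early, so loop until the condition actually holds.
    while (hsa_signal_wait_relaxed(sig, HSA_SIGNAL_CONDITION_LT, 1,
                                   UINT64_MAX, HSA_WAIT_STATE_ACTIVE) >= 1) {
        // keep waiting
    }
}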

Matt

On Mon, Jun 22, 2020 at 1:25 PM Daniel Gerzhoy 
wrote:

> Hey Matt,
>
> Happy to do that if you think it's viable, but I have to say my workaround
> is pretty hack-y. There are definitely some benchmark changes on top of
> changes to the simulator.
>
> Let me describe it for you and then if you still think it's a good idea
> I'll make a patch.
>
> My workaround relies on the fact that:
> 1. Launching a kernel sets up a completion signal that
> hipDeviceSynchronize() ultimately waits on.
> 2. All you need in the benchmark is that completion signal to know that
> your kernel is complete.
>
> So I basically implement the HSA_PACKET_TYPE_AGENT_DISPATCH in
> hsa_packet_processor.cc and gpu_command_processor.cc to receive commands
> from the benchmark directly.
> One of the commands is to steal the completion signal for a particular
> kernel and pass it back to the benchmark.
>
> After you launch the kernel (normally) you pass in that kernel's id (you
> have to keep track) then send a command to steal the completion signal. It
> gets passed back in the return_address member of the agent packet.
>
> In the benchmark I store that signal and use it to do a
> hsa_signal_wait_relaxed on it.
> And you have to do this every time you launch a kernel, you could
> conceivably overload the hipDeviceSynchronize() function to do this for
> you/keep track of kernel launches too.
>
> Let me know if you think this is still something you guys want.
>
> Cheers,
>
> Dan
>
>
> On Mon, Jun 22, 2020 at 2:04 PM Matt Sinclair 
> wrote:
>
>> Hi Dan,
>>
>> Do you mind pushing your workaround with the completion signal as a patch
>> to the staging branch so we can take a look?  Or is this just a change to
>> the program(s) itself?
>>
>> After Kyle's fix (which has been pushed as an update to my patch), we're
>> still seeing some hipDeviceSynchronize failures.  So we're interested in
>> looking at what you did to see if it solves the problem.
>>
>> Matt
>>
>> On Fri, Jun 19, 2020 at 4:15 PM Kyle Roarty  wrote:
>>
>>> Hi Dan,
>>>
>>> Another thing to try is to add and set the environment variable HIP_DB
>>> in apu_se.py (Line 461 for me, starts with "env = ['LD_LIBRARY_PATH...") .
>>> Setting HIP_DB=sync or HIP_DB=api has prevented crashing on
>>> hipDeviceSynchronize() calls for the applications I've tested.
>>>
>>> I had traced through this issue (or at least one that manifests the same
>>> way) a while back, and what I remember is that the crash happens somewhere
>>> in the HIP code, and it occurs because somewhere much earlier we go down a
>>> codepath that doesn't clear a register (I believe that was also in HIP
>>> code). That register then gets re-used until the error propagates to the
>>> register used in the ld instruction. Unfortunately, I had a hard time of
>>> getting consistent, manageable traces, so I wasn't able to figure out why
>>> we were going down the wrong codepath.
>>>
>>> Kyle
>>> --
>>> *From:* mattdsincl...@gmail.com 
>>> *Sent:* Friday, June 19, 2020 2:08 PM
>>> *To:* Daniel Gerzhoy 
>>> *Cc:* GAURAV JAIN ; Kyle Roarty ;
>>> gem5 users mailing list 
>>> *Subject:* Re: [gem5-users] GCN3 GPU Simulation Start-Up Time
>>>
>>> Thanks Dan.  Kyle R. has found some things about the patch that we're
>>> testing and may need to be pushed pending those results.  Fingers crossed
>>> that fix will help you too.
>>>
>>> As Gaurav mentioned previously, the spin flag did not always solve the
>>> problem for us -- seems like that is true for you too, although I don't
>>> remember square ever failing for us.
>>>
>>> I don't know exactly where that PC is coming from, I'd have to get a
>>> trace.  But I suspect it's actually a GPU address being accessed by some
>>> instruction that's failing -- in the past when I've seen this kind of
>>> issue, it was happening because the kernel boundary was not being respected
>>> and code was running that shouldn't have been running yet.  I don't know
>>> what your use case is, so it's possible that is not the issue for you -- a
>>> trace would be the only way to know for sure.
>>>
>>> Matt
>>>
>>> Regards,
>>> Matt Sinclair
>>> Assistant Professor
>>> University of Wisconsin-Madison
>>> Computer Sciences Department
>>> cs.wisc.edu/~sinclair
>>>
>>> On Wed, Jun 17, 2020 at 10:30 AM Daniel Gerzhoy <
>>> daniel.gerz...@gmail.com> wrote:
>>>
>>> Hey Matt,
>>>
>>> Thanks for pushing those changes. I updated the head of the amd staging
>>> branch and tried to run square. The time to get into main stays 

[gem5-users] Re: GCN3 GPU Simulation Start-Up Time

2020-06-22 Thread Matt Sinclair via gem5-users
Hi Dan,

Do you mind pushing your workaround with the completion signal as a patch
to the staging branch so we can take a look?  Or is this just a change to
the program(s) itself?

After Kyle's fix (which has been pushed as an update to my patch), we're
still seeing some hipDeviceSynchronize failures.  So we're interested in
looking at what you did to see if it solves the problem.

Matt

On Fri, Jun 19, 2020 at 4:15 PM Kyle Roarty  wrote:

> Hi Dan,
>
> Another thing to try is to add and set the environment variable HIP_DB in
> apu_se.py (Line 461 for me, starts with "env = ['LD_LIBRARY_PATH...") .
> Setting HIP_DB=sync or HIP_DB=api has prevented crashing on
> hipDeviceSynchronize() calls for the applications I've tested.
>
> I had traced through this issue (or at least one that manifests the same
> way) a while back, and what I remember is that the crash happens somewhere
> in the HIP code, and it occurs because somewhere much earlier we go down a
> codepath that doesn't clear a register (I believe that was also in HIP
> code). That register then gets re-used until the error propagates to the
> register used in the ld instruction. Unfortunately, I had a hard time of
> getting consistent, manageable traces, so I wasn't able to figure out why
> we were going down the wrong codepath.
>
> Kyle
> --
> *From:* mattdsincl...@gmail.com 
> *Sent:* Friday, June 19, 2020 2:08 PM
> *To:* Daniel Gerzhoy 
> *Cc:* GAURAV JAIN ; Kyle Roarty ; gem5
> users mailing list 
> *Subject:* Re: [gem5-users] GCN3 GPU Simulation Start-Up Time
>
> Thanks Dan.  Kyle R. has found some things about the patch that we're
> testing and may need to be pushed pending those results.  Fingers crossed
> that fix will help you too.
>
> As Gaurav mentioned previously, the spin flag did not always solve the
> problem for us -- seems like that is true for you too, although I don't
> remember square ever failing for us.
>
> I don't know exactly where that PC is coming from, I'd have to get a
> trace.  But I suspect it's actually a GPU address being accessed by some
> instruction that's failing -- in the past when I've seen this kind of
> issue, it was happening because the kernel boundary was not being respected
> and code was running that shouldn't have been running yet.  I don't know
> what your use case is, so it's possible that is not the issue for you -- a
> trace would be the only way to know for sure.
>
> Matt
>
> Regards,
> Matt Sinclair
> Assistant Professor
> University of Wisconsin-Madison
> Computer Sciences Department
> cs.wisc.edu/~sinclair
>
> On Wed, Jun 17, 2020 at 10:30 AM Daniel Gerzhoy 
> wrote:
>
> Hey Matt,
>
> Thanks for pushing those changes. I updated the head of the amd staging
> branch and tried to run square. The time to get into main stays about the
> same (5min) FYI.
>
> But the hipDeviceSynchronize() fails even when I add 
> hipSetDeviceFlags(hipDeviceScheduleSpin);
> unfortunately.
>
>  panic: Tried to read unmapped address 0x1853e78.
> PC: 0x752a966b, Instr:   MOV_R_M : ld   rdi, DS:[rbx + 0x8]
>
> Is that PC (0x752a966b) somewhere in the hip code or something?
> Or in the emulated driver? The line between the simulator code and
> guest code is kind of blurry to me around there haha.
>
> Best,
>
> Dan
>
> On Mon, Jun 15, 2020 at 9:59 PM Matt Sinclair 
> wrote:
>
> Hi Dan,
>
> Thanks for the update.  Apologies for the delay, the patch didn't apply
> cleanly initially, but I have pushed the patch I promised previously.
> Since I'm not sure if you're on the develop branch or the AMD staging
> branch, I pushed it to both (there are some differences in code on the
> branches, which I hope will be resolved over time as more of the commits
> from the staging branch are pushed to develop:
>
> - develop: https://gem5-review.googlesource.com/c/public/gem5/+/30354
> - AMD staging: https://gem5-review.googlesource.com/c/amd/gem5/+/30335
>
> I have validated that both of them compile, and asked Kyle R to test that
> both of them a) don't break anything that is expected to work publicly with
> the GPU and b) hopefully resolve some of the problems (like yours) with
> barrier synchronization.  Let us know if this solves your problem too --
> fingers crossed.
>
> Thanks,
> Matt
>
> On Fri, Jun 12, 2020 at 2:47 PM Daniel Gerzhoy 
> wrote:
>
>   Matt,
>
> It wasn't so much a solution as an explanation. Kyle was running on an r5
> 3600 (3.6-4.2 GHz) whereas I am on a Xeon Gold 5117 @ (2.0 - 2.8 GHz)
>
> The relative difference in clock speed seems to me to be a more reasonable
> explanation for a slowdown from 1-1.5 minutes to ~5min (actual time before
> main) than the 8 min (time before main + exit time) I was seeing before.
>
> I'll update to the latest branch and see if that speeds me up further. I'm
> also going to try running on a faster machine as well though that will take
> some setup-time.
>
> Gaurav,
>
> Thanks for the tip, that will be helpful in the meantime.
>
> Dan
>
> On Fri, 

[gem5-users] Re: GCN3 GPU Simulation Start-Up Time

2020-06-19 Thread Matt Sinclair via gem5-users
Thanks Dan.  Kyle R. has found some things about the patch that we're
testing and may need to be pushed pending those results.  Fingers crossed
that fix will help you too.

As Gaurav mentioned previously, the spin flag did not always solve the
problem for us -- seems like that is true for you too, although I don't
remember square ever failing for us.

I don't know exactly where that PC is coming from, I'd have to get a
trace.  But I suspect it's actually a GPU address being accessed by some
instruction that's failing -- in the past when I've seen this kind of
issue, it was happening because the kernel boundary was not being respected
and code was running that shouldn't have been running yet.  I don't know
what your use case is, so it's possible that is not the issue for you -- a
trace would be the only way to know for sure.

Matt

Regards,
Matt Sinclair
Assistant Professor
University of Wisconsin-Madison
Computer Sciences Department
cs.wisc.edu/~sinclair

On Wed, Jun 17, 2020 at 10:30 AM Daniel Gerzhoy 
wrote:

> Hey Matt,
>
> Thanks for pushing those changes. I updated the head of the amd staging
> branch and tried to run square. The time to get into main stays about the
> same (5min) FYI.
>
> But the hipDeviceSynchronize() fails even when I add 
> hipSetDeviceFlags(hipDeviceScheduleSpin);
> unfortunately.
>
>  panic: Tried to read unmapped address 0x1853e78.
> PC: 0x752a966b, Instr:   MOV_R_M : ld   rdi, DS:[rbx + 0x8]
>
> Is that PC (0x752a966b) somewhere in the hip code or something?
> Or in the emulated driver? The line between the simulator code and
> guest code is kind of blurry to me around there haha.
>
> Best,
>
> Dan
>
> On Mon, Jun 15, 2020 at 9:59 PM Matt Sinclair 
> wrote:
>
>> Hi Dan,
>>
>> Thanks for the update.  Apologies for the delay, the patch didn't apply
>> cleanly initially, but I have pushed the patch I promised previously.
>> Since I'm not sure if you're on the develop branch or the AMD staging
>> branch, I pushed it to both (there are some differences in code on the
>> branches, which I hope will be resolved over time as more of the commits
>> from the staging branch are pushed to develop:
>>
>> - develop: https://gem5-review.googlesource.com/c/public/gem5/+/30354
>> - AMD staging: https://gem5-review.googlesource.com/c/amd/gem5/+/30335
>>
>> I have validated that both of them compile, and asked Kyle R to test that
>> both of them a) don't break anything that is expected to work publicly with
>> the GPU and b) hopefully resolve some of the problems (like yours) with
>> barrier synchronization.  Let us know if this solves your problem too --
>> fingers crossed.
>>
>> Thanks,
>> Matt
>>
>> On Fri, Jun 12, 2020 at 2:47 PM Daniel Gerzhoy 
>> wrote:
>>
>>>   Matt,
>>>
>>> It wasn't so much a solution as an explanation. Kyle was running on an
>>> r5 3600 (3.6-4.2 GHz) whereas I am on a Xeon Gold 5117 @ (2.0 - 2.8 GHz)
>>>
>>> The relative difference in clock speed seems to me to be a more
>>> reasonable explanation for a slowdown from 1-1.5 minutes to ~5min (actual
>>> time before main) than the 8 min (time before main + exit time) I was seeing
>>> before.
>>>
>>> I'll update to the latest branch and see if that speeds me up further.
>>> I'm also going to try running on a faster machine as well though that will
>>> take some setup-time.
>>>
>>> Gaurav,
>>>
>>> Thanks for the tip, that will be helpful in the meantime.
>>>
>>> Dan
>>>
>>> On Fri, Jun 12, 2020 at 3:41 PM GAURAV JAIN  wrote:
>>>
 Hi,

 I am not sure if chiming in now would cause any more confusion, but
 still giving it a try.

 @Daniel Gerzhoy  - for hipDeviceSynchronize,
 as Matt mentioned, they are working on a fix and should have it out there.
 If you want to, can you try this:

 hipSetDeviceFlags(hipDeviceScheduleSpin);
 for (int k = 1; k < dim; k++) {
 hipLaunchKernelGGL(HIP_KERNEL_NAME(somekernel), grid, threads,
 0, 0);
 hipDeviceSynchronize();
 }

 For me, in many cases (not all and in the ones which it didn't work, I
 got the same error unmapped error as you), this seemed like doing the
 trick. You should checkout the HEAD and then try this. I am not hoping for
 it to make any difference but still worth a shot.


 --
 *From:* mattdsincl...@gmail.com 
 *Sent:* Friday, June 12, 2020 2:14 PM
 *To:* Daniel Gerzhoy 
 *Cc:* Kyle Roarty ; GAURAV JAIN ;
 gem5 users mailing list 
 *Subject:* Re: [gem5-users] GCN3 GPU Simulation Start-Up Time

 Hi Dan,

 Glad to hear things are working, and thanks for the tips!  I must admit
 to not quite following what the solution was though -- are you saying the
 solution is to replace exit(0)/return with m5_exit()?  I thought your
 original post said the problem was things taking a really long time before
 main?  If so, it would seem like something else must have been the

[gem5-users] Re: GCN3 GPU Simulation Start-Up Time

2020-06-15 Thread Matt Sinclair via gem5-users
Hi Dan,

Thanks for the update.  Apologies for the delay, the patch didn't apply
cleanly initially, but I have pushed the patch I promised previously.
Since I'm not sure if you're on the develop branch or the AMD staging
branch, I pushed it to both (there are some differences in code on the
branches, which I hope will be resolved over time as more of the commits
from the staging branch are pushed to develop:

- develop: https://gem5-review.googlesource.com/c/public/gem5/+/30354
- AMD staging: https://gem5-review.googlesource.com/c/amd/gem5/+/30335

I have validated that both of them compile, and asked Kyle R to test that
both of them a) don't break anything that is expected to work publicly with
the GPU and b) hopefully resolve some of the problems (like yours) with
barrier synchronization.  Let us know if this solves your problem too --
fingers crossed.

Thanks,
Matt

On Fri, Jun 12, 2020 at 2:47 PM Daniel Gerzhoy 
wrote:

>   Matt,
>
> It wasn't so much a solution as an explanation. Kyle was running on an r5
> 3600 (3.6-4.2 GHz) whereas I am on a Xeon Gold 5117 @ (2.0 - 2.8 GHz)
>
> The relative difference in clock speed seems to me to be a more reasonable
> explanation for a slowdown from 1-1.5 minutes to ~5min (actual time before
> main) than the 8 min (time before main + exit time) I was seeing before.
>
> I'll update to the latest branch and see if that speeds me up further. I'm
> also going to try running on a faster machine as well though that will take
> some setup-time.
>
> Gaurav,
>
> Thanks for the tip, that will be helpful in the meantime.
>
> Dan
>
> On Fri, Jun 12, 2020 at 3:41 PM GAURAV JAIN  wrote:
>
>> Hi,
>>
>> I am not sure if chiming in now would cause any more confusion, but still
>> giving it a try.
>>
>> @Daniel Gerzhoy  - for hipDeviceSynchronize,
>> as Matt mentioned, they are working on a fix and should have it out there.
>> If you want to, can you try this:
>>
>> hipSetDeviceFlags(hipDeviceScheduleSpin);
>> for (int k = 1; k < dim; k++) {
>> hipLaunchKernelGGL(HIP_KERNEL_NAME(somekernel), grid, threads, 0,
>> 0);
>> hipDeviceSynchronize();
>> }
>>
>> For me, in many cases (not all and in the ones which it didn't work, I
>> got the same error unmapped error as you), this seemed like doing the
>> trick. You should checkout the HEAD and then try this. I am not hoping for
>> it to make any difference but still worth a shot.
>>
>>
>> --
>> *From:* mattdsincl...@gmail.com 
>> *Sent:* Friday, June 12, 2020 2:14 PM
>> *To:* Daniel Gerzhoy 
>> *Cc:* Kyle Roarty ; GAURAV JAIN ;
>> gem5 users mailing list 
>> *Subject:* Re: [gem5-users] GCN3 GPU Simulation Start-Up Time
>>
>> Hi Dan,
>>
>> Glad to hear things are working, and thanks for the tips!  I must admit
>> to not quite following what the solution was though -- are you saying the
>> solution is to replace exit(0)/return with m5_exit()?  I thought your
>> original post said the problem was things taking a really long time before
>> main?  If so, it would seem like something else must have been the
>> problem/solution?
>>
>> Coming to your other questions: I don't recall what exactly the root
>> cause of the hipDeviceSynchronize failure is, but I would definitely
>> recommend updating to the current staging branch head first and testing.  I
>> am also hoping to push a fix today to the barrier bit synchronization --
>> most of the hipDeviceSynchronize-type failures I've seen were due to a bug
>> in my barrier bit implementation.  I'm not sure if this will be the
>> solution to your problem or not, but I can definitely add you as a reviewer
>> and/or point you to it if needed.
>>
>> Not sure about the m5op, hopefully someone else can chime in on that.
>>
>> Thanks,
>> Matt
>>
>> On Fri, Jun 12, 2020 at 12:12 PM Daniel Gerzhoy 
>> wrote:
>>
>> I've figured it out.
>>
>> To measure the time it took to get to main() I put a *return 0; *at the
>> beginning of the function so I wouldn't have to babysit it.
>>
>> I didn't consider that it would also take some time for the simulator to
>> exit, which is where the extra few minutes comes from.
>> Side-note: *m5_exit(0);* instead of a return exits immediately.
>>
>> 5 min is a bit more reasonable of a slowdown for the difference between
>> the two clocks.
>>
>> Two incidental things:
>>
>> 1. Is there a way to have gem5 spit out (real wall-clock) timestamps
>> while it's printing stuff?
>> 2. A while ago I asked about hipDeviceSynchronize(); causing crashes
>> (panic: Tried to read unmapped address 0xffc29f48.). Has this been
>> fixed since?
>>
>> I'm going to update to the head of this branch soon, and eventually to
>> the main branch. If it hasn't been fixed I've created a workaround by
>> stealing the completion signal of the kernel based on its launch id, and
>> manually waiting for it using the HSA interface.
>> Happy to help out and implement this as a m5op (or something) if that
>> would be helpful for you guys.
>>
>> Best,
>>
>> 

[gem5-users] Re: GCN3 GPU Simulation Start-Up Time

2020-06-12 Thread Matt Sinclair via gem5-users
Hi Dan,

Glad to hear things are working, and thanks for the tips!  I must admit to
not quite following what the solution was though -- are you saying the
solution is to replace exit(0)/return with m5_exit()?  I thought your
original post said the problem was things taking a really long time before
main?  If so, it would seem like something else must have been the
problem/solution?

Coming to your other questions: I don't recall what exactly the root cause
of the hipDeviceSynchronize failure is, but I would definitely recommend
updating to the current staging branch head first and testing.  I am also
hoping to push a fix today to the barrier bit synchronization -- most of
the hipDeviceSynchronize-type failures I've seen were due to a bug in my
barrier bit implementation.  I'm not sure if this will be the solution to
your problem or not, but I can definitely add you as a reviewer and/or
point you to it if needed.

Not sure about the m5op, hopefully someone else can chime in on that.

Thanks,
Matt

On Fri, Jun 12, 2020 at 12:12 PM Daniel Gerzhoy 
wrote:

> I've figured it out.
>
> To measure the time it took to get to main() I put a *return 0; *at the
> beginning of the function so I wouldn't have to babysit it.
>
> I didn't consider that it would also take some time for the simulator to
> exit, which is where the extra few minutes comes from.
> Side-note: *m5_exit(0);* instead of a return exits immediately.
>
> 5 min is a bit more reasonable of a slowdown for the difference between
> the two clocks.
>
> Two incidental things:
>
> 1. Is there a way to have gem5 spit out (real wall-clock) timestamps while
> it's printing stuff?
> 2. A while ago I asked about hipDeviceSynchronize(); causing crashes
> (panic: Tried to read unmapped address 0xffc29f48.). Has this been
> fixed since?
>
> I'm going to update to the head of this branch soon, and eventually to the
> main branch. If it hasn't been fixed I've created a workaround by stealing
> the completion signal of the kernel based on its launch id, and manually
> waiting for it using the HSA interface.
> Happy to help out and implement this as an m5op (or something) if that
> would be helpful for you guys.
>
> Best,
>
> Dan
>
> On Thu, Jun 11, 2020 at 12:40 PM Matt Sinclair 
> wrote:
>
>> I don't see anything amazingly amiss in your output, but the number of
>> times the open/etc. calls fail is interesting -- Kyle, do we see the same thing?
>> If not, it could be that you should update your apu_se.py to point to the
>> "correct" place to search for the libraries first?
>>
>> Also, based on Kyle's reply, Dan, how long does it take you to boot up
>> square?  Certainly a slower machine might take longer, but it does seem
>> even slower than expected.  But if we're trying the same application, maybe
>> it will be easier to spot differences.
>>
>> I would also recommend updating to the latest commit on the staging
>> branch -- I don't believe it should break anything with those patches.
>>
>> Yes, looks like you are using the release version of ROCm -- no issues
>> there.
>>
>> Matt
>>
>>
>>
>> On Thu, Jun 11, 2020 at 9:38 AM Daniel Gerzhoy 
>> wrote:
>>
>>> I am using the docker, yeah.
>>> It's running on our server cluster, which is a Xeon Gold 5117 @ (2.0 -
>>> 2.8 GHz); that might make up some of the difference, since the r5 3600 has
>>> a faster clock (3.6-4.2 GHz).
>>>
>>> I've hesitated to update my branch because in the Dockerfile it
>>> specifically checks this branch out and applies a patch, though the patch
>>> isn't very extensive.
>>> This was from a while back (November maybe?) and I know you guys have
>>> been integrating things into the main branch (thanks!)
>>> I was thinking I would wait until it's fully merged into the mainline
>>> gem5 branch and rebase onto that and try to merge my changes in.
>>>
>>> Last I checked the GCN3 stuff is in the dev branch not the master right?
>>>
>>> But if it will help maybe I should update to the head of this branch.
>>> Will I need to update the docker as well?
>>>
>>> As for the debug vs release ROCm, I think I'm using the release version.
>>> This is what the dockerfile built:
>>>
>>> ARG rocm_ver=1.6.2
>>> RUN wget -qO- repo.radeon.com/rocm/archive/apt_${rocm_ver}.tar.bz2 \
>>> | tar -xjv \
>>> && cd apt_${rocm_ver}/pool/main/ \
>>> && dpkg -i h/hsakmt-roct-dev/* \
>>> && dpkg -i h/hsa-ext-rocr-dev/* \
>>> && dpkg -i h/hsa-rocr-dev/* \
>>> && dpkg -i r/rocm-utils/* \
>>> && dpkg -i h/hcc/* \
>>> && dpkg -i h/hip_base/* \
>>> && dpkg -i h/hip_hcc/* \
>>> && dpkg -i h/hip_samples/*
>>>
>>>
>>> I ran a benchmark that prints that it entered main and returns
>>> immediately; this took 9 minutes.
>>> I've attached a debug trace with debug flags =
>>> "GPUDriver,SyscallVerbose"
>>> There's a lot of weird things going on, "syscall open: failed", "syscall
>>> brk: break point changed to [...]", and lots of ignored system calls.

[gem5-users] Re: GCN3 GPU Simulation Start-Up Time

2020-06-11 Thread Matt Sinclair via gem5-users
I don't see anything amazingly amiss in your output, but the number of
times the open/etc. calls fail is interesting -- Kyle, do we see the same thing?
If not, it could be that you should update your apu_se.py to point to the
"correct" place to search for the libraries first?

Also, based on Kyle's reply, Dan, how long does it take you to boot up
square?  Certainly a slower machine might take longer, but it does seem
even slower than expected.  But if we're trying the same application, maybe
it will be easier to spot differences.

I would also recommend updating to the latest commit on the staging branch
-- I don't believe it should break anything with those patches.

Yes, looks like you are using the release version of ROCm -- no issues
there.

Matt



On Thu, Jun 11, 2020 at 9:38 AM Daniel Gerzhoy 
wrote:

> I am using the docker, yeah.
> It's running on our server cluster, which is a Xeon Gold 5117 @ (2.0 - 2.8
> GHz); that might make up some of the difference, since the r5 3600 has a faster
> clock (3.6-4.2 GHz).
>
> I've hesitated to update my branch because in the Dockerfile it
> specifically checks this branch out and applies a patch, though the patch
> isn't very extensive.
> This was from a while back (November maybe?) and I know you guys have been
> integrating things into the main branch (thanks!)
> I was thinking I would wait until it's fully merged into the mainline gem5
> branch and rebase onto that and try to merge my changes in.
>
> Last I checked the GCN3 stuff is in the dev branch not the master right?
>
> But if it will help maybe I should update to the head of this branch. Will
> I need to update the docker as well?
>
> As for the debug vs release ROCm, I think I'm using the release version.
> This is what the dockerfile built:
>
> ARG rocm_ver=1.6.2
> RUN wget -qO- repo.radeon.com/rocm/archive/apt_${rocm_ver}.tar.bz2 \
> | tar -xjv \
> && cd apt_${rocm_ver}/pool/main/ \
> && dpkg -i h/hsakmt-roct-dev/* \
> && dpkg -i h/hsa-ext-rocr-dev/* \
> && dpkg -i h/hsa-rocr-dev/* \
> && dpkg -i r/rocm-utils/* \
> && dpkg -i h/hcc/* \
> && dpkg -i h/hip_base/* \
> && dpkg -i h/hip_hcc/* \
> && dpkg -i h/hip_samples/*
>
>
> I ran a benchmark that prints that it entered main and returns
> immediately; this took 9 minutes.
> I've attached a debug trace with debug flags = "GPUDriver,SyscallVerbose"
> There's a lot of weird things going on, "syscall open: failed", "syscall
> brk: break point changed to [...]", and lots of ignored system calls.
>
> head of Stats for reference:
> -- Begin Simulation Statistics --
> sim_seconds                      0.096192   # Number of seconds simulated
> sim_ticks                     96192368500   # Number of ticks simulated
> final_tick                    96192368500   # Number of ticks from beginning of simulation (restored from checkpoints and never reset)
> sim_freq                                1   # Frequency of simulated ticks
> host_inst_rate                     175209   # Simulator instruction rate (inst/s)
> host_op_rate                       338409   # Simulator op (including micro ops) rate (op/s)
> host_tick_rate                  175362515   # Simulator tick rate (ticks/s)
> host_mem_usage                    1628608   # Number of bytes of host memory used
> host_seconds                       548.53   # Real time elapsed on the host
> sim_insts                        96108256   # Number of instructions simulated
> sim_ops                         185628785   # Number of ops (including micro ops) simulated
> system.voltage_domain.voltage           1   # Voltage in Volts
> system.clk_domain.clock              1000   # Clock period in ticks
>
> Maybe something in the attached file explains it better than I can express.
>
> Many thanks for your help and hard work!
>
> Dan
>
>
>
>
>
> On Thu, Jun 11, 2020 at 3:32 AM Kyle Roarty  wrote:
>
>> Running through a few applications, it took me about 2.5 minutes or less
>> each time using docker to start executing the program on an r5 3600.
>>
>> I ran square, dynamic_shared, and MatrixTranspose (All from HIP) which
>> took about 1-1.5 mins.
>>
>> I ran conv_bench and rnn_bench from DeepBench which took just about 2
>> minutes.
>>
>> Because of that, it's possible the size of the app has an effect on setup
>> time, as the HIP apps are extremely small.
>>
>> Also, the commit Dan is checked out on is d0945dc (mem-ruby: add cache
>> hit/miss statistics for TCP and TCC),
>> which isn't the most recent commit. I don't believe that that would account
>> for such a large difference.

[gem5-users] Re: GCN3 GPU Simulation Start-Up Time

2020-06-11 Thread Matt Sinclair via gem5-users
Gaurav & Kyle, do you know if this is the case?

Dan, I believe the short answer is yes although 7-8 minutes seems a little
long.  Are you running this in Kyle's Docker, or separately?  If in the
Docker, that does increase the overhead somewhat, so running it directly on
a system would likely reduce the overhead somewhat.  Also, are you running
with the release or debug version of the ROCm drivers?  Again, debug
version will likely add some time to this.

Matt

On Wed, Jun 10, 2020 at 2:00 PM Daniel Gerzhoy via gem5-users <
gem5-users@gem5.org> wrote:

> I've been running simulations using the GCN3 branch:
>
> rocm_ver=1.6.2
> $ git branch
> * (HEAD detached at d0945dc)
>   agutierr/master-gcn3-staging
>
> And I've noticed that it takes roughly 7-8 minutes to get to main()
>
> I'm guessing that this is the simulator setting up drivers?
> Is that correct? Is there other stuff going on?
>
> *Has anyone found a way to speed this up?*
>
> I am trying to get some of the Rodinia benchmarks from the HIP-Examples
> running, and debugging takes a long time as a result.
>
> I suspect that this is unavoidable but I won't know if I don't ask!
>
> Cheers,
>
> Dan Gerzhoy
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Stat dump after each N instructions - CPU and/or AMD GPU

2020-05-26 Thread Matt Sinclair via gem5-users
Thanks, this is helpful Rajeev.  I am not an expert at this part of the
simulator, but I believe the m5ops is indeed what you are looking for.
After looking through the m5ops again, I don't think gem5 currently has
exactly the feature you are looking for -- m5_dump_reset_stats is the
closest, but it resets based on time, and you want instructions (as you and
Muhammet discussed above).  So, my guess is that you should look at how
m5_dump_reset_stats is implemented (start with include/gem5/m5ops.h), and
create a new version of it that dumps/resets based on instructions instead.
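
For reference, the existing op is time-based: include/gem5/m5ops.h declares it
roughly as m5_dump_reset_stats(uint64_t ns_delay, uint64_t ns_period), so an
instruction-based version would indeed need a new op on the simulator side.
As a coarse stopgap, the existing op can also be called from the application
itself at fixed points in its main loop -- that counts units of work rather
than instructions, so it is only an approximation.  A sketch (do_work and
CHUNK are placeholders, not gem5 APIs):

#include <stdint.h>
#include <gem5/m5ops.h>    /* declares m5_dump_reset_stats(ns_delay, ns_period) */

#define CHUNK 1000000              /* hypothetical interval, in loop iterations */

extern void do_work(uint64_t i);   /* stands in for the real benchmark body */

void run(uint64_t total_iters)
{
    for (uint64_t i = 0; i < total_iters; ++i) {
        do_work(i);
        if ((i + 1) % CHUNK == 0)
            m5_dump_reset_stats(0, 0);   /* dump and reset stats right now */
    }
}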

Matt

On Tue, May 26, 2020 at 4:40 PM Rajeev Pal  wrote:

> schedStatEvent() would be helpful in case of tick based stat dumping.
> But I need instruction count based stat dumping.
>
> On Tue, May 26, 2020 at 5:28 PM Muhammet Abdullah Soytürk <
> muhammetabdullahsoyt...@gmail.com> wrote:
>
>> I don't know how legal this is or whether it has side effects, but you
>> might try schedStatEvent
>> <https://github.com/gem5/gem5/blob/master/src/sim/stat_control.cc#L248>
>> to schedule the dumps.
>>
>> Muhammet
>>
>> On Wed, May 27, 2020 at 00:13, Matt Sinclair via gem5-users 
>> wrote:
>>
>>> I'm not sure if this is your ultimate problem, but if it only works on
>>> the CPU for the first N instructions, is N simply representing the point
>>> where you need a 64-bit counter instead of a 32-bit counter?
>>>
>>> Unfortunately I don't know the answer to your other questions, sorry.
>>> Perhaps you are thinking of the m5ops, which people often use for things
>>> like resetting stats:
>>> https://www.gem5.org/documentation/general_docs/m5ops/?
>>>
>>> Matt
>>>
>>> On Tue, May 26, 2020 at 2:12 PM Rajeev Pal via gem5-users <
>>> gem5-users@gem5.org> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Is it possible to dump and reset statistics every N instructions
>>>> for the CPU and/or AMD GPU?
>>>> I see that there is a *max_insts_any_thread* var for cpus. I was able
>>>> to use it to stop simulation, dump and reset stats (from apu_se.py), but it
>>>> only works for the *first* N instructions.
>>>>
>>>> (1) Is there any existing mechanism which I can leverage?
>>>>  There is a comInstEventQueue which I think is used to stop the
>>>> simulation after the first N instructions. Can I somehow use this?
>>>>
>>>> (2) If not, then where and what sort of modifications will I need? I
>>>> need to do this for both CPU and AMD GPU.
>>>>
>>>> Thank you,
>>>> Rajeev Pal
>>>> ___
>>>> gem5-users mailing list -- gem5-users@gem5.org
>>>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>>
>>> ___
>>> gem5-users mailing list -- gem5-users@gem5.org
>>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>
>>
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Stat dump after each N instructions - CPU and/or AMD GPU

2020-05-26 Thread Matt Sinclair via gem5-users
I'm not sure if this is your ultimate problem, but if it only works on the
CPU for the first N instructions, is N simply representing the point where
you need a 64-bit counter instead of a 32-bit counter?

Unfortunately I don't know the answer to your other questions, sorry.
Perhaps you are thinking of the m5ops, which people often use for things
like resetting stats: https://www.gem5.org/documentation/general_docs/m5ops/?

Matt

On Tue, May 26, 2020 at 2:12 PM Rajeev Pal via gem5-users <
gem5-users@gem5.org> wrote:

> Hi All,
>
> Is it possible to dump and reset statistics every N instructions for
> the CPU and/or AMD GPU?
> I see that there is a *max_insts_any_thread* var for cpus. I was able to
> use it to stop simulation, dump and reset stats (from apu_se.py), but it
> only works for the *first* N instructions.
>
> (1) Is there any existing mechanism which I can leverage?
>  There is a comInstEventQueue which I think is used to stop the
> simulation after the first N instructions. Can I somehow use this?
>
> (2) If not, then where and what sort of modifications will I need? I need
> to do this for both CPU and AMD GPU.
>
> Thank you,
> Rajeev Pal
> ___
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org