Re: Limit gpu max clock for ryzen 2400g

2019-03-26 Thread Lauri Ehrenpreis
Hi!

Thanx for helping, but I have tried those methods and they didn't work on
Ryzen 2400G.

--
Lauri

On Tue, Mar 26, 2019 at 2:57 PM Russell, Kent  wrote:

> Hi Lauri,
>
>
>
> There’s a more efficient method using the Power Profiles (and optionally,
> the ROCM-SMI tool, found at https://github.com/RadeonOpenCompute/ROC-smi),
> or the pp_dpm_sclk mask, depending on what exactly you want. I’ll list out
> the methods here with both the rocm-smi and non-SMI commands. I’ll assume
> that this GPU is card0 (it may be card1, card2, etc., depending on what GPUs
> are installed in your system; “rocm-smi -i” or “cat
> /sys/class/drm/card?/device/device” will give you the GPU IDs of all of the
> cards, so you can figure out which one you want to use).
>
>
>
>    1. Mask the SCLKs. pp_dpm_sclk allows you to set a mask of which
>    levels to use.
>       1. First, read the values (“rocm-smi --showclkfrq”, or “cat
>       /sys/class/drm/card0/device/pp_dpm_sclk”) to see the supported DPM
>       levels for your card.
>       2. Mask off the levels that you don’t want. E.g. if you only want
>       to use levels 0-6 (and thus skip level 7), you can do either “rocm-smi
>       --setsclk 0 1 2 3 4 5 6” or “echo manual >
>       /sys/class/drm/card0/device/power_dpm_force_performance_level && echo
>       ‘0 1 2 3 4 5 6’ > /sys/class/drm/card0/device/pp_dpm_sclk”. This will
>       set DPM to use only levels 0-6 and skip level 7. You can do this for
>       any combination of levels or a single level (“0 2 5”, “1 2 7”, “5”,
>       etc.). The mask tells DPM to use only the specified levels and
>       persists until reboot, or until power_dpm_force_performance_level is
>       set back to “auto”.
>    2. Set the specific DPM level values manually:
>       1. First, you’ll need to enable the Power Profile Overdrive
>       functionality. The easiest way is to add “amdgpu.ppfeaturemask=0x”
>       to your Linux command-line parameters (by editing /boot/grub/grub.cfg
>       manually, editing /etc/default/grub and running update-grub, or
>       entering the kernel parameter in the GRUB menu before booting).
>       2. Once that’s enabled, you should see the following file:
>       /sys/class/drm/card0/device/pp_od_clk_voltage.
>       3. Check the current DPM level information with “rocm-smi -S” or
>       “cat /sys/class/drm/card0/device/pp_od_clk_voltage”. That shows
>       the supported SCLK and voltage for each level.
>       4. You can use the rocm-smi tool to manually change the levels
>       through “rocm-smi --setslevel # MHZ VLT”, where:
>          i.   # is the level (level 7 is probably the one you want, but
>               you can do it for all of them)
>          ii.  MHZ is the speed in MHz
>          iii. VLT is the voltage in mV
>       5. Honestly, you can probably just copy the highest level that
>       you’re comfortable with and set that for all of the levels that
>       exceed the values you desire. So if you want to keep it to whatever
>       level 6 is, just set level 7 to have the same values as level 6
>       (that way you don’t have to muck with voltages and such). Or if 5
>       is the highest that you want, set level 6 and level 7 to match
>       level 5.
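The two recipes above can be condensed into a small script. This is a sketch, not a tested tool: the card index, the level list, and the 1240 MHz / 1087 mV numbers passed to --setslevel are all made-up placeholders — read your actual card index and level values (e.g. with “rocm-smi -S”) before using it.

```shell
#!/bin/sh
# Sketch of the two approaches described above. Card path, level list, and
# MHz/mV values are placeholders -- check your own system first.

# Approach 1: mask off the top DPM level(s) via sysfs.
cap_sclk() {
    card="$1"; levels="$2"
    if [ ! -w "$card/pp_dpm_sclk" ]; then
        echo "skip: $card/pp_dpm_sclk not writable (need root on the right card)"
        return 1
    fi
    echo manual > "$card/power_dpm_force_performance_level"  # take manual control
    echo "$levels" > "$card/pp_dpm_sclk"                     # e.g. "0 1 2 3 4 5 6" skips level 7
}

# Approach 2: overwrite one level's clock/voltage with rocm-smi.
clamp_level() {
    lvl="$1"; mhz="$2"; mv="$3"
    if ! command -v rocm-smi >/dev/null 2>&1; then
        echo "skip: rocm-smi not installed"
        return 1
    fi
    rocm-smi --setslevel "$lvl" "$mhz" "$mv"
}

cap_sclk /sys/class/drm/card0/device "0 1 2 3 4 5 6"
clamp_level 7 1240 1087   # placeholder values: copy them from your level 6
```

Both functions print a "skip" message instead of failing when the target card or tool is absent, so the script can be dropped onto a machine and checked safely before running it as root.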
>
>
>
> Hopefully that helps. It also means that you don’t have to constantly
> rebuild your own kernel with a change to cap the SCLK cherry-picked on
> top. Please let me know if you have any questions at all!
>
>
>
> Kent
>
>
>
> *From:* amd-gfx  *On Behalf Of *Lauri
> Ehrenpreis
> *Sent:* Friday, March 22, 2019 6:18 AM
> *To:* amd-gfx list 
> *Subject:* Limit gpu max clock for ryzen 2400g
>
>
>
> Hi!
>
>
>
> Is there a way to limit the GPU max clock rate? Currently I can either
> leave the clock in automatic mode or force it to a specific level
> via /sys/class/drm/card0/device/pp_dpm_sclk. But ideally I would like the
> clock to be automatically regulated while specifying a lower upper limit,
> for power-saving reasons.
>
>
>
> --
>
> Lauri
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: Limit gpu max clock for ryzen 2400g

2019-03-22 Thread Lauri Ehrenpreis
Found a way to do it by modifying the amdgpu driver. If there's a better
way, please let me know..

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
index 5273de3c5b98..70b9fb8d6041 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
@@ -497,6 +497,8 @@ static int smu10_populate_clock_table(struct pp_hwmgr *hwmgr)
smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetMaxGfxclkFrequency);
result = smum_get_argument(hwmgr);
//smu10_data->gfx_max_freq_limit = result / 10 * 1000;
+smu10_data->gfx_max_freq_limit = 5;

--
Lauri

On Fri, Mar 22, 2019 at 12:18 PM Lauri Ehrenpreis 
wrote:

> Hi!
>
> Is there a way to limit the GPU max clock rate? Currently I can either
> leave the clock in automatic mode or force it to a specific level
> via /sys/class/drm/card0/device/pp_dpm_sclk. But ideally I would like the
> clock to be automatically regulated while specifying a lower upper limit,
> for power-saving reasons.
>
> --
> Lauri
>

Limit gpu max clock for ryzen 2400g

2019-03-22 Thread Lauri Ehrenpreis
Hi!

Is there a way to limit the GPU max clock rate? Currently I can either
leave the clock in automatic mode or force it to a specific level
via /sys/class/drm/card0/device/pp_dpm_sclk. But ideally I would like the
clock to be automatically regulated while specifying a lower upper limit,
for power-saving reasons.

--
Lauri

Re: Slow memory access when using OpenCL without X11

2019-03-14 Thread Lauri Ehrenpreis
Yes, it affects it a bit, but it doesn't get the speed up to the "normal"
level. I got the best results with "profile_peak" - then the memcpy speed on
the CPU is 1/3 of what it is without OpenCL initialization:

 echo "profile_peak" >
/sys/class/drm/card0/device/power_dpm_force_performance_level
./cl_slow_test 1 5
got 1 platforms 1 devices
speed 3710.360352 avg 3710.360352 mbytes/s
speed 3713.660400 avg 3712.010254 mbytes/s
speed 3797.630859 avg 3740.550537 mbytes/s
speed 3708.004883 avg 3732.414062 mbytes/s
speed 3796.403076 avg 3745.211914 mbytes/s

Without calling clCreateContext:
./cl_slow_test 0 5
speed 7299.201660 avg 7299.201660 mbytes/s
speed 9298.841797 avg 8299.021484 mbytes/s
speed 9360.181641 avg 8652.742188 mbytes/s
speed 9004.759766 avg 8740.746094 mbytes/s
speed 9414.607422 avg 8875.518555 mbytes/s
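Since the effect only lasts while the profile is forced, one way to script the "profile_peak" workaround above is to raise the profile just for the duration of the workload and restore automatic management afterwards. A sketch — the sysfs path assumes card0, and the workload command is a placeholder:

```shell
#!/bin/sh
# Run a workload with the power profile forced to profile_peak, restoring
# automatic DPM on exit. Card path and workload are placeholders.
with_profile_peak() {
    perf="$1"; shift
    if [ ! -w "$perf" ]; then
        echo "skip: $perf not writable (need root on an amdgpu system)"
        return 1
    fi
    echo profile_peak > "$perf"
    trap 'echo auto > "$perf"' EXIT INT TERM   # always restore automatic DPM
    "$@"                                       # run the workload, e.g. ./cl_slow_test 1 5
}

with_profile_peak /sys/class/drm/card0/device/power_dpm_force_performance_level \
    echo "workload would run here"
```

The trap makes sure "auto" is written back even if the workload is interrupted, so the system does not stay pinned at peak power.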

--
Lauri

On Thu, Mar 14, 2019 at 5:46 PM Ernst Sjöstrand  wrote:

> Does
> echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
> or setting cpu scaling governor to performance affect it at all?
>
> Regards
> //Ernst
>
> Den tors 14 mars 2019 kl 14:31 skrev Lauri Ehrenpreis  >:
> >
> > I tried also with those 2 boards now:
> > https://www.asrock.com/MB/AMD/Fatal1ty%20B450%20Gaming-ITXac/index.asp
> > https://www.msi.com/Motherboard/B450I-GAMING-PLUS-AC
> >
> > Both are using latest BIOS, ubuntu 18.10, kernel
> https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.2/
> >
> > There are some differences in dmesg (ASRock has some amdgpu assert in
> > dmesg) but otherwise the results are exactly the same.
> > In the desktop env cl_slow_test runs fast; over an ssh terminal it doesn't.
> > If I move the mouse then it starts running fast in the terminal as well.
> >
> > So one can't use OpenCL without a monitor and a desktop env running, and
> > this happens with 2 different chipsets (B350 & B450), the latest BIOS from
> > 3 different vendors, the latest kernel, and the latest ROCm. This doesn't
> > look like an edge case with an unusual setup to me..
> >
> > Attached dmesg, dmidecode, and clinfo from both boards.
> >
> > --
> > Lauri
> >
> > On Wed, Mar 13, 2019 at 10:15 PM Lauri Ehrenpreis 
> wrote:
> >>
> >> For reproduction only the tiny cl_slow_test.cpp is needed which is
> attached to first e-mail.
> >>
> >> System information is following:
> >> CPU: Ryzen5 2400G
> >> Main board: Gigabyte AMD B450 AORUS mini itx:
> https://www.gigabyte.com/Motherboard/B450-I-AORUS-PRO-WIFI-rev-10#kf
> >> BIOS: F5 8.47 MB 2019/01/25 (latest)
> >> Kernel: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/  (amd64)
> >> OS: Ubuntu 18.04 LTS
> >> rocm-opencl-dev installation:
> >> wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo
> apt-key add -
> >> echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial
> main' | sudo tee /etc/apt/sources.list.d/rocm.list
> >> sudo apt install rocm-opencl-dev
> >>
> >> Also exactly the same issue happens with this board:
> https://www.gigabyte.com/Motherboard/GA-AB350-Gaming-3-rev-1x#kf
> >>
> >> I have MSI and Asrock mini itx boards ready as well, So far didn't get
> amdgpu & opencl working there but I'll try again tomorrow..
> >>
> >> --
> >> Lauri
> >>
> >>
> >> On Wed, Mar 13, 2019 at 8:51 PM Kuehling, Felix 
> wrote:
> >>>
> >>> Hi Lauri,
> >>>
> >>> I still think the SMU is doing something funny, but rocm-smi isn't
> >>> showing enough information to really see what's going on.
> >>>
> >>> On APUs the SMU firmware is embedded in the system BIOS. Unlike
> discrete
> >>> GPUs, the SMU firmware is not loaded by the driver. You could try
> >>> updating your system BIOS to the latest version available from your
> main
> >>> board vendor and see if that makes a difference. It may include a newer
> >>> version of the SMU firmware, potentially with a fix.
> >>>
> >>> If that doesn't help, we'd have to reproduce the problem in house to
> see
> >>> what's happening, which may require the same main board and BIOS
> version
> >>> you're using. We can ask our SMU firmware team if they've ever
> >>> encountered your type of problem. But I don't want to give you too much
> >>> hope. It's a tricky problem involving HW, firmware and multiple driver
> >>> components in a fairly unusual configuration.
> >>>
> >>> Regards,
> >>>Felix
> >>>
> >>> On 2019-03-13 7:28 a.m., Lauri Ehrenpreis wrote:

Re: Slow memory access when using OpenCL without X11

2019-03-13 Thread Lauri Ehrenpreis
For reproduction only the tiny cl_slow_test.cpp is needed which is attached
to first e-mail.

System information is following:
CPU: Ryzen5 2400G
Main board: Gigabyte AMD B450 AORUS mini itx:
https://www.gigabyte.com/Motherboard/B450-I-AORUS-PRO-WIFI-rev-10#kf
BIOS: F5 8.47 MB 2019/01/25 (latest)
Kernel: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/  (amd64)
OS: Ubuntu 18.04 LTS
rocm-opencl-dev installation:
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo
apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main'
| sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt install rocm-opencl-dev

Also exactly the same issue happens with this board:
https://www.gigabyte.com/Motherboard/GA-AB350-Gaming-3-rev-1x#kf

I have MSI and ASRock mini-ITX boards ready as well. So far I didn't get
amdgpu & OpenCL working there, but I'll try again tomorrow..

--
Lauri


On Wed, Mar 13, 2019 at 8:51 PM Kuehling, Felix 
wrote:

> Hi Lauri,
>
> I still think the SMU is doing something funny, but rocm-smi isn't
> showing enough information to really see what's going on.
>
> On APUs the SMU firmware is embedded in the system BIOS. Unlike discrete
> GPUs, the SMU firmware is not loaded by the driver. You could try
> updating your system BIOS to the latest version available from your main
> board vendor and see if that makes a difference. It may include a newer
> version of the SMU firmware, potentially with a fix.
>
> If that doesn't help, we'd have to reproduce the problem in house to see
> what's happening, which may require the same main board and BIOS version
> you're using. We can ask our SMU firmware team if they've ever
> encountered your type of problem. But I don't want to give you too much
> hope. It's a tricky problem involving HW, firmware and multiple driver
> components in a fairly unusual configuration.
>
> Regards,
>Felix
>
> On 2019-03-13 7:28 a.m., Lauri Ehrenpreis wrote:
> > What I observe is that moving the mouse made the memory speed go up
> > and also it made mclk=1200Mhz in rocm-smi output.
> > However if I force mclk to 1200Mhz myself then memory speed is still
> > slow.
> >
> > So rocm-smi output when memory speed went fast due to mouse movement:
> > rocm-smi
> > ROCm System Management Interface
> > 
> >
> 
> > GPU   Temp   AvgPwr   SCLKMCLKPCLK  Fan Perf
> > PwrCap   SCLK OD   MCLK OD GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read
> > /sys/class/drm/card0/device/gpu_busy_percent
> > 0 44.0c  N/A  400Mhz  1200Mhz N/A   0%  manual  N/A
> >   0%0%  N/A
> >
> 
> >    End of ROCm SMI Log
> >   
> >
> > And rocm-smi output when I forced memclk=1200MHz myself:
> > rocm-smi --setmclk 2
> > rocm-smi
> > ROCm System Management Interface
> > 
> >
> 
> > GPU   Temp   AvgPwr   SCLKMCLKPCLK  Fan Perf
> > PwrCap   SCLK OD   MCLK OD GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read
> > /sys/class/drm/card0/device/gpu_busy_percent
> > 0 39.0c  N/A  400Mhz  1200Mhz N/A   0%  manual  N/A
> >   0%0%  N/A
> >
> 
> >    End of ROCm SMI Log
> >   
> >
> > So only difference is that temperature shows 44c when memory speed was
> > fast and 39c when it was slow. But mclk was 1200MHz and sclk was
> > 400MHz in both cases.
> > Can it be that rocm-smi just has a bug in reporting and mclk was not
> > actually 1200MHz when I forced it with rocm-smi --setmclk 2 ?
> > That would explain the different behaviour..
> >
> > If so then is there a programmatic way how to really guarantee the
> > high speed mclk? Basically I want do something similar in my program
> > what happens if I move
> > the mouse in desktop env and this way guarantee the normal memory
> > speed each time the program starts.
> >
> > --
> > Lauri
> >
> >
> > On Tue, Mar 12, 2019 at 11:36 PM Deucher, Alexander
> > mailto:al

Re: Slow memory access when using OpenCL without X11

2019-03-13 Thread Lauri Ehrenpreis
What I observe is that moving the mouse made the memory speed go up and
also made mclk=1200MHz in the rocm-smi output.
However, if I force mclk to 1200MHz myself then the memory speed is still slow.

So rocm-smi output when memory speed went fast due to mouse movement:
rocm-smi
ROCm System Management Interface


GPU   Temp   AvgPwr   SCLKMCLKPCLK   Fan Perf
PwrCap   SCLK OD   MCLK OD  GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Unable to read
/sys/class/drm/card0/device/gpu_busy_percent
0 44.0c  N/A  400Mhz  1200Mhz N/A0%  manual  N/A
  0%0%   N/A

   End of ROCm SMI Log


And rocm-smi output when I forced memclk=1200MHz myself:
rocm-smi --setmclk 2
rocm-smi
ROCm System Management Interface


GPU   Temp   AvgPwr   SCLKMCLKPCLK   Fan Perf
PwrCap   SCLK OD   MCLK OD  GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Unable to read
/sys/class/drm/card0/device/gpu_busy_percent
0 39.0c  N/A  400Mhz  1200Mhz N/A0%  manual  N/A
  0%0%   N/A

   End of ROCm SMI Log


So the only difference is that the temperature shows 44C when the memory speed
was fast and 39C when it was slow. But mclk was 1200MHz and sclk was 400MHz in
both cases.
Can it be that rocm-smi just has a bug in reporting, and mclk was not
actually 1200MHz when I forced it with rocm-smi --setmclk 2?
That would explain the different behaviour..

If so, then is there a programmatic way to really guarantee the high-speed
mclk? Basically I want to do something similar in my program to what happens
when I move the mouse in the desktop env, and this way guarantee the normal
memory speed each time the program starts.

--
Lauri


On Tue, Mar 12, 2019 at 11:36 PM Deucher, Alexander <
alexander.deuc...@amd.com> wrote:

> Forcing the sclk and mclk high may impact the CPU frequency since they
> share TDP.
>
> Alex
> --
> *From:* amd-gfx  on behalf of
> Lauri Ehrenpreis 
> *Sent:* Tuesday, March 12, 2019 5:31 PM
> *To:* Kuehling, Felix
> *Cc:* Tom St Denis; amd-gfx@lists.freedesktop.org
> *Subject:* Re: Slow memory access when using OpenCL without X11
>
> However it's not only related to mclk and sclk. I tried this:
> rocm-smi  --setsclk 2
> rocm-smi  --setmclk 3
> rocm-smi
> ROCm System Management Interface
> 
>
> 
> GPU   Temp   AvgPwr   SCLKMCLKPCLK   Fan Perf
> PwrCap   SCLK OD   MCLK OD  GPU%
> GPU[0] : WARNING: Empty SysFS value: pclk
> GPU[0] : WARNING: Unable to read
> /sys/class/drm/card0/device/gpu_busy_percent
> 0 34.0c  N/A  1240Mhz 1333Mhz N/A0%  manual  N/A
> 0%0%   N/A
>
> 
>    End of ROCm SMI Log
> 
>
> ./cl_slow_test 1
> got 1 platforms 1 devices
> speed 3919.777100 avg 3919.777100 mbytes/s
> speed 3809.373291 avg 3864.575195 mbytes/s
> speed 585.796814 avg 2771.649170 mbytes/s
> speed 188.721848 avg 2125.917236 mbytes/s
> speed 188.916367 avg 1738.517090 mbytes/s
>
> So despite forcing max sclk and mclk the memory speed is still slow..
>
> --
> Lauri
>
>
> On Tue, Mar 12, 2019 at 11:21 PM Lauri Ehrenpreis 
> wrote:
>
> IN the case when memory is slow, the rocm-smi outputs this:
> ROCm System Management Interface
> 
>
> 
> GPU   Temp   AvgPwr   SCLKMCLKPCLK   Fan Perf
> PwrCap   SCLK OD   MCLK OD  GPU%
> GPU[0] : WARNING: Empty SysFS value: pclk
> GPU[0] : WARNING: Unable to read
> /sys/class/drm/card0/device/gpu_busy_percent
> 0 30.0c  N/A  400Mhz  933Mhz  N/A0%  autoN/A
> 0%0%   N/A
>
> 
> 

Re: Slow memory access when using OpenCL without X11

2019-03-12 Thread Lauri Ehrenpreis
However it's not only related to mclk and sclk. I tried this:
rocm-smi  --setsclk 2
rocm-smi  --setmclk 3
rocm-smi
ROCm System Management Interface


GPU   Temp   AvgPwr   SCLKMCLKPCLK   Fan Perf
PwrCap   SCLK OD   MCLK OD  GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Unable to read
/sys/class/drm/card0/device/gpu_busy_percent
0 34.0c  N/A  1240Mhz 1333Mhz N/A0%  manual  N/A
  0%0%   N/A

   End of ROCm SMI Log


./cl_slow_test 1
got 1 platforms 1 devices
speed 3919.777100 avg 3919.777100 mbytes/s
speed 3809.373291 avg 3864.575195 mbytes/s
speed 585.796814 avg 2771.649170 mbytes/s
speed 188.721848 avg 2125.917236 mbytes/s
speed 188.916367 avg 1738.517090 mbytes/s

So despite forcing max sclk and mclk the memory speed is still slow..

--
Lauri


On Tue, Mar 12, 2019 at 11:21 PM Lauri Ehrenpreis 
wrote:

> IN the case when memory is slow, the rocm-smi outputs this:
> ROCm System Management Interface
> 
>
> 
> GPU   Temp   AvgPwr   SCLKMCLKPCLK   Fan Perf
> PwrCap   SCLK OD   MCLK OD  GPU%
> GPU[0] : WARNING: Empty SysFS value: pclk
> GPU[0] : WARNING: Unable to read
> /sys/class/drm/card0/device/gpu_busy_percent
> 0 30.0c  N/A  400Mhz  933Mhz  N/A0%  autoN/A
> 0%0%   N/A
>
> 
>    End of ROCm SMI Log
> 
>
> normal memory speed case gives following:
> ROCm System Management Interface
> 
>
> 
> GPU   Temp   AvgPwr   SCLKMCLKPCLK   Fan Perf
> PwrCap   SCLK OD   MCLK OD  GPU%
> GPU[0] : WARNING: Empty SysFS value: pclk
> GPU[0] : WARNING: Unable to read
> /sys/class/drm/card0/device/gpu_busy_percent
> 0 35.0c  N/A  400Mhz  1200Mhz N/A0%  autoN/A
> 0%0%   N/A
>
> 
>    End of ROCm SMI Log
> 
>
> So there is a difference in MCLK - can this cause such a huge slowdown?
>
> --
> Lauri
>
> On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix 
> wrote:
>
>> [adding the list back]
>>
>> I'd suspect a problem related to memory clock. This is an APU where
>> system memory is shared with the CPU, so if the SMU changes memory
>> clocks that would affect CPU memory access performance. If the problem
>> only occurs when OpenCL is running, then the compute power profile could
>> have an effect here.
>>
>> Laurie, can you monitor the clocks during your tests using rocm-smi?
>>
>> Regards,
>>Felix
>>
>> On 2019-03-11 1:15 p.m., Tom St Denis wrote:
>> > Hi Lauri,
>> >
>> > I don't have ROCm installed locally (not on that team at AMD) but I
>> > can rope in some of the KFD folk and see what they say :-).
>> >
>> > (in the mean time I should look into installing the ROCm stack on my
>> > Ubuntu disk for experimentation...).
>> >
>> > Only other thing that comes to mind is some sort of stutter due to
>> > power/clock gating (or gfx off/etc).  But that typically affects the
>> > display/gpu side not the CPU side.
>> >
>> > Felix:  Any known issues with Raven and ROCm interacting over memory
>> > bus performance?
>> >
>> > Tom
>> >
>> > On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis > > <mailto:lauri...@gmail.com>> wrote:
>> >
>> > Hi!
>> >
>> > The 100x memory slowdown is hard to believe indeed. I attached the
>> > test program with my first e-mail, which depends only on the
>> > rocm-opencl-dev package. Would you mind compiling it and checking
>> > if it slows down memory for you as well?
>> >
>> > steps:
>> > 1) g++ cl_slow_test.cpp -o cl_slow_test -I
>> > /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/  -lOpe

Re: Slow memory access when using OpenCL without X11

2019-03-12 Thread Lauri Ehrenpreis
IN the case when memory is slow, the rocm-smi outputs this:
ROCm System Management Interface


GPU   Temp   AvgPwr   SCLKMCLKPCLK   Fan Perf
PwrCap   SCLK OD   MCLK OD  GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Unable to read
/sys/class/drm/card0/device/gpu_busy_percent
0 30.0c  N/A  400Mhz  933Mhz  N/A0%  autoN/A
  0%0%   N/A

   End of ROCm SMI Log


normal memory speed case gives following:
ROCm System Management Interface


GPU   Temp   AvgPwr   SCLKMCLKPCLK   Fan Perf
PwrCap   SCLK OD   MCLK OD  GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Unable to read
/sys/class/drm/card0/device/gpu_busy_percent
0 35.0c  N/A  400Mhz  1200Mhz N/A0%  autoN/A
  0%0%   N/A

   End of ROCm SMI Log


So there is a difference in MCLK - can this cause such a huge slowdown?

--
Lauri

On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix 
wrote:

> [adding the list back]
>
> I'd suspect a problem related to memory clock. This is an APU where
> system memory is shared with the CPU, so if the SMU changes memory
> clocks that would affect CPU memory access performance. If the problem
> only occurs when OpenCL is running, then the compute power profile could
> have an effect here.
>
> Laurie, can you monitor the clocks during your tests using rocm-smi?
>
> Regards,
>Felix
>
> On 2019-03-11 1:15 p.m., Tom St Denis wrote:
> > Hi Lauri,
> >
> > I don't have ROCm installed locally (not on that team at AMD) but I
> > can rope in some of the KFD folk and see what they say :-).
> >
> > (in the mean time I should look into installing the ROCm stack on my
> > Ubuntu disk for experimentation...).
> >
> > Only other thing that comes to mind is some sort of stutter due to
> > power/clock gating (or gfx off/etc).  But that typically affects the
> > display/gpu side not the CPU side.
> >
> > Felix:  Any known issues with Raven and ROCm interacting over memory
> > bus performance?
> >
> > Tom
> >
> > On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis  > <mailto:lauri...@gmail.com>> wrote:
> >
> > Hi!
> >
> > The 100x memory slowdown is hard to believe indeed. I attached the
> > test program with my first e-mail, which depends only on the
> > rocm-opencl-dev package. Would you mind compiling it and checking
> > if it slows down memory for you as well?
> >
> > steps:
> > 1) g++ cl_slow_test.cpp -o cl_slow_test -I
> > /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/  -lOpenCL
> > 2) logout from desktop env and disconnect hdmi/diplayport etc
> > 3) log in over ssh
> > 4) run the program ./cl_slow_test 1
> >
> > For me it reproduced even without step 2 as well, but less
> > reliably. Moving the mouse, for example, could make the memory speed
> > fast again.
> >
> > --
> > Lauri
> >
> >
> >
> > On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis  > <mailto:tstdeni...@gmail.com>> wrote:
> >
> > Hi Lauri,
> >
> > There's really no connection between the two other than they
> > run in the same package.  I too run a 2400G (as my
> > workstation) and I got the same ~6.6GB/sec transfer rate but
> > without a CL app running ...  The only logical reason is your
> > CL app is bottlenecking the APUs memory bus but you claim
> > "simply opening a context is enough" so something else is
> > going on.
> >
> > Your last reply though says "with it running in the
> > background" so it's entirely possible the CPU isn't busy but
> > the package memory controller (shared between both the CPU and
> > GPU) is busy.  For instance running xonotic in a 1080p window
> > on my 4K display reduced the memory test to 5.8GB/sec and
> > that's hardly a heavy memory bound GPU app.
> >
> > The only other possible connection is the GPU is generating so
> > much heat that it's throttling the package which is also
> > unlikely if you have a proper HSF attached (I use the ones
> > that came in the retail boxes).
> >
> > Cheers,
> > Tom
> >
>

Re: Slow memory access when using OpenCL without X11

2019-03-10 Thread Lauri Ehrenpreis
Seems the sysbench cpu test does not slow down:
1) run sysbench cpu on idle machine:
 sysbench cpu run
...
General statistics:
total time:  10.0033s
total number of events:  19052
2) start ./cl_slow_test 1 1 in background
3) run sysbench again
sysbench cpu run
..
General statistics:
total time:  10.0036s
total number of events:  18979

So it did not slow down considerably.

If I do similar test with sysbench memory test I get following results:
1)  run sysbench memory on idle machine:
sysbench memory --memory-block-size=32M --memory-total-size=100G run
...
66432.00 MiB transferred (6638.95 MiB/sec)
2) start ./cl_slow_test 1 1 in background
3) sysbench memory --memory-block-size=32M --memory-total-size=100G run
...
672.00 MiB transferred (66.40 MiB/sec)

It confirms that memory speed is reduced 100x :(
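A dependency-free way to repeat the same before/after comparison (on a box without sysbench installed) is to time a fixed amount of copying with dd. It is only a rough proxy for memcpy bandwidth, not a precise benchmark, but the 100x collapse measured above is far larger than dd's noise:

```shell
#!/bin/sh
# Rough memory-copy bandwidth proxy: dd reports its own throughput on the
# last line. Run once on an idle machine, then again with ./cl_slow_test 1 1
# in the background, and compare the two throughput figures.
copy_bench() {
    # 32 x 32 MiB = 1 GiB pushed through userspace buffers
    dd if=/dev/zero of=/dev/null bs=32M count=32 2>&1 | tail -n 1
}

copy_bench
```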

--
Lauri


On Sat, Mar 9, 2019 at 9:22 PM Jan Vesely  wrote:

> On Sat, Mar 9, 2019 at 1:54 AM Lauri Ehrenpreis 
> wrote:
> >
> > Even if it's using CPU for OCL (I know it's not doing this), why does
> memcpy on CPU slow down permanently, if I'm not doing anything with OpenCL
> after clCreateContext?
> >
> > As you see from test program it just does clCreateContext and then a
> loop of memcpy-s on CPU.
> >
> > Also I found out that writing different values to
> /sys/class/drm/card0/device/power_dpm_force_performance_level changes my
> max memcpy speed on CPU:
> >
> > echo "low" >
> /sys/class/drm/card0/device/power_dpm_force_performance_level
> > ./cl_slow_test 1 5
> > got 1 platforms 1 devices
> > speed 731.810425 avg 731.810425 mbytes/s
> > speed 163.425583 avg 447.618011 mbytes/s
> > speed 123.441612 avg 339.559235 mbytes/s
> > speed 121.655266 avg 285.083252 mbytes/s
> > speed 123.806801 avg 252.827972 mbytes/s
> >
> > echo "high" >
> /sys/class/drm/card0/device/power_dpm_force_performance_level
> > ./cl_slow_test 1 5
> > got 1 platforms 1 devices
> > speed 3742.063721 avg 3742.063721 mbytes/s
> > speed 836.148987 avg 2289.106445 mbytes/s
> > speed 189.379166 avg 1589.197266 mbytes/s
> > speed 189.271393 avg 1239.215820 mbytes/s
> > speed 188.290451 avg 1029.030762 mbytes/s
> >
> > echo "profile_standard" >
> /sys/class/drm/card0/device/power_dpm_force_performance_level
> > ./cl_slow_test 1 5
> > got 1 platforms 1 devices
> > speed 2303.955566 avg 2303.955566 mbytes/s
> > speed 2298.224121 avg 2301.089844 mbytes/s
> > speed 2295.585205 avg 2299.254883 mbytes/s
> > speed 2295.762939 avg 2298.381836 mbytes/s
> > speed 2288.766602 avg 2296.458740 mbytes/s
> >
> >  echo "profile_peak" >
> /sys/class/drm/card0/device/power_dpm_force_performance_level
> > ./cl_slow_test 1 5
> > got 1 platforms 1 devices
> > speed 3710.360352 avg 3710.360352 mbytes/s
> > speed 3713.660400 avg 3712.010254 mbytes/s
> > speed 3797.630859 avg 3740.550537 mbytes/s
> > speed 3708.004883 avg 3732.414062 mbytes/s
> > speed 3796.403076 avg 3745.211914 mbytes/s
> >
> > However none of those is close to the memcpy speed I get when I don't do
> clCreateContext (my test prog has first arg 0):
> > ./cl_slow_test 0 5
> > speed 7299.201660 avg 7299.201660 mbytes/s
> > speed 9298.841797 avg 8299.021484 mbytes/s
> > speed 9360.181641 avg 8652.742188 mbytes/s
> > speed 9004.759766 avg 8740.746094 mbytes/s
> > speed 9414.607422 avg 8875.518555 mbytes/s
> >
> > Also attached clinfo.txt. It shows that opencl is using GPU so device
> node permissions are probably not the issue.
>
> Is it only memory accesses or does overall CPU performance degrade
> (including compute - say sysbench) as well?
>
> Jan
>
> > --
> > Lauri
> >
> > On Fri, Mar 8, 2019 at 10:35 PM Alex Deucher 
> wrote:
> >>
> >> I think you are probably using the CPU for OCL in the remote login
> >> case.  When you log into the desktop, the permissions on the device
> >> nodes get changed dynamically to support accelerated rendering.  You
> >> probably need to change the permissions on the device nodes manually
> >> if you are not logging into the desktop.
> >>
> >> Alex
> >>
> >> On Fri, Mar 8, 2019 at 2:43 PM Lauri Ehrenpreis 
> wrote:
> >> >
> >> > Hi!
> >> >
> >> > I am using Ryzen 2400G with Gigabyte AMD B450 AORUS board. I have
> latest bios, ubuntu 18.04 and latest mainline kernel (5.0.0-05-generic)
> installed. Also I have rocm-dev 2.1.96 but no rock-dkms installed.
> >> >
> >>

Re: Slow memory access when using OpenCL without X11

2019-03-08 Thread Lauri Ehrenpreis
Even if it's using the CPU for OCL (I know it's not doing this), why does
memcpy on the CPU slow down permanently if I'm not doing anything with OpenCL
after clCreateContext?

As you can see from the test program, it just does clCreateContext and then a
loop of memcpy-s on the CPU.

Also I found out that writing different values to
/sys/class/drm/card0/device/power_dpm_force_performance_level changes my
max memcpy speed on CPU:

echo "low" > /sys/class/drm/card0/device/power_dpm_force_performance_level
./cl_slow_test 1 5
got 1 platforms 1 devices
speed 731.810425 avg 731.810425 mbytes/s
speed 163.425583 avg 447.618011 mbytes/s
speed 123.441612 avg 339.559235 mbytes/s
speed 121.655266 avg 285.083252 mbytes/s
speed 123.806801 avg 252.827972 mbytes/s

echo "high" > /sys/class/drm/card0/device/power_dpm_force_performance_level
./cl_slow_test 1 5
got 1 platforms 1 devices
speed 3742.063721 avg 3742.063721 mbytes/s
speed 836.148987 avg 2289.106445 mbytes/s
speed 189.379166 avg 1589.197266 mbytes/s
speed 189.271393 avg 1239.215820 mbytes/s
speed 188.290451 avg 1029.030762 mbytes/s

echo "profile_standard" >
/sys/class/drm/card0/device/power_dpm_force_performance_level
./cl_slow_test 1 5
got 1 platforms 1 devices
speed 2303.955566 avg 2303.955566 mbytes/s
speed 2298.224121 avg 2301.089844 mbytes/s
speed 2295.585205 avg 2299.254883 mbytes/s
speed 2295.762939 avg 2298.381836 mbytes/s
speed 2288.766602 avg 2296.458740 mbytes/s

 echo "profile_peak" >
/sys/class/drm/card0/device/power_dpm_force_performance_level
./cl_slow_test 1 5
got 1 platforms 1 devices
speed 3710.360352 avg 3710.360352 mbytes/s
speed 3713.660400 avg 3712.010254 mbytes/s
speed 3797.630859 avg 3740.550537 mbytes/s
speed 3708.004883 avg 3732.414062 mbytes/s
speed 3796.403076 avg 3745.211914 mbytes/s

However none of those is close to the memcpy speed I get when I don't do
clCreateContext (my test prog has first arg 0):
./cl_slow_test 0 5
speed 7299.201660 avg 7299.201660 mbytes/s
speed 9298.841797 avg 8299.021484 mbytes/s
speed 9360.181641 avg 8652.742188 mbytes/s
speed 9004.759766 avg 8740.746094 mbytes/s
speed 9414.607422 avg 8875.518555 mbytes/s

Also attached clinfo.txt. It shows that OpenCL is using the GPU, so device
node permissions are probably not the issue.

--
Lauri

On Fri, Mar 8, 2019 at 10:35 PM Alex Deucher  wrote:

> I think you are probably using the CPU for OCL in the remote login
> case.  When you log into the desktop, the permissions on the device
> nodes get changed dynamically to support accelerated rendering.  You
> probably need to change the permissions on the device nodes manually
> if you are not logging into the desktop.
>
> Alex
>
> On Fri, Mar 8, 2019 at 2:43 PM Lauri Ehrenpreis 
> wrote:
> >
> > Hi!
> >
> > I am using a Ryzen 2400G with a Gigabyte AMD B450 AORUS board. I have the
> latest BIOS, Ubuntu 18.04, and the latest mainline kernel (5.0.0-05-generic)
> installed. I also have rocm-dev 2.1.96, but no rock-dkms.
> >
> > I found that when I log in over ssh and try to use OpenCL (calling
> clCreateContext is enough), CPU memory accesses afterwards slow down
> by about 100x.
> > If I connect an HDMI cable and log in to the desktop, this does not
> happen. Also, if I don't call clCreateContext, everything works properly.
> >
> > The test program and kernel log are also attached. The test is built and run like this:
> > g++ cl_slow_test.cpp -o cl_slow_test -I /opt/rocm/opencl/include/ -L
> /opt/rocm/opencl/lib/x86_64/  -lOpenCL
> > lauri@rv:~$ ./cl_slow_test 0 5
> > speed 7003.145508 avg 7003.145508 mbytes/s
> > speed 8427.357422 avg 7715.251465 mbytes/s
> > speed 9203.049805 avg 8211.184570 mbytes/s
> > speed 9845.956055 avg 8619.877930 mbytes/s
> > speed 9882.748047 avg 8872.452148 mbytes/s
> > lauri@rv:~$ ./cl_slow_test 1 5
> > got 1 platforms 1 devices
> > speed 1599.803589 avg 1599.803589 mbytes/s
> > speed 1665.426392 avg 1632.614990 mbytes/s
> > speed 146.137253 avg 1137.122437 mbytes/s
> > speed 121.056877 avg 883.106018 mbytes/s
> > speed 122.428970 avg 730.970581 mbytes/s
> >
> > I also tried the latest amd-staging kernel from
> https://github.com/M-Bab/linux-kernel-amdgpu-binaries and it had the same
> issue.
> >
> > Can anyone point me in the right direction?
> >
> > Br,
> > Lauri
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1

Slow memory access when using OpenCL without X11

2019-03-08 Thread Lauri Ehrenpreis
Hi!

I am using a Ryzen 2400G with a Gigabyte AMD B450 AORUS board. I have the
latest BIOS, Ubuntu 18.04, and the latest mainline kernel (5.0.0-05-generic)
installed. I also have rocm-dev 2.1.96, but no rock-dkms.

I found that when I log in over ssh and try to use OpenCL (calling
clCreateContext is enough), CPU memory accesses afterwards slow down by
about 100x.
If I connect an HDMI cable and log in to the desktop, this does not happen.
Also, if I don't call clCreateContext, everything works properly.

The test program and kernel log are also attached. The test is built and run like this:
g++ cl_slow_test.cpp -o cl_slow_test -I /opt/rocm/opencl/include/ -L
/opt/rocm/opencl/lib/x86_64/  -lOpenCL
lauri@rv:~$ ./cl_slow_test 0 5
speed 7003.145508 avg 7003.145508 mbytes/s
speed 8427.357422 avg 7715.251465 mbytes/s
speed 9203.049805 avg 8211.184570 mbytes/s
speed 9845.956055 avg 8619.877930 mbytes/s
speed 9882.748047 avg 8872.452148 mbytes/s
lauri@rv:~$ ./cl_slow_test 1 5
got 1 platforms 1 devices
speed 1599.803589 avg 1599.803589 mbytes/s
speed 1665.426392 avg 1632.614990 mbytes/s
speed 146.137253 avg 1137.122437 mbytes/s
speed 121.056877 avg 883.106018 mbytes/s
speed 122.428970 avg 730.970581 mbytes/s

I also tried the latest amd-staging kernel from
https://github.com/M-Bab/linux-kernel-amdgpu-binaries and it had the same
issue.

Can anyone point me in the right direction?

Br,
Lauri
[0.00] Linux version 5.0.0-05-generic (kernel@gloin) (gcc version 
8.2.0 (Ubuntu 8.2.0-21ubuntu1)) #201903032031 SMP Mon Mar 4 01:33:18 UTC 2019
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.0.0-05-generic 
root=UUID=c5450568-4da2-4760-b09a-29eddec1e9a3 ro quiet splash vt.handoff=1
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Hygon HygonGenuine
[0.00]   Centaur CentaurHauls
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'compacted' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009ebff] usable
[0.00] BIOS-e820: [mem 0x0009ec00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x09cf] usable
[0.00] BIOS-e820: [mem 0x09d0-0x09ff] reserved
[0.00] BIOS-e820: [mem 0x0a00-0x0a1f] usable
[0.00] BIOS-e820: [mem 0x0a20-0x0a209fff] ACPI NVS
[0.00] BIOS-e820: [mem 0x0a20a000-0x0aff] usable
[0.00] BIOS-e820: [mem 0x0b00-0x0b01] reserved
[0.00] BIOS-e820: [mem 0x0b02-0x5b862fff] usable
[0.00] BIOS-e820: [mem 0x5b863000-0x5b98afff] reserved
[0.00] BIOS-e820: [mem 0x5b98b000-0x5bb0bfff] usable
[0.00] BIOS-e820: [mem 0x5bb0c000-0x5bf1cfff] ACPI NVS
[0.00] BIOS-e820: [mem 0x5bf1d000-0x5ce60fff] reserved
[0.00] BIOS-e820: [mem 0x5ce61000-0x5eff] usable
[0.00] BIOS-e820: [mem 0x5f00-0xdfff] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfd10-0xfdff] reserved
[0.00] BIOS-e820: [mem 0xfea0-0xfea0] reserved
[0.00] BIOS-e820: [mem 0xfeb8-0xfec01fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfec3-0xfec30fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved
[0.00] BIOS-e820: [mem 0xfed4-0xfed44fff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfed8] reserved
[0.00] BIOS-e820: [mem 0xfedc2000-0xfedc] reserved
[0.00] BIOS-e820: [mem 0xfedd4000-0xfedd5fff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfeef] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00021f33] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 3.2.1 present.
[0.00] DMI: Gigabyte Technology Co., Ltd. B450 I AORUS PRO WIFI/B450 I 
AORUS PRO WIFI-CF, BIOS F5 01/25/2019
[0.00] tsc: Fast TSC calibration failed
[0.00] e820: update [mem 0x-0x0fff]