Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2024-01-17 Thread Lipeng Zhu




On 1/3/2024 5:14 PM, Lipeng Zhu wrote:



On 2023/12/21 19:42, Thomas Schwinge wrote:

Hi!

On 2023-12-13T21:52:29+0100, I wrote:

On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:

On 2023/12/12 1:45, H.J. Lu wrote:
On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  
wrote:

On 2023/12/9 23:23, Jakub Jelinek wrote:

On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:

This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the
percentage to step into the insert_unit function is around 30%, in
most instances, we can get the unit in the phase of reading the
unit_cache or unit_root tree. So split the read/write phase by
rwlock would be an approach to make it more parallel.

BTW, the IPC metrics can gain around 9x in our test server with
220 cores. The benchmark we used is
https://github.com/rwesson/NEAT



Ok for trunk, thanks.



Thanks! Looking forward to landing to trunk.



Pushed for you.



I've just filed 
"'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' 
execution test timeouts".

Would you be able to look into that?


See my update in there.


Grüße
  Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 
201, 80634 München; Gesellschaft mit beschränkter Haftung; 
Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: 
München; Registergericht München, HRB 106955




Updated in https://gcc.gnu.org/PR113005. Could you help to verify if the 
draft patch would fix the execution test timeout issue on your side?




Hi Thomas,

Any feedback from your side?

Regards,
Lipeng Zhu





Re: RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2024-01-03 Thread Lipeng Zhu




On 2023/12/21 19:42, Thomas Schwinge wrote:

Hi!

On 2023-12-13T21:52:29+0100, I wrote:

On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:

On 2023/12/12 1:45, H.J. Lu wrote:

On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:

On 2023/12/9 23:23, Jakub Jelinek wrote:

On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:

This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the
percentage to step into the insert_unit function is around 30%, in
most instances, we can get the unit in the phase of reading the
unit_cache or unit_root tree. So split the read/write phase by
rwlock would be an approach to make it more parallel.

BTW, the IPC metrics can gain around 9x in our test server with
220 cores. The benchmark we used is
https://github.com/rwesson/NEAT



Ok for trunk, thanks.



Thanks! Looking forward to landing to trunk.



Pushed for you.



I've just filed 
"'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test 
timeouts".
Would you be able to look into that?


See my update in there.


Grüße
  Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955



Updated in https://gcc.gnu.org/PR113005. Could you help to verify if the 
draft patch would fix the execution test timeout issue on your side?


diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 
b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90

index f90ecbeb00f..1c271ae031d 100644
--- a/libgomp/testsuite/libgomp.fortran/rwlock_1.f90
+++ b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90
@@ -7,13 +7,12 @@ program main
   use omp_lib
   implicit none
   integer:: unit_number, v1, v2, i
-  character(11) :: file_name
+  character(16) :: file_name
   character(3) :: async = "no"
   !$omp parallel private (unit_number, v1, v2, file_name, async, i)
 do i = 0, 100
   unit_number = 10 + omp_get_thread_num ()
-  write (file_name, "(I3, A)") unit_number, "_tst.dat"
-  file_name = adjustl(file_name)
+  write (file_name, "(I5.5, A)") unit_number, "_tst.dat"
   open (unit_number, file=file_name, asynchronous="yes")
   ! call inquire with file parameter to test find_file in unix.c
   inquire (file=file_name, asynchronous=async)

Lipeng Zhu



Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2024-01-02 Thread Lipeng Zhu




On 2024/1/2 11:57, Vaseeharan Vinayagamoorthy wrote:

Hi Lipeng,

It looks like your draft patch to fix the builds for arm-none-eabi target is 
not merged yet, because our arm-none-eabi builds are still broken. Are you 
waiting for additional information, or would you be able to fix this issue?

Kind regards,
Vasee


Hi Vasee,

Actually I already sent a patch 
https://inbox.sourceware.org/gcc-patches/20231222023605.3894839-1-lipeng@intel.com/ 
to fix the build failure issue, now it is waiting for community to review.


Lipeng Zhu


From: Richard Earnshaw 
Sent: 15 December 2023 19:23
To: Lipeng Zhu ; Richard Earnshaw ; 
ja...@redhat.com 
Cc: fort...@gcc.gnu.org ; gcc-patches@gcc.gnu.org ; 
hongjiu...@intel.com ; pan.d...@intel.com ; rep.dot@gmail.com 
; tianyou...@intel.com ; tkoe...@netcologne.de 
; wangyang@intel.com 
Subject: Re: [PATCH v7] libgfortran: Replace mutex with rwlock



On 15/12/2023 11:31, Lipeng Zhu wrote:



On 2023/12/14 23:50, Richard Earnshaw (lists) wrote:

On 09/12/2023 15:39, Lipeng Zhu wrote:

This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.

BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

 * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
 (__gthrw): New function.
 (__gthread_rwlock_rdlock): New function.
 (__gthread_rwlock_tryrdlock): New function.
 (__gthread_rwlock_wrlock): New function.
 (__gthread_rwlock_trywrlock): New function.
 (__gthread_rwlock_unlock): New function.

libgfortran/ChangeLog:

 * io/async.c (DEBUG_LINE): New macro.
 * io/async.h (RWLOCK_DEBUG_ADD): New macro.
 (CHECK_RDLOCK): New macro.
 (CHECK_WRLOCK): New macro.
 (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
 (IN_RWLOCK_DEBUG_QUEUE): New macro.
 (RDLOCK): New macro.
 (WRLOCK): New macro.
 (RWUNLOCK): New macro.
 (RD_TO_WRLOCK): New macro.
 (INTERN_RDLOCK): New macro.
 (INTERN_WRLOCK): New macro.
 (INTERN_RWUNLOCK): New macro.
 * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
 a comment.
 (unit_lock): Remove including associated internal_proto.
 (unit_rwlock): New declarations including associated internal_proto.
 (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
 instead of __gthread_mutex_lock and __gthread_mutex_unlock on
 unit_lock.
 * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
 unit_rwlock instead of LOCK and UNLOCK on unit_lock.
 (st_write_done_worker): Likewise.
 * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
 comment. Use unit_rwlock variable instead of unit_lock variable.
 (get_gfc_unit_from_unit_root): New function.
 (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
 instead of LOCK and UNLOCK on unit_lock.
 (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
 LOCK and UNLOCK on unit_lock.
 (close_units): Likewise.
 (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
 unit_lock.
 * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
 instead of LOCK and UNLOCK on unit_lock.
 (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
 of LOCK and UNLOCK on unit_lock.



It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran
/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In
function
‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error
: implicit declaration of function ‘WRLOCK’
[-Wimplicit-function-declaration]
   1023 |   WRLOCK (_rwlock);
|   ^~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error
: implicit declaration of function ‘RWUNLOCK’
[-Wimplicit-function-declaration]
   1025 |   RWUNLOCK (_rwlock);
|   ^~~~


R.


Hi Richard,

The root cause is that the macro WRLOCK and RWUNLOCK are not defined in
io.h. The reason of x86 platform not failed is that
HAVE_ATOMIC_FETCH_ADD is defined then caused above macros were never
been used. Code logic show as below:
#ifdef HAVE_ATOMIC_FETCH_ADD
(void) __atomic_fetch_add (>waiting, -1, __ATOMIC_RELAXED);
#else
WRLOCK (_rwlock);
u->waiting--;
RWUNLOCK (_rwlock);
#endif

I just draft a patch try to fix this bug, because I didn't have arm
platform, would you help to va

Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2024-01-02 Thread Vaseeharan Vinayagamoorthy
Hi Lipeng,

It looks like your draft patch to fix the builds for arm-none-eabi target is 
not merged yet, because our arm-none-eabi builds are still broken. Are you 
waiting for additional information, or would you be able to fix this issue?

Kind regards,
Vasee

From: Richard Earnshaw 
Sent: 15 December 2023 19:23
To: Lipeng Zhu ; Richard Earnshaw 
; ja...@redhat.com 
Cc: fort...@gcc.gnu.org ; gcc-patches@gcc.gnu.org 
; hongjiu...@intel.com ; 
pan.d...@intel.com ; rep.dot@gmail.com 
; tianyou...@intel.com ; 
tkoe...@netcologne.de ; wangyang@intel.com 

Subject: Re: [PATCH v7] libgfortran: Replace mutex with rwlock



On 15/12/2023 11:31, Lipeng Zhu wrote:
>
>
> On 2023/12/14 23:50, Richard Earnshaw (lists) wrote:
>> On 09/12/2023 15:39, Lipeng Zhu wrote:
>>> This patch try to introduce the rwlock and split the read/write to
>>> unit_root tree and unit_cache with rwlock instead of the mutex to
>>> increase CPU efficiency. In the get_gfc_unit function, the percentage
>>> to step into the insert_unit function is around 30%, in most instances,
>>> we can get the unit in the phase of reading the unit_cache or unit_root
>>> tree. So split the read/write phase by rwlock would be an approach to
>>> make it more parallel.
>>>
>>> BTW, the IPC metrics can gain around 9x in our test
>>> server with 220 cores. The benchmark we used is
>>> https://github.com/rwesson/NEAT
>>>
>>> libgcc/ChangeLog:
>>>
>>> * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
>>> (__gthrw): New function.
>>> (__gthread_rwlock_rdlock): New function.
>>> (__gthread_rwlock_tryrdlock): New function.
>>> (__gthread_rwlock_wrlock): New function.
>>> (__gthread_rwlock_trywrlock): New function.
>>> (__gthread_rwlock_unlock): New function.
>>>
>>> libgfortran/ChangeLog:
>>>
>>> * io/async.c (DEBUG_LINE): New macro.
>>> * io/async.h (RWLOCK_DEBUG_ADD): New macro.
>>> (CHECK_RDLOCK): New macro.
>>> (CHECK_WRLOCK): New macro.
>>> (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
>>> (IN_RWLOCK_DEBUG_QUEUE): New macro.
>>> (RDLOCK): New macro.
>>> (WRLOCK): New macro.
>>> (RWUNLOCK): New macro.
>>> (RD_TO_WRLOCK): New macro.
>>> (INTERN_RDLOCK): New macro.
>>> (INTERN_WRLOCK): New macro.
>>> (INTERN_RWUNLOCK): New macro.
>>> * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
>>> a comment.
>>> (unit_lock): Remove including associated internal_proto.
>>> (unit_rwlock): New declarations including associated internal_proto.
>>> (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
>>> instead of __gthread_mutex_lock and __gthread_mutex_unlock on
>>> unit_lock.
>>> * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
>>> unit_rwlock instead of LOCK and UNLOCK on unit_lock.
>>> (st_write_done_worker): Likewise.
>>> * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
>>> comment. Use unit_rwlock variable instead of unit_lock variable.
>>> (get_gfc_unit_from_unit_root): New function.
>>> (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
>>> instead of LOCK and UNLOCK on unit_lock.
>>> (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
>>> LOCK and UNLOCK on unit_lock.
>>> (close_units): Likewise.
>>> (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
>>> unit_lock.
>>> * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
>>> instead of LOCK and UNLOCK on unit_lock.
>>> (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
>>> of LOCK and UNLOCK on unit_lock.
>>>
>>
>> It looks like this has broken builds on arm-none-eabi when using newlib:
>>
>> In file included from
>> /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran
>> /runtime/error.c:27:
>> /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In
>> function
>> ‘dec_waiting_unlocked’:
>> /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: 
>> error
>> : implicit declaration of function ‘WRLOCK’
>> [-Wimplicit-function-declaration]
>>   1023 |   WRLOCK (_rwlock);
>>|   ^~
>> /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: 
>> error
>> 

Re: RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-21 Thread Lipeng Zhu

Hi Thomas,

On 2023/12/21 19:42, Thomas Schwinge wrote:

Hi!

On 2023-12-13T21:52:29+0100, I wrote:

On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:

On 2023/12/12 1:45, H.J. Lu wrote:

On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:

On 2023/12/9 23:23, Jakub Jelinek wrote:

On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:

This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the
percentage to step into the insert_unit function is around 30%, in
most instances, we can get the unit in the phase of reading the
unit_cache or unit_root tree. So split the read/write phase by
rwlock would be an approach to make it more parallel.

BTW, the IPC metrics can gain around 9x in our test server with
220 cores. The benchmark we used is
https://github.com/rwesson/NEAT



Ok for trunk, thanks.



Thanks! Looking forward to landing to trunk.



Pushed for you.



I've just filed 
"'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test 
timeouts".
Would you be able to look into that?


See my update in there.


Grüße
  Thomas
-- > Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 
201, 80634 München; Gesellschaft mit beschränkter Haftung; 
Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: 
München; Registergericht München, HRB 106955




Since I don't have gcc bugzilla account. Reply in this thread:
Limit themselves to some lower 'OMP_NUM_THREADS' should be an option or 
increase the execution timeout?


But I can't reproduce the execution timeout failure on both powerpc9 and 
powerpc10 arch machine. And I also tried to decrease the CPU frequency 
from 2.6G to 800M, these test cases still can run successfully.


> so only a little bit of an improvement of the new "rwlock" 
libgfortran vs. old "mutex" GCC 10 one, curiously.  (But supposedly that 
depends on the hardware or other factors?)


The rwlock can increase the IPC a lot, maybe the wall time you listed is 
not obvious.


$ grep ^cpu < /proc/cpuinfo | uniq -c

192 cpu : POWER10 (architected), altivec supported

Native configuration is powerpc64le-unknown-linux-gnu

Schedule of variations:
unix

PASS: libgomp.fortran/rwlock_1.f90   -O0  (test for excess errors)
PASS: libgomp.fortran/rwlock_1.f90   -O0  execution test
PASS: libgomp.fortran/rwlock_1.f90   -O1  (test for excess errors)
PASS: libgomp.fortran/rwlock_1.f90   -O1  execution test
PASS: libgomp.fortran/rwlock_1.f90   -O2  (test for excess errors)
PASS: libgomp.fortran/rwlock_1.f90   -O2  execution test
PASS: libgomp.fortran/rwlock_1.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for 
excess errors)
PASS: libgomp.fortran/rwlock_1.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

PASS: libgomp.fortran/rwlock_1.f90   -O3 -g  (test for excess errors)
PASS: libgomp.fortran/rwlock_1.f90   -O3 -g  execution test
PASS: libgomp.fortran/rwlock_1.f90   -Os  (test for excess errors)
PASS: libgomp.fortran/rwlock_1.f90   -Os  execution test
PASS: libgomp.fortran/rwlock_2.f90   -O0  (test for excess errors)
PASS: libgomp.fortran/rwlock_2.f90   -O0  execution test
PASS: libgomp.fortran/rwlock_2.f90   -O1  (test for excess errors)
PASS: libgomp.fortran/rwlock_2.f90   -O1  execution test
PASS: libgomp.fortran/rwlock_2.f90   -O2  (test for excess errors)
PASS: libgomp.fortran/rwlock_2.f90   -O2  execution test
PASS: libgomp.fortran/rwlock_2.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for 
excess errors)
PASS: libgomp.fortran/rwlock_2.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

PASS: libgomp.fortran/rwlock_2.f90   -O3 -g  (test for excess errors)
PASS: libgomp.fortran/rwlock_2.f90   -O3 -g  execution test
PASS: libgomp.fortran/rwlock_2.f90   -Os  (test for excess errors)
PASS: libgomp.fortran/rwlock_2.f90   -Os  execution test
PASS: libgomp.fortran/rwlock_3.f90   -O0  (test for excess errors)
PASS: libgomp.fortran/rwlock_3.f90   -O0  execution test
PASS: libgomp.fortran/rwlock_3.f90   -O1  (test for excess errors)
PASS: libgomp.fortran/rwlock_3.f90   -O1  execution test
PASS: libgomp.fortran/rwlock_3.f90   -O2  (test for excess errors)
PASS: libgomp.fortran/rwlock_3.f90   -O2  execution test
PASS: libgomp.fortran/rwlock_3.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for 
excess errors)
PASS: libgomp.fortran/rwlock_3.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test


Lipeng Zhu



RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-21 Thread Thomas Schwinge
Hi!

On 2023-12-13T21:52:29+0100, I wrote:
> On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:
>> On 2023/12/12 1:45, H.J. Lu wrote:
>>> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:
>>> > On 2023/12/9 23:23, Jakub Jelinek wrote:
>>> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
>>> > > > This patch try to introduce the rwlock and split the read/write to
>>> > > > unit_root tree and unit_cache with rwlock instead of the mutex to
>>> > > > increase CPU efficiency. In the get_gfc_unit function, the
>>> > > > percentage to step into the insert_unit function is around 30%, in
>>> > > > most instances, we can get the unit in the phase of reading the
>>> > > > unit_cache or unit_root tree. So split the read/write phase by
>>> > > > rwlock would be an approach to make it more parallel.
>>> > > >
>>> > > > BTW, the IPC metrics can gain around 9x in our test server with
>>> > > > 220 cores. The benchmark we used is
>>> > > > https://github.com/rwesson/NEAT
>
>>> > > Ok for trunk, thanks.
>
>>> > Thanks! Looking forward to landing to trunk.
>
>>> Pushed for you.

> I've just filed 
> "'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution 
> test timeouts".
> Would you be able to look into that?

See my update in there.


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-15 Thread Richard Earnshaw




On 15/12/2023 11:31, Lipeng Zhu wrote:



On 2023/12/14 23:50, Richard Earnshaw (lists) wrote:

On 09/12/2023 15:39, Lipeng Zhu wrote:

This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.

BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.

libgfortran/ChangeLog:

* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.



It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from 
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran

/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In 
function

‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error
: implicit declaration of function ‘WRLOCK’ 
[-Wimplicit-function-declaration]

  1023 |   WRLOCK (_rwlock);
   |   ^~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error
: implicit declaration of function ‘RWUNLOCK’ 
[-Wimplicit-function-declaration]

  1025 |   RWUNLOCK (_rwlock);
   |   ^~~~


R.


Hi Richard,

The root cause is that the macro WRLOCK and RWUNLOCK are not defined in 
io.h. The reason of x86 platform not failed is that 
HAVE_ATOMIC_FETCH_ADD is defined then caused above macros were never 
been used. Code logic show as below:

#ifdef HAVE_ATOMIC_FETCH_ADD
   (void) __atomic_fetch_add (>waiting, -1, __ATOMIC_RELAXED);
#else
   WRLOCK (_rwlock);
   u->waiting--;
   RWUNLOCK (_rwlock);
#endif

I just draft a patch try to fix this bug, because I didn't have arm 
platform, would you help to validate if it was fixed on arm platform?


diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 15daa0995b1..c7f0f7d7d9e 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u)
  #ifdef HAVE_ATOMIC_FETCH_ADD
    (void) __atomic_fetch_add (>waiting, -1, __ATOMIC_RELAXED);
  #else
-  WRLOCK (_rwlock);
+#ifdef __GTHREAD_RWLOCK_INIT
+  __gthread_rwlock_wrlock (_rwlock);
+  u->waiting--;
+  __gthread_rwlock_unlock (_rwlock);
+#else
+  __gthread_mutex_lock (_rwlock);
    u->waiting--;
-  RWUNLOCK (_rwlock);
+  __gthread_mutex_unlock (_rwlock);
+#endif
  #endif
  }


Lipeng Zhu


Hi Lipeng,

Thanks for the quick reply.  I can confirm that with the above change 
the bootstrap failure is fixed.  However, this shouldn't be considered a 
formal review; libgfortran is not really my area.


I'll be away now until January 2nd.

Richard.


Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-15 Thread Lipeng Zhu




On 2023/12/14 23:50, Richard Earnshaw (lists) wrote:

On 09/12/2023 15:39, Lipeng Zhu wrote:

This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.

BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.

libgfortran/ChangeLog:

* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.



It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran
/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function
‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error
: implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration]
  1023 |   WRLOCK (_rwlock);
   |   ^~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error
: implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration]
  1025 |   RWUNLOCK (_rwlock);
   |   ^~~~


R.


Hi Richard,

The root cause is that the macro WRLOCK and RWUNLOCK are not defined in 
io.h. The reason of x86 platform not failed is that 
HAVE_ATOMIC_FETCH_ADD is defined then caused above macros were never 
been used. Code logic show as below:

#ifdef HAVE_ATOMIC_FETCH_ADD
  (void) __atomic_fetch_add (>waiting, -1, __ATOMIC_RELAXED);
#else
  WRLOCK (_rwlock);
  u->waiting--;
  RWUNLOCK (_rwlock);
#endif

I just draft a patch try to fix this bug, because I didn't have arm 
platform, would you help to validate if it was fixed on arm platform?


diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 15daa0995b1..c7f0f7d7d9e 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u)
 #ifdef HAVE_ATOMIC_FETCH_ADD
   (void) __atomic_fetch_add (>waiting, -1, __ATOMIC_RELAXED);
 #else
-  WRLOCK (_rwlock);
+#ifdef __GTHREAD_RWLOCK_INIT
+  __gthread_rwlock_wrlock (_rwlock);
+  u->waiting--;
+  __gthread_rwlock_unlock (_rwlock);
+#else
+  __gthread_mutex_lock (_rwlock);
   u->waiting--;
-  RWUNLOCK (_rwlock);
+  __gthread_mutex_unlock (_rwlock);
+#endif
 #endif
 }


Lipeng Zhu


RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-14 Thread Zhu, Lipeng

On 2023/12/14 20:39, Jakub Jelinek wrote:
> On Thu, Dec 14, 2023 at 01:29:01PM +0100, Thomas Schwinge wrote:
>>> Sure, I will look into that.
>>>
>>> BTW, I didn’t have the PowerPC in hands, do you mind granting the access of 
>>> your
>>> test environment to me to help reproduce the issue?
>>
>> That's unfortunately not possible: it's behind company VPN, restricted
>> access.  :-/ I'll later try to have at least a quick look where it's
>> hanging, or what it's doing.
> 
> There is always https://gcc.gnu.org/wiki/CompileFarm
> There are e.g. 192, 160, 128, 80, 64 thread power machines.

Thanks Jakub,

I submit a request to join the CompileFarm project. Could you help to approve 
it?

Lipeng Zhu

> Whether any of them can actually reproduce it, I haven't tried.
> But shouldn't be that time consuming to reproduce, for this kind of thing
> one can just build
> .../configure --enable-languages=c,c++,fortran --disable-bootstrap 
> --disable-libsanitizer
> make -jN
> compiler and just
> cd power*/libgomp
> make check RUNTESTFLAGS=fortran.exp=rwlock*.f90
> repeatedly to see if it gets stuck.
> 
>   Jakub
> 
>


Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-14 Thread Richard Earnshaw (lists)
On 09/12/2023 15:39, Lipeng Zhu wrote:
> This patch try to introduce the rwlock and split the read/write to
> unit_root tree and unit_cache with rwlock instead of the mutex to
> increase CPU efficiency. In the get_gfc_unit function, the percentage
> to step into the insert_unit function is around 30%, in most instances,
> we can get the unit in the phase of reading the unit_cache or unit_root
> tree. So split the read/write phase by rwlock would be an approach to
> make it more parallel.
> 
> BTW, the IPC metrics can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT
> 
> libgcc/ChangeLog:
> 
>   * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
>   (__gthrw): New function.
>   (__gthread_rwlock_rdlock): New function.
>   (__gthread_rwlock_tryrdlock): New function.
>   (__gthread_rwlock_wrlock): New function.
>   (__gthread_rwlock_trywrlock): New function.
>   (__gthread_rwlock_unlock): New function.
> 
> libgfortran/ChangeLog:
> 
>   * io/async.c (DEBUG_LINE): New macro.
>   * io/async.h (RWLOCK_DEBUG_ADD): New macro.
>   (CHECK_RDLOCK): New macro.
>   (CHECK_WRLOCK): New macro.
>   (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
>   (IN_RWLOCK_DEBUG_QUEUE): New macro.
>   (RDLOCK): New macro.
>   (WRLOCK): New macro.
>   (RWUNLOCK): New macro.
>   (RD_TO_WRLOCK): New macro.
>   (INTERN_RDLOCK): New macro.
>   (INTERN_WRLOCK): New macro.
>   (INTERN_RWUNLOCK): New macro.
>   * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
>   a comment.
>   (unit_lock): Remove including associated internal_proto.
>   (unit_rwlock): New declarations including associated internal_proto.
>   (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
>   instead of __gthread_mutex_lock and __gthread_mutex_unlock on
>   unit_lock.
>   * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
>   unit_rwlock instead of LOCK and UNLOCK on unit_lock.
>   (st_write_done_worker): Likewise.
>   * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
>   comment. Use unit_rwlock variable instead of unit_lock variable.
>   (get_gfc_unit_from_unit_root): New function.
>   (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
>   instead of LOCK and UNLOCK on unit_lock.
>   (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
>   LOCK and UNLOCK on unit_lock.
>   (close_units): Likewise.
>   (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
>   unit_lock.
>   * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
>   instead of LOCK and UNLOCK on unit_lock.
>   (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
>   of LOCK and UNLOCK on unit_lock.
> 

It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran
/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function 
‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error
: implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration]
 1023 |   WRLOCK (_rwlock);
  |   ^~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error
: implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration]
 1025 |   RWUNLOCK (_rwlock);
  |   ^~~~


R.

> ---
> v1 -> v2:
> Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined.
> 
> v2 -> v3:
> Rebase the patch with trunk branch.
> 
> v3 -> v4:
> Update the comments.
> 
> v4 -> v5:
> Fix typos and code formatter.
> 
> v5 -> v6:
> Add unit tests.
> 
> v6 -> v7:
> Update ChangeLog and code formatter.
> 
> Reviewed-by: Hongjiu Lu 
> Reviewed-by: Bernhard Reutner-Fischer 
> Reviewed-by: Thomas Koenig 
> Reviewed-by: Jakub Jelinek 
> Signed-off-by: Lipeng Zhu 
> ---
>  libgcc/gthr-posix.h   |  60 +++
>  libgfortran/io/async.c|   4 +
>  libgfortran/io/async.h| 151 ++
>  libgfortran/io/io.h   |  15 +-
>  libgfortran/io/transfer.c |   8 +-
>  libgfortran/io/unit.c | 117 +-
>  libgfortran/io/unix.c |  16 +-
>  .../testsuite/libgomp.fortran/rwlock_1.f90|  33 
>  .../testsuite/libgomp.fortran/rwlock_2.f90|  22 +++
>  .../testsuite/libgomp.fortran/rwlock_3.f90|  18 +++
>  10 files changed, 386 insertions(+), 58 deletions(-)
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_1.f90
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_2.f90
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_3.f90
> 
> diff --git 

Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2023 at 01:29:01PM +0100, Thomas Schwinge wrote:
> > Sure, I will look into that.
> >
> > BTW, I didn’t have the PowerPC in hands, do you mind granting the access of 
> > your
> > test environment to me to help reproduce the issue?
> 
> That's unfortunately not possible: it's behind company VPN, restricted
> access.  :-/ I'll later try to have at least a quick look where it's
> hanging, or what it's doing.

There is always https://gcc.gnu.org/wiki/CompileFarm
There are e.g. 192, 160, 128, 80, 64 thread power machines.
Whether any of them can actually reproduce it, I haven't tried.
But shouldn't be that time consuming to reproduce, for this kind of thing
one can just build
.../configure --enable-languages=c,c++,fortran --disable-bootstrap 
--disable-libsanitizer
make -jN
compiler and just
cd power*/libgomp
make check RUNTESTFLAGS=fortran.exp=rwlock*.f90
repeatedly to see if it gets stuck.

Jakub



RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-14 Thread Tobias Burnus


 
 
  
   Hi,
   
   
   
Thomas Schwinge wrote:

   
 

   
On 2023-12-14T02:28:22+, "Zhu, Lipeng"  wrote:



 On 2023/12/14 4:52, Thomas Schwinge wrote:
 >> I've just filed > Would you be able to look into that?

   
 



 Sure, I will look into that.
 



 BTW, I didn’t have the PowerPC in hands, do you mind granting the access of your
 

 test environment to me to help reproduce the issue?
 

   
That's unfortunately not possible:

   
  
   How about https://gcc.gnu.org/wiki/CompileFarm I think there is a PowerPC system available.
   
  
    
   
  
   Tobias
   
  
    
   
  
    
  
 



RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-14 Thread Thomas Schwinge
Hi Lipeng!

On 2023-12-14T02:28:22+, "Zhu, Lipeng"  wrote:
> On 2023/12/14 4:52, Thomas Schwinge wrote:
>> On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:
>> > On 2023/12/12 1:45, H.J. Lu wrote:
>> >> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng 
>> wrote:
>> >> > On 2023/12/9 23:23, Jakub Jelinek wrote:
>> >> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
>> >> > > > This patch try to introduce the rwlock and split the read/write
>> >> > > > to unit_root tree and unit_cache with rwlock instead of the
>> >> > > > mutex to increase CPU efficiency. In the get_gfc_unit function,
>> >> > > > the percentage to step into the insert_unit function is around
>> >> > > > 30%, in most instances, we can get the unit in the phase of
>> >> > > > reading the unit_cache or unit_root tree. So split the
>> >> > > > read/write phase by rwlock would be an approach to make it more
>> parallel.
>> >> > > >
>> >> > > > BTW, the IPC metrics can gain around 9x in our test server with
>> >> > > > 220 cores. The benchmark we used is
>> >> > > > https://github.com/rwesson/NEAT

>> I've just filed 
>> "'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution
>> test timeouts".
>> Would you be able to look into that?

> Sure, I will look into that.
>
> BTW, I didn’t have the PowerPC in hands, do you mind granting the access of 
> your
> test environment to me to help reproduce the issue?

That's unfortunately not possible: it's behind company VPN, restricted
access.  :-/ I'll later try to have at least a quick look where it's
hanging, or what it's doing.


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-13 Thread Zhu, Lipeng

On 2023/12/14 4:52, Thomas Schwinge wrote:
> Hi Lipeng!
> 
> On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:
> > On 2023/12/12 1:45, H.J. Lu wrote:
> >> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng 
> wrote:
> >> > On 2023/12/9 23:23, Jakub Jelinek wrote:
> >> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
> >> > > > This patch try to introduce the rwlock and split the read/write
> >> > > > to unit_root tree and unit_cache with rwlock instead of the
> >> > > > mutex to increase CPU efficiency. In the get_gfc_unit function,
> >> > > > the percentage to step into the insert_unit function is around
> >> > > > 30%, in most instances, we can get the unit in the phase of
> >> > > > reading the unit_cache or unit_root tree. So split the
> >> > > > read/write phase by rwlock would be an approach to make it more
> parallel.
> >> > > >
> >> > > > BTW, the IPC metrics can gain around 9x in our test server with
> >> > > > 220 cores. The benchmark we used is
> >> > > > https://github.com/rwesson/NEAT
> 
> >> > > Ok for trunk, thanks.
> 
> >> > Thanks! Looking forward to landing to trunk.
> 
> >> Pushed for you.
> 
> > Thanks for everyone's patience and help, really appreciate that!
> 
> Congratulations on your first contribution to GCC (as far as I can tell)!
> :-)
> 
> 
> I've just filed 
> "'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution
> test timeouts".
> Would you be able to look into that?
> 
> 
> Grüße
>  Thomas
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201,
> 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer:
> Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München;
> Registergericht München, HRB 106955

Hi Thomas,

Sure, I will look into that. 

BTW, I didn’t have the PowerPC in hands, do you mind granting the access of your
test environment to me to help reproduce the issue?

Lipeng Zhu




RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-13 Thread Thomas Schwinge
Hi Lipeng!

On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:
> On 2023/12/12 1:45, H.J. Lu wrote:
>> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:
>> > On 2023/12/9 23:23, Jakub Jelinek wrote:
>> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
>> > > > This patch try to introduce the rwlock and split the read/write to
>> > > > unit_root tree and unit_cache with rwlock instead of the mutex to
>> > > > increase CPU efficiency. In the get_gfc_unit function, the
>> > > > percentage to step into the insert_unit function is around 30%, in
>> > > > most instances, we can get the unit in the phase of reading the
>> > > > unit_cache or unit_root tree. So split the read/write phase by
>> > > > rwlock would be an approach to make it more parallel.
>> > > >
>> > > > BTW, the IPC metrics can gain around 9x in our test server with
>> > > > 220 cores. The benchmark we used is
>> > > > https://github.com/rwesson/NEAT

>> > > Ok for trunk, thanks.

>> > Thanks! Looking forward to landing to trunk.

>> Pushed for you.

> Thanks for everyone's patience and help, really appreciate that!

Congratulations on your first contribution to GCC (as far as I can tell)!
:-)


I've just filed 
"'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test 
timeouts".
Would you be able to look into that?


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-11 Thread Zhu, Lipeng
On 2023/12/12 1:45, H.J. Lu wrote:
> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:
> >
> > On 2023/12/9 23:23, Jakub Jelinek wrote:
> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
> > > > This patch try to introduce the rwlock and split the read/write to
> > > > unit_root tree and unit_cache with rwlock instead of the mutex to
> > > > increase CPU efficiency. In the get_gfc_unit function, the
> > > > percentage to step into the insert_unit function is around 30%, in
> > > > most instances, we can get the unit in the phase of reading the
> > > > unit_cache or unit_root tree. So split the read/write phase by
> > > > rwlock would be an approach to make it more parallel.
> > > >
> > > > BTW, the IPC metrics can gain around 9x in our test server with
> > > > 220 cores. The benchmark we used is
> > > > https://github.com/rwesson/NEAT
> > > >
> > > > libgcc/ChangeLog:
> > > >
> > > > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
> > > > (__gthrw): New function.
> > > > (__gthread_rwlock_rdlock): New function.
> > > > (__gthread_rwlock_tryrdlock): New function.
> > > > (__gthread_rwlock_wrlock): New function.
> > > > (__gthread_rwlock_trywrlock): New function.
> > > > (__gthread_rwlock_unlock): New function.
> > > >
> > > > libgfortran/ChangeLog:
> > > >
> > > > * io/async.c (DEBUG_LINE): New macro.
> > > > * io/async.h (RWLOCK_DEBUG_ADD): New macro.
> > > > (CHECK_RDLOCK): New macro.
> > > > (CHECK_WRLOCK): New macro.
> > > > (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
> > > > (IN_RWLOCK_DEBUG_QUEUE): New macro.
> > > > (RDLOCK): New macro.
> > > > (WRLOCK): New macro.
> > > > (RWUNLOCK): New macro.
> > > > (RD_TO_WRLOCK): New macro.
> > > > (INTERN_RDLOCK): New macro.
> > > > (INTERN_WRLOCK): New macro.
> > > > (INTERN_RWUNLOCK): New macro.
> > > > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
> > > > a comment.
> > > > (unit_lock): Remove including associated internal_proto.
> > > > (unit_rwlock): New declarations including associated internal_proto.
> > > > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on
> unit_rwlock
> > > > instead of __gthread_mutex_lock and __gthread_mutex_unlock on
> > > > unit_lock.
> > > > * io/transfer.c (st_read_done_worker): Use WRLOCK and
> RWUNLOCK
> > > on
> > > > unit_rwlock instead of LOCK and UNLOCK on unit_lock.
> > > > (st_write_done_worker): Likewise.
> > > > * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
> > > > comment. Use unit_rwlock variable instead of unit_lock variable.
> > > > (get_gfc_unit_from_unit_root): New function.
> > > > (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on
> unit_rwlock
> > > > instead of LOCK and UNLOCK on unit_lock.
> > > > (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead
> > > of
> > > > LOCK and UNLOCK on unit_lock.
> > > > (close_units): Likewise.
> > > > (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK
> on
> > > > unit_lock.
> > > > * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
> > > > instead of LOCK and UNLOCK on unit_lock.
> > > > (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock
> instead
> > > > of LOCK and UNLOCK on unit_lock.
> > >
> > > Ok for trunk, thanks.
> > >
> > >   Jakub
> >
> > Thanks! Looking forward to landing to trunk.
> >
> > Lipeng Zhu
> 
> Pushed for you.
> 
> Thanks.
> 
> --
> H.J.

Thanks for everyone's patience and help, really appreciate that!

Lipeng Zhu


Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-11 Thread H.J. Lu
On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:
>
> On 2023/12/9 23:23, Jakub Jelinek wrote:
> > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
> > > This patch try to introduce the rwlock and split the read/write to
> > > unit_root tree and unit_cache with rwlock instead of the mutex to
> > > increase CPU efficiency. In the get_gfc_unit function, the percentage
> > > to step into the insert_unit function is around 30%, in most
> > > instances, we can get the unit in the phase of reading the unit_cache
> > > or unit_root tree. So split the read/write phase by rwlock would be an
> > > approach to make it more parallel.
> > >
> > > BTW, the IPC metrics can gain around 9x in our test server with 220
> > > cores. The benchmark we used is https://github.com/rwesson/NEAT
> > >
> > > libgcc/ChangeLog:
> > >
> > > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
> > > (__gthrw): New function.
> > > (__gthread_rwlock_rdlock): New function.
> > > (__gthread_rwlock_tryrdlock): New function.
> > > (__gthread_rwlock_wrlock): New function.
> > > (__gthread_rwlock_trywrlock): New function.
> > > (__gthread_rwlock_unlock): New function.
> > >
> > > libgfortran/ChangeLog:
> > >
> > > * io/async.c (DEBUG_LINE): New macro.
> > > * io/async.h (RWLOCK_DEBUG_ADD): New macro.
> > > (CHECK_RDLOCK): New macro.
> > > (CHECK_WRLOCK): New macro.
> > > (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
> > > (IN_RWLOCK_DEBUG_QUEUE): New macro.
> > > (RDLOCK): New macro.
> > > (WRLOCK): New macro.
> > > (RWUNLOCK): New macro.
> > > (RD_TO_WRLOCK): New macro.
> > > (INTERN_RDLOCK): New macro.
> > > (INTERN_WRLOCK): New macro.
> > > (INTERN_RWUNLOCK): New macro.
> > > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
> > > a comment.
> > > (unit_lock): Remove including associated internal_proto.
> > > (unit_rwlock): New declarations including associated internal_proto.
> > > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
> > > instead of __gthread_mutex_lock and __gthread_mutex_unlock on
> > > unit_lock.
> > > * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK
> > on
> > > unit_rwlock instead of LOCK and UNLOCK on unit_lock.
> > > (st_write_done_worker): Likewise.
> > > * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
> > > comment. Use unit_rwlock variable instead of unit_lock variable.
> > > (get_gfc_unit_from_unit_root): New function.
> > > (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
> > > instead of LOCK and UNLOCK on unit_lock.
> > > (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead
> > of
> > > LOCK and UNLOCK on unit_lock.
> > > (close_units): Likewise.
> > > (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
> > > unit_lock.
> > > * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
> > > instead of LOCK and UNLOCK on unit_lock.
> > > (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
> > > of LOCK and UNLOCK on unit_lock.
> >
> > Ok for trunk, thanks.
> >
> >   Jakub
>
> Thanks! Looking forward to landing to trunk.
>
> Lipeng Zhu

Pushed for you.

Thanks.

-- 
H.J.


RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-09 Thread Zhu, Lipeng
On 2023/12/9 23:23, Jakub Jelinek wrote:
> On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
> > This patch try to introduce the rwlock and split the read/write to
> > unit_root tree and unit_cache with rwlock instead of the mutex to
> > increase CPU efficiency. In the get_gfc_unit function, the percentage
> > to step into the insert_unit function is around 30%, in most
> > instances, we can get the unit in the phase of reading the unit_cache
> > or unit_root tree. So split the read/write phase by rwlock would be an
> > approach to make it more parallel.
> >
> > BTW, the IPC metrics can gain around 9x in our test server with 220
> > cores. The benchmark we used is https://github.com/rwesson/NEAT
> >
> > libgcc/ChangeLog:
> >
> > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
> > (__gthrw): New function.
> > (__gthread_rwlock_rdlock): New function.
> > (__gthread_rwlock_tryrdlock): New function.
> > (__gthread_rwlock_wrlock): New function.
> > (__gthread_rwlock_trywrlock): New function.
> > (__gthread_rwlock_unlock): New function.
> >
> > libgfortran/ChangeLog:
> >
> > * io/async.c (DEBUG_LINE): New macro.
> > * io/async.h (RWLOCK_DEBUG_ADD): New macro.
> > (CHECK_RDLOCK): New macro.
> > (CHECK_WRLOCK): New macro.
> > (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
> > (IN_RWLOCK_DEBUG_QUEUE): New macro.
> > (RDLOCK): New macro.
> > (WRLOCK): New macro.
> > (RWUNLOCK): New macro.
> > (RD_TO_WRLOCK): New macro.
> > (INTERN_RDLOCK): New macro.
> > (INTERN_WRLOCK): New macro.
> > (INTERN_RWUNLOCK): New macro.
> > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
> > a comment.
> > (unit_lock): Remove including associated internal_proto.
> > (unit_rwlock): New declarations including associated internal_proto.
> > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
> > instead of __gthread_mutex_lock and __gthread_mutex_unlock on
> > unit_lock.
> > * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK
> on
> > unit_rwlock instead of LOCK and UNLOCK on unit_lock.
> > (st_write_done_worker): Likewise.
> > * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
> > comment. Use unit_rwlock variable instead of unit_lock variable.
> > (get_gfc_unit_from_unit_root): New function.
> > (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
> > instead of LOCK and UNLOCK on unit_lock.
> > (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead
> of
> > LOCK and UNLOCK on unit_lock.
> > (close_units): Likewise.
> > (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
> > unit_lock.
> > * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
> > instead of LOCK and UNLOCK on unit_lock.
> > (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
> > of LOCK and UNLOCK on unit_lock.
> 
> Ok for trunk, thanks.
> 
>   Jakub

Thanks! Looking forward to landing to trunk.

Lipeng Zhu


Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-09 Thread Jakub Jelinek
On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
> This patch try to introduce the rwlock and split the read/write to
> unit_root tree and unit_cache with rwlock instead of the mutex to
> increase CPU efficiency. In the get_gfc_unit function, the percentage
> to step into the insert_unit function is around 30%, in most instances,
> we can get the unit in the phase of reading the unit_cache or unit_root
> tree. So split the read/write phase by rwlock would be an approach to
> make it more parallel.
> 
> BTW, the IPC metrics can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT
> 
> libgcc/ChangeLog:
> 
>   * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
>   (__gthrw): New function.
>   (__gthread_rwlock_rdlock): New function.
>   (__gthread_rwlock_tryrdlock): New function.
>   (__gthread_rwlock_wrlock): New function.
>   (__gthread_rwlock_trywrlock): New function.
>   (__gthread_rwlock_unlock): New function.
> 
> libgfortran/ChangeLog:
> 
>   * io/async.c (DEBUG_LINE): New macro.
>   * io/async.h (RWLOCK_DEBUG_ADD): New macro.
>   (CHECK_RDLOCK): New macro.
>   (CHECK_WRLOCK): New macro.
>   (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
>   (IN_RWLOCK_DEBUG_QUEUE): New macro.
>   (RDLOCK): New macro.
>   (WRLOCK): New macro.
>   (RWUNLOCK): New macro.
>   (RD_TO_WRLOCK): New macro.
>   (INTERN_RDLOCK): New macro.
>   (INTERN_WRLOCK): New macro.
>   (INTERN_RWUNLOCK): New macro.
>   * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
>   a comment.
>   (unit_lock): Remove including associated internal_proto.
>   (unit_rwlock): New declarations including associated internal_proto.
>   (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
>   instead of __gthread_mutex_lock and __gthread_mutex_unlock on
>   unit_lock.
>   * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
>   unit_rwlock instead of LOCK and UNLOCK on unit_lock.
>   (st_write_done_worker): Likewise.
>   * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
>   comment. Use unit_rwlock variable instead of unit_lock variable.
>   (get_gfc_unit_from_unit_root): New function.
>   (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
>   instead of LOCK and UNLOCK on unit_lock.
>   (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
>   LOCK and UNLOCK on unit_lock.
>   (close_units): Likewise.
>   (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
>   unit_lock.
>   * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
>   instead of LOCK and UNLOCK on unit_lock.
>   (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
>   of LOCK and UNLOCK on unit_lock.

Ok for trunk, thanks.

Jakub



[PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-09 Thread Lipeng Zhu
This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.

BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.

libgfortran/ChangeLog:

* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.

---
v1 -> v2:
Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined.

v2 -> v3:
Rebase the patch with trunk branch.

v3 -> v4:
Update the comments.

v4 -> v5:
Fix typos and code formatter.

v5 -> v6:
Add unit tests.

v6 -> v7:
Update ChangeLog and code formatter.

Reviewed-by: Hongjiu Lu 
Reviewed-by: Bernhard Reutner-Fischer 
Reviewed-by: Thomas Koenig 
Reviewed-by: Jakub Jelinek 
Signed-off-by: Lipeng Zhu 
---
 libgcc/gthr-posix.h   |  60 +++
 libgfortran/io/async.c|   4 +
 libgfortran/io/async.h| 151 ++
 libgfortran/io/io.h   |  15 +-
 libgfortran/io/transfer.c |   8 +-
 libgfortran/io/unit.c | 117 +-
 libgfortran/io/unix.c |  16 +-
 .../testsuite/libgomp.fortran/rwlock_1.f90|  33 
 .../testsuite/libgomp.fortran/rwlock_2.f90|  22 +++
 .../testsuite/libgomp.fortran/rwlock_3.f90|  18 +++
 10 files changed, 386 insertions(+), 58 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_3.f90

diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h
index aebcfdd9f4c..73283082997 100644
--- a/libgcc/gthr-posix.h
+++ b/libgcc/gthr-posix.h
@@ -48,6 +48,9 @@ typedef pthread_t __gthread_t;
 typedef pthread_key_t __gthread_key_t;
 typedef pthread_once_t __gthread_once_t;
 typedef pthread_mutex_t __gthread_mutex_t;
+#ifndef __cplusplus
+typedef pthread_rwlock_t __gthread_rwlock_t;
+#endif
 typedef pthread_mutex_t __gthread_recursive_mutex_t;
 typedef pthread_cond_t __gthread_cond_t;
 typedef struct timespec __gthread_time_t;
@@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t;
 
 #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER
 #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function
+#ifndef __cplusplus
+#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER
+#endif
 #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT
 #if