Problem resolved.
Among other errors, there was a missing OMP_BARRIER statement missing from
my code, which caused it to occasionally end prematurely. Thanks to Dmitri
Chubarov, Siberian Branch of the Russian Academy of Sciences, for pointing
this error out!
Specifically, the inner loop in the example seems to require:
max_dv = 0.d0
!$OMP BARRIER
!$OMP DO
!$OMP& REDUCTION(MAX:max_dv)
do j=2,(Ny-1)
do i=2,(Nx-1)
v(i,j) = v(i,j) + dv(i,j)
if(dabs(dv(i,j)) .gt. max_dv) then
max_dv = dv(i,j)
endif
end do
end do
!$OMP END DO
Nathan Moore
On Wed, Nov 26, 2008 at 12:27 PM, Nathan Moore <[EMAIL PROTECTED]> wrote:
> I've been working on a new machine in our cluster over the past few days
> and found something really weird in an OpnMP example program I use in one of
> the classes I teach.
>
> The machine is a dual-proc AMD Opteron 2350, Tyan n3600T (S2937) mainboard,
> w/ 8GB ram. Initially, I installed the i386 version of SL 5.2, but then
> realized that only half of the RAM was usable, and re-installed SL5.2 x86_64
> this morning.
>
> The example program is appended to the end of this email. Its a 2-d
> finite-difference solution to the laplace equation (the context being to
> "predict" lightning srtikes based on the potential between the ground and
> some clouds overhead).
>
> The program scales beautifully up to OMP_NUM_THREADS=7, but when I set the
> number of threads to 8, something wierd happens, and instead of taking the
> normal 241 iterations to converge, the program converges after 1 step. This
> reeks of numerical instability to me, but my programming expeirence in
> x86_64 is very limited.
>
> I'm using gfortran, with the simple compile string,
> gfortran clouds_example_OpenMP.f90 -m64 -fopenmp
>
> Any insight into what obvious mistake I'm making would be wonderful!
>
> The stability of the output seems erratic to me. Sometimes when
> OMP_NUM_THREADS=7 the result converges normally after 241 iterations and at
> other times, the result converges after 1 iteration (indicating some sort of
> problem with hardware???)
>
> This didn't occur yesterday when the machine was running SL5.2, i386.
>
>
>
>
> Simulation Output:
>
> [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=1
> [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS
> [EMAIL PROTECTED] clouds]$ ./a.out
> Hello World from thread 0
> There are 1 threads
> ...
> convergence criteria is \Delta V < 0.250000003725290
> iterations necessary, 241
> initialization time, 0
> simulation time, 57
>
>
>
>
> [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=2
> [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS
> [EMAIL PROTECTED] clouds]$ ./a.out
> Hello World from thread 0
> Hello World from thread 1
> There are 2 threads
> ...
> convergence criteria is \Delta V < 0.250000003725290
> iterations necessary, 241
> initialization time, 0
> simulation time, 28
>
>
>
>
> [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=4
> [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS
> [EMAIL PROTECTED] clouds]$ ./a.out
> Hello World from thread 3
> Hello World from thread 1
> Hello World from thread 0
> Hello World from thread 2
> There are 4 threads
> ...
> convergence criteria is \Delta V < 0.250000003725290
> iterations necessary, 241
> initialization time, 0
> simulation time, 14
>
>
>
>
> [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=8
> [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS
> [EMAIL PROTECTED] clouds]$ ./a.out
> Hello World from thread 2
> ...
> convergence criteria is \Delta V < 0.250000003725290
> iterations necessary, 1
> initialization time, 0
> simulation time, 0
>
> Code listing:
>
> [EMAIL PROTECTED] clouds]$ cat clouds_example_OpenMP.f90
> !
> !
> use omp_lib
> !
> IMPLICIT NONE
> integer,parameter::Nx=2000
> integer,parameter::Ny=2000
> real*8 v(Nx,Ny), dv(Nx,Ny)
> integer boundary(Nx,Ny)
> integer i,j,converged,i1,i2
> integer t0,t1,t2
> real*8 convergence_v, v_cloud, v_ground, max_dv
> real*8 bump_P,old_v
> real*8 Lx,Ly,dx,dy,v_y
> !
> real*8 lightning_rod_center, lightning_rod_height
> !
> real*8 house_center, house_height, house_width
> integer num_iterations
> !
> integer:: id, nthreads
> !$omp parallel private(id)
> id = omp_get_thread_num()
> write (*,*) 'Hello World from thread', id
> !$omp barrier
> if ( id == 0 ) then
> nthreads = omp_get_num_threads()
> write (*,*) 'There are', nthreads, 'threads'
> end if
> !$omp end parallel
>
> ! initial time
> t0 = secnds(0.0)
>
> dx =0.1d0 ! differential lengths, m
> dy =0.1d0
> Lx = Nx*dx ! system sizes, m
> Ly = Ny*dy
>
> print *,"\nSimulation has bounds:\n\tX: 0,",Lx,"\n\tY: 0,",Ly
> print *,"\tNx = ",Nx,"\n\tNy = ",Ny
> print *,"\tdx = ",dx,"\n\tdy = ",dy
>
> v_cloud = -10000.d0 ! volts
> v_ground = 0.d0
>
> ! initialize the the boundary conditions
> !
> ! first, set the solution function (v), to look like a
> ! parallel-plate capacitor
> !
> ! note that there is one large parallel section and several
> ! parallel do's inside that region
> !$OMP PARALLEL
> !
> !$OMP DO
> !$OMP& SHARED(Nx,Ny,boundary,v_cloud,v_ground,Ly,dy,v)
> !$OMP& PRIVATE(i,j)
> do j=1,Ny
> do i=1,Nx
> boundary(i,j)=0
> v(i,j) = v_ground + (v_cloud-v_ground)*(j*dy/Ly)
> end do
> end do
> !$OMP END DO
> !
> !$OMP DO
> !$OMP& SHARED(Nx,Ny,boundary)
> !$OMP& PRIVATE(i)
> do i=1,Nx
> boundary(i,1)=1 ! we need to ensure that the edges of
> boundary(i,Ny)=1 ! the domain are held as boundary
> end do
> !$OMP END DO
> !
> !$OMP DO
> !$OMP& SHARED(boundary,Nx)
> !$OMP& PRIVATE(j)
> do j=1,Ny
> boundary(1,j)=1
> boundary(Nx,j)=1
> end do
> !$OMP END DO
> !$OMP END PARALLEL
>
>
> ! set up an interesting feature on the lower boundary
> ! do this in parallel with SECTIONS directive
> !
> !$OMP PARALLEL
> !$OMP& SHARED(v,boundary,Nx,Ny,dx,dy,Lx,Ly,lightning_rod_height)
> !$OMP& PRIVATE(lightning_rod_center,house_center,house_height,house_width))
> !$OMP SECTIONS
>
> !$OMP SECTION
> ! Lightning_rod
> lightning_rod_center = Lx*0.6d0
> lightning_rod_height = 5.0d0
> call
> create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
>
> !$OMP SECTION
> lightning_rod_center = Lx*0.5d0
> call
> create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
>
> !$OMP SECTION
> lightning_rod_center = Lx*0.7d0
> call
> create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
>
> !$OMP SECTION
> ! house
> house_center = 0.4d0*Lx
> house_height = 5.0d0
> house_width = 5.0d0
> call
> create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary)
>
> !$OMP END SECTIONS
> !$OMP END PARALLEL
>
> ! initialization done
> t1 = secnds(0.0)
>
>
>
>
>
> ! main solution iteration
> !
> ! repeat the recursion relation until the maximum change
> ! from update to update is less than a convergence epsilon,
> convergence_v = (0.05)*dabs(v_ground-v_cloud)/(1.d0*Ny)
> print *,"\nconvergence criteria is \Delta V < ",convergence_v
> num_iterations = 0
>
> !
> ! iteration implemented with a goto or a do-while
> converged=0
> do while( converged .eq. 0)
>
> converged = 1
> num_iterations = num_iterations + 1
> !$OMP PARALLEL
> !$OMP DO
> !$OMP& PRIVATE(i,j)
> !$OMP& SHARED(Ny,Nx,dv,v,boundary))
> do j=2,(Ny-1)
> do i=2,(Nx-1)
> dv(i,j) =
> 0.25d0*(v(i-1,j)+v(i+1,j)+v(i,j+1)+v(i,j-1)) - v(i,j)
> dv(i,j) = dv(i,j)*(1.d0-DFLOAT(boundary(i,j)))
> end do
> end do
> !$OMP END DO
>
> max_dv = 0.d0
> !$OMP DO
> !$OMP& PRIVATE(i,j)
> !$OMP& SHARED(NX,NY,dv,v))
> !$OMP& REDUCTION(MAX:max_dv)
> do j=2,(Ny-1)
> do i=2,(Nx-1)
> v(i,j) = v(i,j) + dv(i,j)
> if(dv(i,j) .gt. max_dv) then
> max_dv = dv(i,j)
> endif
> end do
> end do
> !$OMP END DO
> !$OMP END PARALLEL
>
> if(max_dv .ge. convergence_v) then
> converged = 0
> endif
>
> end do
>
>
>
>
>
>
>
> ! simulation finished
> t2 = secnds(0.0)
>
> print *," iterations necessary, ",num_iterations
> print *," initialization time, ",t1-t0
> print *," simulation time, ",t2-t1
>
>
> open(unit=10,file="v_output.dat")
> write(10,*) "# x\ty\tv(x,y)"
> do j=1,Ny
> !do i=1,Nx
> ! skipping the full array print to save execution time
> ! the printed data file is normally sent to gnuplot w/ splot
> i=10
> write (10,*) i*dx,j*dy,v(i,j)
> !enddo
> write (10,*)" "
> end do
> close(10)
>
>
> stop
> end
>
>
>
>
> subroutine
> create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
> IMPLICIT NONE
> integer Nx, Ny,j,boundary(Nx,Ny)
> integer j_limit
> integer index_lightning_rod_center
> real*8 v(Nx,Ny)
> real*8 lightning_rod_center,lightning_rod_height
> real*8 dx, dy, v_ground
>
> index_lightning_rod_center = lightning_rod_center/dx
> j_limit = lightning_rod_height/dy
> do j=1,j_limit
> v(index_lightning_rod_center,j) = v_ground
> boundary(index_lightning_rod_center,j) = 1
> end do
>
> print *,"Created a lightning rod of height ",lightning_rod_height
> print *,"\ty_index ",j_limit
> print *,"\tx-position ",lightning_rod_center
> print *,"\tx_index ",index_lightning_rod_center
>
>
> end subroutine
>
>
>
>
>
>
>
> subroutine
> create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary)
> IMPLICIT NONE
> integer Nx, Ny, boundary(Nx,Ny)
> real*8 v(Nx,Ny)
> real*8 v_ground, dx, dy
> integer i,j,i_limit,j_limit, index_house_center
> real*8 house_center,house_height,house_width
>
> index_house_center = house_center/dx
> i_limit = 0.5d0*house_width/dx
> j_limit = house_height/dy
> do j=1,j_limit
> do i=(index_house_center-i_limit),(index_house_center+i_limit)
> v(i,j) = v_ground
> boundary(i,j) = 1
> end do
> end do
>
> print *,"Created a house of height ",house_height
> print *,"\ty_index ",j_limit
> print *,"\twidth ",house_width
> print *,"\thouse bounds:
> ",index_house_center-i_limit,index_house_center+i_limit
>
> end subroutine
>
> --
> - - - - - - - - - - - - - - - - - - - - -
> Nathan Moore
> Assistant Professor, Physics
> Winona State University
> AIM: nmoorewsu
> - - - - - - - - - - - - - - - - - - - - -
>
--
- - - - - - - - - - - - - - - - - - - - -
Nathan Moore
Assistant Professor, Physics
Winona State University
AIM: nmoorewsu
- - - - - - - - - - - - - - - - - - - - -