Re: [julia-users] Re: Julia calling MKL functions from fortran shared library

2016-11-15 Thread


On Wednesday, November 16, 2016 at 12:49:13 AM UTC+8, Ángel de Vicente wrote:
>
> Hi, 
>
> I wanted to try this out. My test Fortran module is the following, which 
> I compile with: ifort -mkl -o test_mod.so test_mod.f90 -shared -fpic 
> , 
> | module testmodule 
> | 
> |   implicit none 
> | 
> |   double precision, external :: ddot 
> |   double precision, dimension(3) :: d,e 
> | 
> |   integer :: i 
> | 
> | contains 
> | 
> |   function dot_f() 
> | double precision :: dot_f 
> | 
> | do i = 1,3 
> |d(i) = 1.0d0*i 
> |e(i) = 3.5d0*i 
> | end do 
> | 
> | dot_f = ddot(3,d,1,e,1) 
> |   end function dot_f 
> | 
> | END module testmodule 
> ` 
>
> and my test.jl 
> , 
> | ccall((:testmodule_mp_dot_f_, "./test_mod"),Float64, ()) 
> ` 
>
> If I try to run test.jl I get, as per the OP: 
> , 
> | julia> include("test.jl")
> | Intel MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so.
> ` 
>
>
> "Steven G. Johnson"  writes: 
> > I got 
> > Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so. 
> > 
> > Possibly you need to add the directory containing these files 
> > (/opt/intel/composer_xe_2015.0.090/mkl/lib or similar?) to your 
> > LD_LIBRARY_PATH environment variable, so that the runtime linker knows 
> > where to find them. 
>
> In this case this is not enough. 
>
> If you try to open the library directly you get: 
> , 
> | julia> Libdl.dlopen("/opt/intel/2016.0.1/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64/libmkl_avx2.so")
> | ERROR: could not load library "/opt/intel/2016.0.1/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64/libmkl_avx2.so"
> | /opt/intel/2016.0.1/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64/libmkl_avx2.so: undefined symbol: mkl_dft_fft_fix_twiddle_table_32f
> |  in dlopen(::String, ::UInt32) at ./libdl.jl:90 (repeats 2 times)
> ` 
>
> and nm confirms that those symbols are undefined: 
> , 
> | [angelv@duna intel64]$ nm libmkl_avx2.so | grep fft_fix 
> |  U mkl_dft_fft_fix_twiddle_table_32f 
> |  U mkl_dft_fft_fix_twiddle_table_64f 
> ` 
>
> and they are actually defined in libmkl_core.so 
> , 
> | [angelv@duna intel64]$ nm libmkl_core.so | grep fft_fix 
> | 018e3020 D mkl_dft_fft_fix_twiddle_table_32f 
> | 018e2800 D mkl_dft_fft_fix_twiddle_table_64f 
> | [angelv@duna intel64]$ 
> ` 
>
>
> So, a workaround is to open libmkl_core.so first with the flag 
> RTLD_GLOBAL and then run the test.jl code: 
>
> , 
> | julia> Libdl.dlopen("/opt/intel/2016.0.1/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64/libmkl_core.so", Libdl.RTLD_GLOBAL)
> | Ptr{Void} @0x03af6fa0
> |
> | julia> include("test.jl")
> | 49.0
> ` 
>
> But, to be honest, I don't fully understand whether this will be enough 
> for all codes using MKL, or whether there are other dependencies that 
> force you to open more libraries manually with RTLD_GLOBAL. But at least 
> it points in the right direction. 
>
> Cheers, 
> -- 
> Ángel de Vicente 
> http://www.iac.es/galeria/angelv/   
>


You have solved the problem, that's nice. But maybe I should reconsider 
whether to use MKL in such an inconvenient way. 
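For reference, Ángel's workaround can be collected into a single Julia snippet. This is a sketch based only on this thread: the MKL path and the `testmodule_mp_dot_f_` entry point are the ones quoted above and will differ on other installations.

```julia
# Preload libmkl_core.so with RTLD_GLOBAL so that symbols such as
# mkl_dft_fft_fix_twiddle_table_32f are globally visible when MKL's
# dispatcher later dlopen()s libmkl_avx2.so / libmkl_def.so.
const MKL_LIBDIR =
    "/opt/intel/2016.0.1/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64"
Libdl.dlopen(joinpath(MKL_LIBDIR, "libmkl_core.so"), Libdl.RTLD_GLOBAL)

# With the core library preloaded, the Fortran entry point works:
println(ccall((:testmodule_mp_dot_f_, "./test_mod"), Float64, ()))  # 49.0 above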


[julia-users] Julia calling MKL functions from fortran shared library

2016-11-15 Thread
These days, I tried a workaround to implement FFTs in Julia with the help of 
MKL's FFT routines. 

I wrote a Fortran module with several subroutines, some of which call the MKL 
FFT functions. The Fortran file was compiled to a shared library:

   ifort modules.f90 -parallel -mkl -lm -O3 \
       -I/opt/intel/composer_xe_2015.0.090/mkl/include \
       -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 \
       -shared -fpic -o new.so

The test.jl reads:
  ccall((:mymodule_mp_ground_, "new"), Int, ())

When I ran the Julia code with:
  julia test.jl
I got:
  Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.

What might be going wrong? Do I need to post the whole Fortran module for 
analysis?



Re: [julia-users] shared library created by ifort can't be imported correctly

2016-11-15 Thread


On Tuesday, November 15, 2016 at 4:29:17 PM UTC+8, Ángel de Vicente wrote:
>
> Hi, 
>
> 博陈 <chenph...@gmail.com > writes: 
> > The Fortran code: 
> > 
> > !f90tojl.f90 
> > module m 
> > contains 
> > integer function five() 
> > five = 5 
> > end function five 
> > end module m 
> > 
> > The corresponding julia code: 
> > #test.jl 
> > println(ccall( (:__m_MOD_five, "f90tojl"), Int, () )) 
> > 
> > The test command and the result: (test is the directory, not a command) 
> > ➜ test gfortran -shared -fpic f90tojl.f90 -o f90tojl.so 
> > ➜ test julia test.jl 
> > 5 
> > ➜ test ifort -shared -fpic f90tojl.f90 -o f90tojl.so 
> > ➜ test julia test.jl 
> > ERROR: LoadError: ccall: could not find function __m_MOD_five in library 
> f90tojl 
> > in anonymous at no file 
> > in include at ./boot.jl:261 
> > in include_from_node1 at ./loading.jl:320 
> > in process_options at ./client.jl:280 
> > in _start at ./client.jl:378 
> > while loading /home/chen/test/test.jl, in expression starting on line 
> > 1 
>
> You can check the list of symbols in the created library with the command 
> nm. With gfortran your "five" function will be called __m_MOD_five, as in 
> your Julia call, but with ifort you will have to use m_mp_five_ 
>
> , 
> | angelv@carro:~/temp$ nm testgfort.so | grep five 
> | 0635 T __m_MOD_five 
> | 
> | angelv@carro:~/temp$ nm testifort.so | grep five 
> | 0740 T m_mp_five_ 
> ` 
>
> Cheers, 
> -- 
> Ángel de Vicente 
> http://www.iac.es/galeria/angelv/   
>



Thank you very much. "nm" is a very powerful tool! 
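When a library may have been built with either compiler, one way to cope (a sketch of mine, not from this thread) is to probe for both mangled names at runtime with Libdl:

```julia
# Fortran name mangling for `function five` in `module m`:
#   gfortran: __m_MOD_five    (__<module>_MOD_<function>)
#   ifort:    m_mp_five_      (<module>_mp_<function>_)
lib = Libdl.dlopen("./f90tojl.so")
sym = Libdl.dlsym_e(lib, :__m_MOD_five)            # C_NULL if absent
sym == C_NULL && (sym = Libdl.dlsym_e(lib, :m_mp_five_))
sym == C_NULL && error("function `five` not found under either mangling")
println(ccall(sym, Int, ()))                       # prints 5 for this module
```

Libdl.dlsym_e is the non-throwing variant of dlsym, so the fallback can be tried without a try/catch.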


[julia-users] shared library created by ifort can't be imported correctly

2016-11-14 Thread
The Fortran code:

!f90tojl.f90
module m
contains
integer function five()
   five = 5
end function five
end module m

The corresponding julia code:
#test.jl
println(ccall( (:__m_MOD_five, "f90tojl"), Int, () ))

The test command and the result: (test is the directory, not a command)
➜  test gfortran -shared -fpic f90tojl.f90 -o f90tojl.so
➜  test julia test.jl   
5
➜  test ifort -shared -fpic f90tojl.f90 -o f90tojl.so
➜  test julia test.jl
ERROR: LoadError: ccall: could not find function __m_MOD_five in library 
f90tojl
 in anonymous at no file
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
 in process_options at ./client.jl:280
 in _start at ./client.jl:378
while loading /home/chen/test/test.jl, in expression starting on line 1
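As the reply to this post explains, the exported symbol name depends on the compiler's Fortran name mangling, so the ccall has to match the build. A sketch of the two variants for the module above:

```julia
# gfortran build: __<module>_MOD_<function>, no trailing underscore
println(ccall((:__m_MOD_five, "f90tojl"), Int, ()))

# ifort build: <module>_mp_<function>_, with a trailing underscore
println(ccall((:m_mp_five_, "f90tojl"), Int, ()))
```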

 



[julia-users] how can i rewrite my code to make use of the parallel feature of julia?

2016-06-23 Thread
Below is part of my code. I wrote it with FFTW.set_num_threads() to get a 
parallel FFT calculation; however, the "for j in eachindex" loops are still 
serial. Is it possible to rewrite the code in a parallel way? I don't know 
how to implement it with the @parallel, SharedArray, or the simple @spawn 
features. To me, the documentation is far from clear.


using JLD

function ground(ϕ0::Array{Complex{Float64},2}, dx::Float64
, dt::Float64)

FFTW.set_num_threads(CPU_CORES)

ns = size( ϕ0, 1)
x = get_x(ns, dx)
p = get_p(ns, dx)

# FFT plan
p_fft! = plan_fft!( similar(ϕ0), flags=FFTW.MEASURE )

prop_x = Array( Float64, ns , ns)
prop_p = similar( prop_x )

@inbounds for j in 1:ns
@inbounds for i in 1:ns
prop_x[i, j] = exp( -get_potential(x[i], x[j]) * dt / 2 )
prop_p[i, j] = exp( -(p[i]^2 + p[j]^2)/2.0 * dt )
end
end

normϕ = √(sumabs2(ϕ0)) * dx
scale!( ϕ0, 1 / normϕ )

ϕ2 = similar( ϕ0 )
Δϕ = Array(Float64, 0)
push!(Δϕ, 1.0)
nn = 0
while Δϕ[1] > 1.e-15
for j in eachindex(ϕ2)
ϕ2[j] = ϕ0[j] * prop_x[j]
end

p_fft! * ϕ2
for j in eachindex(ϕ2)
ϕ2[j] *= prop_p[j]
end

p_fft! \ ϕ2
for j in eachindex(ϕ2)
ϕ2[j] *= prop_x[j]
end

normϕ = √(sumabs2(ϕ2)) * dx
scale!( ϕ2, 1 / normϕ )

nn += 1

empty!(Δϕ)
push!(Δϕ, maxabs( ϕ2 - ϕ0 ))
ϕ0, ϕ2 = ϕ2, ϕ0
end

save("data/gs.jld", "ϕ", ϕ0)

ϕ0
end
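One possible direction (a sketch, untested against the full program, using the Julia 0.4-era @parallel/SharedArray API; the names ϕ2s and prop_xs are hypothetical SharedArray counterparts of ϕ2 and prop_x): keep the FFTs multithreaded via FFTW.set_num_threads, store the arrays touched by the element-wise loops in SharedArrays, and drive those loops with @sync @parallel:

```julia
# Hypothetical parallel version of one of the element-wise loops.
addprocs(CPU_CORES - 1)                  # start worker processes first

ϕ2s     = SharedArray(Complex{Float64}, (ns, ns))
prop_xs = SharedArray(Float64, (ns, ns))

@sync @parallel for j in 1:length(ϕ2s)
    ϕ2s[j] *= prop_xs[j]                 # each worker handles a chunk of indices
end
```

Note that an element-wise multiply is memory-bound, so the gain from distributing it may be modest compared to the FFT cost, which FFTW already parallelizes.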




[julia-users] how can i rewrite the code to make use of the parallel feature in julia?

2016-06-23 Thread
This is my code, I wrote it with 

using ..Utils
using JLD

function ground(ϕ0::Array{Complex{Float64},2}, dx::Float64
, dt::Float64)

FFTW.set_num_threads(CPU_CORES)

ns = size( ϕ0, 1)
x = get_x(ns, dx)
p = get_p(ns, dx)

# FFT plan
p_fft! = plan_fft!( similar(ϕ0), flags=FFTW.MEASURE )

prop_x = Array( Float64, ns , ns)
prop_p = similar( prop_x )

@inbounds for j in 1:ns
@inbounds for i in 1:ns
prop_x[i, j] = exp( -get_potential(x[i], x[j]) * dt / 2 )
prop_p[i, j] = exp( -(p[i]^2 + p[j]^2)/2.0 * dt )
end
end

normϕ = √(sumabs2(ϕ0)) * dx
scale!( ϕ0, 1 / normϕ )

ϕ2 = similar( ϕ0 )
Δϕ = Array(Float64, 0)
push!(Δϕ, 1.0)
nn = 0
while Δϕ[1] > 1.e-15
for j in eachindex(ϕ2)
ϕ2[j] = ϕ0[j] * prop_x[j]
end

p_fft! * ϕ2
for j in eachindex(ϕ2)
ϕ2[j] *= prop_p[j]
end

p_fft! \ ϕ2
for j in eachindex(ϕ2)
ϕ2[j] *= prop_x[j]
end

normϕ = √(sumabs2(ϕ2)) * dx
scale!( ϕ2, 1 / normϕ )

nn += 1

empty!(Δϕ)
push!(Δϕ, maxabs( ϕ2 - ϕ0 ))
ϕ0, ϕ2 = ϕ2, ϕ0
end

save("data/gs.jld", "ϕ", ϕ0)

ϕ0
end



Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-30 Thread
I tried @code_warntype, and the result shows that the red warnings appear only 
in the I/O part. Maybe it's not a type-stability issue.

On Wednesday, March 30, 2016 at 3:18:24 AM UTC+8, Tim Holy wrote:
>
> I haven't looked at this myself, but have you tried Stefan's suggestion to 
> look at `@code_warntype`? This might not be a preallocation issue; it might 
> be a type-stability issue. 
>
> --Tim 
>

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread
I rewrote my code to loop from 1:n manually, and the pre-allocation problem is 
solved. I also added some parentheses as you suggested; that helps, but not 
very much. 
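For readers finding this thread later, the rewrite Yichao suggested looks roughly like this (a generic sketch; `A` and `n` stand in for the poster's actual variables):

```julia
A = rand(1000); n = 500

# Allocating version: A[1:n] copies the slice into a temporary array.
s1 = sum(A[1:n])

# Allocation-free version: loop over 1:n and accumulate manually.
s2 = 0.0
for k in 1:n
    s2 += A[k]
end
# s1 and s2 agree up to floating-point rounding
```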

On Wednesday, March 30, 2016 at 1:56:47 AM UTC+8, Yichao Yu wrote:
>
>
>
> On Tue, Mar 29, 2016 at 1:50 PM, 博陈 <chenph...@gmail.com > 
> wrote:
>
>> Additionally, the allocation problem is not solved. I guess this 
>> http://julia-programming-language.2336112.n4.nabble.com/How-to-avoid-temporary-arrays-being-created-in-a-function-td14492.html
>>  might 
>> be helpful, but I just don't know how to change my code.
>>
>>
> The only place you create temporary arrays, according to your profile, is 
> the `sum(A[1:n])`; you just need to loop from 1:n manually instead of 
> creating a subarray
>  
>

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread
Additionally, the allocation problem is not solved. I guess this 
http://julia-programming-language.2336112.n4.nabble.com/How-to-avoid-temporary-arrays-being-created-in-a-function-td14492.html
 might 
be helpful, but I just don't know how to change my code.



On Wednesday, March 30, 2016 at 1:15:07 AM UTC+8, Yichao Yu wrote:
>
>
>
> On Tue, Mar 29, 2016 at 12:43 PM, 博陈 <chenph...@gmail.com > 
> wrote:
>
>> I tried the built-in profiler, and found that the problem lies in the lines 
>> I marked with ***; the result is shown below.
>> That proved my guess. How can I pre-allocate these arrays without dividing 
>> this code into several parts that calculate these arrays separately? 
>>
>
> I don't understand what you mean by `divide this code into several parts 
> that calculate these arrays separately`
>  
>
>> | lines | backtrace |
>>
>> |   169 |  9011 |  ***
>>
>> |   173 |  1552 |
>>
>> |   175 |  2604 |
>>
>> |   179 |  2906 |
>>
>> |   181 |  1541 |
>>
>> |   192 |  4458 |
>>
>> |   211 | 13332 |
>>
>> |   214 |  8431 |
>>
>> |   218 | 15871 |***
>>
>> |   221 |  2538 |

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread
Thank you very much for your help. 

But I don't understand the difference between `a * b * c * d` and 
`(a * b) * (c * d)`. Do you mean that the latter is slower?

Before I wrote this function, I wrote a much simpler one, which only 
calculated the array a (physically, the dipole acceleration). At that time, I 
only needed the dipole acceleration during the evolution. Then I found I 
needed other information during the evolution, so I wrote another function to 
calculate ip, the ionization probability... The same story with σ and 
ϕp. Now I have written them together and added a flags parameter. If flags = 
(1, 1, 1, 1), the script calculates all 4 quantities, while if 
flags = (0, 0, 0, 0), the function just gives me the final wave function.

On Wednesday, March 30, 2016 at 1:15:07 AM UTC+8, Yichao Yu wrote:
>
>
>
> On Tue, Mar 29, 2016 at 12:43 PM, 博陈 <chenph...@gmail.com > 
> wrote:
>
>> I tried the built-in profiler, and found that the problem lies in the lines 
>> I marked with ***; the result is shown below.
>> That proved my guess. How can I pre-allocate these arrays without dividing 
>> this code into several parts that calculate these arrays separately? 
>>
>
> I don't understand what you mean by `divide this code into several parts 
> that calculate these arrays separately`
>  
>
>> | lines | backtrace |
>>
>> |   169 |  9011 |  ***
>>
>> |   173 |  1552 |
>>
>> |   175 |  2604 |
>>
>> |   179 |  2906 |
>>
>> |   181 |  1541 |
>>
>> |   192 |  4458 |
>>
>> |   211 | 13332 |
>>
>> |   214 |  8431 |
>>
>> |   218 | 15871 |***
>>
>> |   221 |  2538 |

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread
Sorry; the line numbers are marked in the quoted area below the table.

On Wednesday, March 30, 2016 at 12:50:30 AM UTC+8, Milan Bouchet-Valat wrote:
>
> On Tuesday, March 29, 2016 at 09:43 -0700, 博陈 wrote: 
> > I tried the built-in profiler, and found that the problem lies in the 
> > lines I marked with ***; the result is shown below. 
> > That proved my guess. How can I pre-allocate these arrays without 
> > dividing this code into several parts that calculate these 
> > arrays separately?  
> Can you show us which lines the numbers correspond to? There's no line 
> 211 in the script you posted. 
>
>
> Regards 
>
> > | lines | backtrace | 
> > |   169 |  9011 |  *** 
> > |   173 |  1552 | 
> > |   175 |  2604 | 
> > |   179 |  2906 | 
> > |   181 |  1541 | 
> > |   192 |  4458 | 
> > |   211 | 13332 | 
> > |   214 |  8431 | 
> > |   218 | 15871 |*** 
> > |   221 |  2538 | 
> > 
> > > Have you tried: 
> > > 
> > > (a) calling @code_warntype on your function 
> > > (b) using the built-in profiler? 
> > > 
> > > 
> > > On Tue, Mar 29, 2016 at 9:23 AM, 博陈 <chenph...@gmail.com> wrote: 
> > > > First of all, have a look at the result. 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > My code calculates the evolution of 1-d 2-electron system in the 
> electric field, some variables are calculated during the evolution. 
> > > > According to the result of @time evolution, my code must have a 
> pre-allocation problem. Before you see the long code, i suggest that the 
> hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that I 
> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
> push!(m, new_m) inside the loop to pre-allocate the variable m, but in my 
> situations, I don't know how to pre-allocate these arrays. 
> > > > 
> > > > Below is the script (precisely, the main function) itself. 
> > > > 
> > > > function evolution(ϕ::Array{Complex{Float64}, 2}, 
> > > >ele::Array{Float64, 1}, dx::Float64, dt::Float64, 
> > > >flags::Tuple{Int64, Int64, Int64, Int64}) 
> > > > ϕg = copy(ϕ) 
> > > > FFTW.set_num_threads(8) 
> > > > ns = length( ϕ[:, 1] ) 
> > > > x = get_x(ns, dx) 
> > > > p = get_p(ns, dx) 
> > > > if flags[4] == 1 
> > > > pp = similar(p) 
> > > > A = -cumsum(ele) * dt 
> > > > A² = A.*A 
> > > > # splitting 
> > > > r_sp = 150.0 
> > > > δ_sp = 5.0 
> > > > splitter = Array(Float64, ns, ns) 
> > > > end 
> > > > nt = length( ele ) 
> > > > 
> > > > # # Pre-allocate result and temporary arrays 
> > > > #if flags[1] == 1 
> > > > σ = zeros(Complex128, nt) 
> > > > #end 
> > > > #if flags[2] == 1 
> > > > a = zeros(Float64, nt) 
> > > > #end 
> > > > #if flags[3] == 1 
> > > > r_ionization = 20.0 
> > > > n1 = round(Int, ns/2 - r_ionization/dx) 
> > > > n2 = round(Int, ns/2 + r_ionization/dx) 
> > > > ip = zeros(Float64, nt) 
> > > > #end 
> > > > 
> > > > # FFT plan 
> > > > p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE ) 
> > > > 
> > > > prop_x = similar( ϕ ) 
> > > > prop_p = similar( prop_x ) 
> > > > prop_e = similar( prop_x ) 
> > > > # this two versions just cost the same time 
> > > > xplusy = Array(Float64, ns, ns) 
> > > > #xplusy = Array( Float64, ns^2) 
> > > > 
> > > > # absorb boundary 
> > > > r_a = ns * dx /2 - 50.0 
> > > > δ = 10.0 
> > > > absorb = Array(Float64, ns, ns) 
> > > > 
> > > > k0 = 2π / (ns * dx) 
> > > > 
> > > > @inbounds for j in 1:ns 
> > > > @inbounds for i in 1:ns 
> > > > prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt 
> / 2 ) 
> > > > prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread
I tried the built-in profiler, and found that the problem lies in the lines I 
marked with ***; the result is shown below.
That proved my guess. How can I pre-allocate these arrays without dividing 
this code into several parts that calculate these arrays 
separately? 

| lines | backtrace |

|   169 |  9011 |  ***

|   173 |  1552 |

|   175 |  2604 |

|   179 |  2906 |

|   181 |  1541 |

|   192 |  4458 |

|   211 | 13332 |

|   214 |  8431 |

|   218 | 15871 |***

|   221 |  2538 |


On Tuesday, March 29, 2016 at 9:27:27 PM UTC+8, Stefan Karpinski wrote:
>
> Have you tried:
>
> (a) calling @code_warntype on your function
> (b) using the built-in profiler?
>
>
> On Tue, Mar 29, 2016 at 9:23 AM, 博陈 <chenph...@gmail.com > 
> wrote:
>
>> First of all, have a look at the result.
>>
>>
>> <https://lh3.googleusercontent.com/-anNt-E4P1vM/Vvp-TybegZI/ABE/ZvDO2xarndMSgKVOXy_hcPd5NTh-7QcEA/s1600/QQ%25E5%259B%25BE%25E7%2589%258720160329210732.png>
>>
>>
>>
>>
>>
>>
>>
>>
>> My code calculates the evolution of a 1-d 2-electron system in an electric 
>> field; some variables are calculated during the evolution.
>> According to the result of @time evolution, my code must have a 
>> pre-allocation problem. Before you see the long code, I suspect that the 
>> hotspot might be around the arrays prop_e, ϕo, pp. I have learnt that I 
>> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
>> push!(m, new_m) inside the loop to pre-allocate the variable m, but in my 
>> situation, I don't know how to pre-allocate these arrays.
>>
>> Below is the script (precisely, the main function) itself.
>>
>> function evolution(ϕ::Array{Complex{Float64}, 2},
>>ele::Array{Float64, 1}, dx::Float64, dt::Float64,
>>flags::Tuple{Int64, Int64, Int64, Int64})
>> ϕg = copy(ϕ)
>> FFTW.set_num_threads(8)
>> ns = length( ϕ[:, 1] )
>> x = get_x(ns, dx)
>> p = get_p(ns, dx)
>> if flags[4] == 1
>> pp = similar(p)
>> A = -cumsum(ele) * dt
>> A² = A.*A
>> # splitting
>> r_sp = 150.0
>> δ_sp = 5.0
>> splitter = Array(Float64, ns, ns)
>> end
>> nt = length( ele )
>>
>> # # Pre-allocate result and temporary arrays
>> #if flags[1] == 1
>> σ = zeros(Complex128, nt)
>> #end
>> #if flags[2] == 1
>> a = zeros(Float64, nt)
>> #end
>> #if flags[3] == 1
>> r_ionization = 20.0
>> n1 = round(Int, ns/2 - r_ionization/dx)
>> n2 = round(Int, ns/2 + r_ionization/dx)
>> ip = zeros(Float64, nt)
>> #end
>>
>> # FFT plan
>> p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
>>
>> prop_x = similar( ϕ )
>> prop_p = similar( prop_x )
>> prop_e = similar( prop_x )
>> # these two versions cost just the same time
>> xplusy = Array(Float64, ns, ns)
>> #xplusy = Array( Float64, ns^2)
>>
>> # absorb boundary
>> r_a = ns * dx /2 - 50.0
>> δ = 10.0
>> absorb = Array(Float64, ns, ns)
>>
>> k0 = 2π / (ns * dx)
>>
>> @inbounds for j in 1:ns
>> @inbounds for i in 1:ns
>> prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt / 2 )
>> prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )
>>
>> xplusy[i, j] = x[i] + x[j]
>>
>> absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 - 
>> get_out(x[j],
>>  r_a, δ))
>> end
>> end
>>
>> if flags[2] == 1
>> pvpx = Array(Float64, ns, ns)
>> @inbounds for j in 1:ns
>> @inbounds for i in 1:ns
>> pvpx[i, j] = get_pvpx(x[i], x[j])
>> end
>> end
>> end
>>
>> if flags[4] == 1
>> ϕo = zeros(Complex128, ns, ns)
>> ϕp = zeros(Complex128, ns, ns)
>> @inbounds for  j in 1:ns
>> @inbounds for  i in 1:ns
>> splitter[i, j] = get_out(x[i], r_sp, δ_sp) * 
>> get_out(x[j], r_sp, δ_sp)
>> end
>> end
>> end
>>
>> for i in 1:nt
>> for j in eachindex(ϕ)
>> prop_e[j] = exp( -im * ele[i] * xplusy[j] * dt/2.0)   # line 169
>> end

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread
Actually, I know which arrays are allocated in every loop iteration; my problem 
is that I don't know a strategy for pre-allocating such arrays. 
In short, this is the pre-allocation problem for arrays like the array a 
described below:


n = 1024; nt = 1000
dt = 0.1
a0 = 1.0                      # some given base value
a = Array(Float64, n, n)
for i in 1:nt
    t = i*dt
    for j in eachindex(a)
        a[j] = exp(pi * t) * a0
    end
    # ... code that uses a[j] for this single i ...
end

In my problem, such arrays are hard to pre-allocate; do you have any 
suggestions?
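A common pattern for arrays of this shape (a sketch against the toy example above, with `a0` assumed to be some given scalar) is to note that `a` is already pre-allocatable: allocate it once before the time loop and overwrite it in place, hoisting the t-dependent scalar out of the inner loop:

```julia
n = 1024; nt = 1000
dt = 0.1
a0 = 1.0                        # assumed base value (not defined in the post)
a = Array(Float64, n, n)        # allocated once, outside the time loop
for i in 1:nt
    t = i * dt
    c = exp(pi * t) * a0        # scalar factor computed once per time step
    for j in eachindex(a)
        a[j] = c                # in-place overwrite: no per-step allocation
    end
    # ... use a for this value of i ...
end
```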

On Tuesday, March 29, 2016 at 9:27:27 PM UTC+8, Stefan Karpinski wrote:
>
> Have you tried:
>
> (a) calling @code_warntype on your function
> (b) using the built-in profiler?
>
>
>

[julia-users] Re: hdf5 not installing on linux cluster node

2016-03-29 Thread
I ran into the same problem two months ago, and finally figured out a 
solution.
When you are not logged in as the root user, you can't install libraries 
like hdf5 or c-blosc with apt-get or yum. To install these libs, you should 
download them from https://www.hdfgroup.org/downloads/index.html and 
https://github.com/Blosc/c-blosc, copy them to the cluster server, run the 
conventional "./configure && make && make install" procedure, and add the 
install location to LD_LIBRARY_PATH in your .bashrc. 

When you are behind a firewall, you might have to check whether you have 
access to github; maybe you need http: instead of https: as the address 
prefix.
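One detail worth emphasizing in the recipe above: the chaining operator is 
`&&`, not a single `&`. A lone `&` backgrounds the command and moves on 
regardless of success, while `&&` runs the next step only when the previous 
one succeeded, which is what you want for configure/make/install. A small 
runnable illustration (the $HOME/opt/hdf5 prefix is just an example path, 
not anything from the original post):

```shell
# '&&' runs the next command only if the previous one succeeded;
# a single '&' would background the command and continue unconditionally.
false && echo "skipped: configure failed, so make never runs"
true && echo "build steps chained"

# After a non-root install, prepend the install prefix to the runtime
# linker search path (the path below is an illustrative placeholder):
export LD_LIBRARY_PATH="$HOME/opt/hdf5/lib:$LD_LIBRARY_PATH"
echo "$LD_LIBRARY_PATH" | grep -q "opt/hdf5/lib" && echo "library path set"
```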


在 2016年3月28日星期一 UTC+8上午10:42:45,jda写道:
>
> I am getting this error:
>
> LoadError: None of the selected providers can install dependency libhdf5.
>
> on a linux cluster node.  Can anyone suggest a fix?
>


[julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread
First of all, have a look at the result.


My code calculates the evolution of a 1-d 2-electron system in an electric 
field; some variables are computed during the evolution.
According to the result of @time evolution, my code must have a 
pre-allocation problem. Before you read the long code, I suggest that the 
hotspot might be around the arrays prop_e, ϕo, and pp. I have learnt that I 
can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
push!(m, new_m) inside the loop to pre-allocate the variable m, but in my 
situation, I don't know how to pre-allocate these arrays.

function evolution(ϕ::Array{Complex{Float64}, 2},
   ele::Array{Float64, 1}, dx::Float64, dt::Float64,
   flags::Tuple{Int64, Int64, Int64, Int64})
ϕg = copy(ϕ)
FFTW.set_num_threads(8)
ns = length( ϕ[:, 1] )
x = get_x(ns, dx)
p = get_p(ns, dx)
if flags[4] == 1
pp = similar(p)
A = -cumsum(ele) * dt
A² = A.*A
# splitting
r_sp = 150.0
δ_sp = 5.0
splitter = Array(Float64, ns, ns)
end
nt = length( ele )

# Pre-allocate result and temporary arrays
#if flags[1] == 1
σ = zeros(Complex128, nt)
#end
#if flags[2] == 1
a = zeros(Float64, nt)
#end
#if flags[3] == 1
r_ionization = 20.0
n1 = round(Int, ns/2 - r_ionization/dx)
n2 = round(Int, ns/2 + r_ionization/dx)
ip = zeros(Float64, nt)
#end

# FFT plan
p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )

prop_x = similar( ϕ )
prop_p = similar( prop_x )
prop_e = similar( prop_x )
# these two versions cost about the same time
xplusy = Array(Float64, ns, ns)
#xplusy = Array( Float64, ns^2)

# absorb boundary
r_a = ns * dx /2 - 50.0
δ = 10.0
absorb = Array(Float64, ns, ns)

k0 = 2π / (ns * dx)

@inbounds for j in 1:ns
@inbounds for i in 1:ns
prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt / 2 )
prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )

xplusy[i, j] = x[i] + x[j]

absorb[i, j] = (1.0 - get_out(x[i], r_a, δ)) * (1.0 - get_out(x[j], r_a, δ))
end
end

if flags[2] == 1
pvpx = Array(Float64, ns, ns)
@inbounds for j in 1:ns
@inbounds for i in 1:ns
pvpx[i, j] = get_pvpx(x[i], x[j])
end
end
end

if flags[4] == 1
ϕo = zeros(Complex128, ns, ns)
ϕp = zeros(Complex128, ns, ns)
@inbounds for  j in 1:ns
@inbounds for  i in 1:ns
splitter[i, j] = get_out(x[i], r_sp, δ_sp) * get_out(x[j], r_sp, δ_sp)
end
end
end

for i in 1:nt
for j in eachindex(ϕ)
prop_e[j] = exp( -im * ele[i] * xplusy[j] * dt/2.0)
end

for j in eachindex(ϕ)
ϕ[j] *= prop_x[j] * prop_e[j]
end
p_fft! * ϕ
for j in eachindex(ϕ)
ϕ[j] *= prop_p[j]
end
p_fft! \ ϕ
for j in eachindex(ϕ)
ϕ[j] *= prop_x[j] * prop_e[j]
end
## autocorrelation function σ(t)
if flags[1] == 1
for j in eachindex(ϕ)
σ[i] += conj(ϕg[j]) * ϕ[j]
end
end
## dipole acceleration a(t)
if flags[2] == 1
for j in eachindex(ϕ)
a[i] += abs(ϕ[j])^2 * (pvpx[j] + 2ele[i])
end
end
## ionization probability ip(t)
if flags[3] == 1
for j1 in n1:n2
for j2 in 1:ns
ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2
end
end
for j1 in [1:n1-1; n2+1:ns]
for j2 in n1:n2
ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2
end
end
end
## get momentum
if flags[4] == 1
for j in eachindex(ϕo)
ϕo[j] = ϕ[j] * splitter[j] * exp( -im * A[i]*xplusy[j] )
end
for j in eachindex(p)
pp[j] = p[j]^2/2 * (nt-i) - p[j] * sum( A[i:nt] ) + sum( A²[1:nt] )/2
end
for j2 in 1:ns
for j1 in 1:ns
ϕo[j1, j2] = ϕo[j1, j2] * exp( -im * (pp[j1] + pp[j2]) * dt)
end
end
p_fft! * ϕo
for j in eachindex(ϕp)
ϕp[j] += ϕo[j]
end
end

## absorb boundary
if mod(i, 300) == 0
for j in eachindex(ϕ)
ϕ[j] *= absorb[j]
end
end

if (mod(i, 500) == 0)
println("i = $i")
   

[julia-users] Why can't julia make the most of the system resource?

2015-12-19 Thread



As can be seen, all 8 cores of my PC are being used by the julia program, 
yet only about 35% of the total CPU capacity is used. My julia code mainly 
does FFT operations in a loop, and I call FFTW.set_num_threads(8) outside 
the loop. In my opinion, the top command should show nearly 100% CPU use, 
but that's not the case. 


Re: [julia-users] Re: A Julia code slower than Matlab, can it be faster?

2015-12-16 Thread
Before seeing your scripts, I could not imagine applying Julia to my 
research. Now I have forked your repository on github and followed you. 
I am interested in your research, and caught a glimpse of one paper by the 
Ni group. Although it's far from the focus of my research, I believe that 
your repositories will make a great difference to my exploration of Julia.



在 2015年12月15日星期二 UTC+8下午10:11:19,Yichao Yu写道:
>
> On Tue, Dec 15, 2015 at 7:42 AM, 博陈 <chenph...@gmail.com > 
> wrote: 
> > The code got a great performance improvement. My original version cost 
> > 133 s and allocated about 166 GB, while your version costs 61 s and 
> > about 423 MB! 
> > 
> > 
>
> My version is here[1] (ignore the unused first 12 lines, get to that 
> later). 
> It's pretty much what Pablo has already done. Only minor improvement is 
> that 
>
> 1. Swap the two buffers directly instead of doing an extra copy (the 
> copy is merged into the first scaling) 
> 2. Do not use maxabs2 since that function is allocating (which is 
> probably a bug that should be fixed, or wait for jb/function branch 
> ;-p) 
> 3. I didn't change Ks to Ks - 1 in my version since I'm not sure which 
> one you are using for the matlab version. 
>
> Depending on the machine I benchmark it on, I see a 10-20% performance 
> improvement. The CPU utilization is ~75-80% on all cores so that's 
> about the maximum performance improvement you could hope for without 
> further optimize fftw =) 
>
> If you are benchmarking the matlab version with the same Ks, here's a 
> few guess why the julia version could still be ~20% slower and also a 
> few other improvements that I've tried 
>
> 1. Matlab might be using threads for these operations Julia currently 
> can't. We are working hard to make that possible (look for the 
> `multi-threading` issue/pr tag on github) 
> 2. Matlab might be storing the array in a different format (or 
> otherwise makes it easier to use the CPU SIMD instructions) 
>
> This could be fixed with compiler improvements[4]. 
> Another way to solve this is to change the format of the Array. 
> This is the way I was using (I'm using a much smaller array so the 
> time I spend outside fftw is much more important for me.) The code I 
> have can be found here[5]. In short, it uses the StructsOfArrays package 
> to store the real and imaginary part of the array separately. For 
> reasons that can be explained later, this helps the compiler to make 
> use of the SIMD instructions and I could get a 10x speed up in the 
> part outside fftw. One problem with this is that FFTW doesn't have too 
> good support for this representation[6] so you would need to either 
> copy it between the two representations when doing the FFT or change 
> the storage a little bit (see @simonster's comment in this issue) to 
> make it possible to run FFTW on it directly (which is what the first 
> 10 lines of the code in [1] are) 
>
>
> To answer a few of your other questions that haven't been addressed already: 
>
> > When you said "temporary arrays", did you mean that I should use less 
> global variables? Yes, I can write arrays such as x, px only in the 
> functions Potential and Ko, but that will make the code less 
> understandable. 
>
> No. I'm talking about things like `a .* = b` is currently `a = a .* b` 
> and so it has to compute `a .* b` which needs to allocate an array. 
>
> > When starting the code, I use Ks=2^11, but when I finally plot the \phi 
> function, I get an error, and it seems that I misunderstood the linspace 
> function (I wrote linspace(-pm,pm,Ks) at first), so I changed it to 2^11+1. 
> It's a bad choice, and now I realize it. When use Ks=2^11-1, it only takes 
> 81 seconds for the code to run, which is slightly faster than the Matlab 
> version. 
>
> I think this is mentioned in the doc somewhere but I think it's not 
> really emphasized enough for people who needs it and I can't find it 
> now (I also learnt this the hard way) 
>
> > I have already stated the variables \phi and Pl outside the loop, and in 
> my opinion, they are pre-allocated and "should" be manipulated in place. To 
> be honest, maybe I don't really understand the words "in place" and 
> "pre-allocated". 
>
> See `a .*= b` above, it's not inplace, unfortunately. We have an issue 
> for this[2]. I don't find it to be too much an issue since writing the 
> loop isn't too much harder and it is limited to a single operation 
> anyway. 
>
> > But I don't understand why the assignment \phi .*=ko2 is unnecessary. 
>  Are you saying that I should rewrite them as scale(\phi, ko2)? I'll have a 
> try. 
>
> This assignment is

Re: [julia-users] Re: A Julia code slower than Matlab, can it be faster?

2015-12-15 Thread
Following your advice, I tried @code_warntype ground(), but I didn't see 
any concrete types. I don't know whether there is any other package you use 
by default.

在 2015年12月15日星期二 UTC+8上午1:45:41,Stefan Karpinski写道:
>
> The code is well-typed – try @code_warntype ground() and you can see that 
> all the local variables have concrete types. The inner loop allocates a lot 
> of new arrays where it could update the same array in place, so you could 
> try avoiding allocation in the inner loop.
>
> On Mon, Dec 14, 2015 at 8:23 AM, Patrick Kofod Mogensen <
> patrick@gmail.com > wrote:
>
>> I get it, thanks for pointing that out.
>>
>>
>> On Monday, December 14, 2015 at 11:38:14 AM UTC+1, FQ wrote:
>>>
>>> The code does exit at some point, returning a 2049x2049 complex matrix. 
>>> note that delta phi is reassigned inside the loop at some point. 
>>>
>>> Am 14.12.2015 um 10:31 schrieb Patrick Kofod Mogensen: 
>>> > Are you sure this version works? I get Δϕ == 1.0 in all iterations, so 
>>> I 
>>> > can't see how it would ever exit the while loop... 
>>> > 
>>> > On Monday, December 14, 2015 at 4:35:43 AM UTC+1, 博陈 wrote: 
>>> > 
>>> > This is my first julia code, I am happy it did the right thing, 
>>> but 
>>> > compared with the Matlab code that did the same thing, it runs so 
>>> > slowly. The Matlab code takes about 90s, but the julia below takes 
>>> 130s. 
>>> > 
>>> > Potential(x::Float64, y::Float64) = -2./sqrt(x^2+1.) 
>>> > -2./sqrt(y^2+1.)  + 1./sqrt((x-y)^2+1.) 
>>> > Ko(x::Float64,y::Float64) = x^2+y^2 
>>> > 
>>> > function ground() 
>>> > R = 320. 
>>> > Ks = 2^11 
>>> > x = linspace(-R, R, Ks+1) 
>>> > dx = 2R/Ks 
>>> > dt = 0.1 
>>> > pm = 2π/2dx 
>>> > px = circshift( linspace(-pm, pm, Ks+1),  round(Int64, Ks/2) ) 
>>> > FFTW.set_num_threads(8) 
>>> > 
>>> > V = Float64[Potential(ix, iy)  for ix in x, iy in x] 
>>> > ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in 
>>> px] 
>>> > ko2 = ko.^2 
>>> > vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x] 
>>> > ϕ = Array(Complex128,(Ks+1,Ks+1)) 
>>> > Pl = plan_fft!(ϕ; flags=FFTW.MEASURE) 
>>> > ϕ = V.*complex(1.) 
>>> > 
>>> > normphi = sqrt(sumabs2(ϕ))*dx 
>>> > ϕ /= normphi 
>>> > 
>>> > ϕ = Pl*ϕ 
>>> > ϕ .*= ko 
>>> > ϕ = Pl\ϕ 
>>> > 
>>> > Δϕ = 1. 
>>> > nstep = 0 
>>> > print("start loop") 
>>> >  while Δϕ > 1.e-15 
>>> > ϕ′ = ϕ 
>>> > ϕ .*= vo 
>>> > 
>>> > normphi = sqrt(sumabs2(ϕ))*dx 
>>> > ϕ /= normphi 
>>> > 
>>> > ϕ = Pl*ϕ 
>>> > ϕ .*= ko2 
>>> > ϕ = Pl\ϕ 
>>> > # if check  Δϕ for every step, 35s is needed. 
>>> > if nstep>500 
>>> > Δϕ = maxabs(ϕ-ϕ′) 
>>> > end 
>>> > nstep += 1 
>>> > if mod(nstep,200)==0 
>>> > print(nstep,"  ") 
>>> > print("Δϕ=",Δϕ,"   ") 
>>> > end 
>>> > 
>>> > end 
>>> > ϕ .*= vo 
>>> > 
>>> > ϕ = Pl*ϕ 
>>> > ϕ .*= ko 
>>> > ϕ = Pl\ϕ 
>>> > 
>>> > normphi = sqrt(sumabs2(ϕ))*dx 
>>> > ϕ /= normphi 
>>> > end 
>>> > 
>>>
>>>
>

Re: [julia-users] A Julia code slower than Matlab, can it be faster?

2015-12-15 Thread
When you said "temporary arrays", did you mean that I should use less 
global variables? Yes, I can write arrays such as x, px only in the 
functions Potential and Ko, but that will make the code less 
understandable. 

When starting the code, I used Ks=2^11, but when I finally plotted the \phi 
function, I got an error; it seems that I misunderstood the linspace 
function (I wrote linspace(-pm,pm,Ks) at first), so I changed it to 2^11+1. 
It was a bad choice, and now I realize it. When using Ks=2^11-1, it only 
takes 81 seconds for the code to run, which is slightly faster than the 
Matlab version.

I have already declared the variables \phi and Pl outside the loop, and in 
my opinion, they are pre-allocated and "should" be manipulated in place. To 
be honest, maybe I don't really understand the words "in place" and 
"pre-allocated".

I changed the statements normalizing the array to use the function 
scale!(), and got a performance improvement of about 5 s; that's also very 
good.

But I don't understand why the assignment \phi .*= ko2 is unnecessary.  Are 
you saying that I should rewrite it as scale!(\phi, ko2)? I'll have a try.
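The distinction being drawn here can be shown in a few lines. With an 
in-place plan, Pl * ϕ already overwrites ϕ, so rebinding the result 
(ϕ = Pl*ϕ) does nothing useful; and on 0.4, ϕ .*= ko2 still allocates a 
temporary, which a plain loop (or scale!) avoids. A sketch using the 
0.4-era API from this thread (array sizes and contents are arbitrary):

```julia
ϕ   = rand(Complex128, 8, 8)
ko2 = rand(8, 8)
Pl  = plan_fft!(ϕ)        # in-place plan: it transforms its argument

Pl * ϕ                    # mutates ϕ directly; `ϕ = Pl*ϕ` is redundant
for i in eachindex(ϕ)     # in-place elementwise product;
    ϕ[i] *= ko2[i]        # `ϕ .*= ko2` would allocate a temporary on 0.4
end
Pl \ ϕ                    # in-place inverse transform
```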


Finally, I really appreciate all your help. Mainly by following your 
advice, my code got a great performance improvement. 





在 2015年12月15日星期二 UTC+8上午2:13:12,Yichao Yu写道:
>
> On Sun, Dec 13, 2015 at 10:35 PM, 博陈 <chenph...@gmail.com > 
> wrote: 
> > This is my first julia code, I am happy it did the right thing, but 
> compared 
> > with the Matlab code that did the same thing, it runs so slowly. The 
> Matlab 
> > code takes about 90s, but the julia below takes 130s. 
>
> There's a few problems with the code. See below. You are still 
> creating many temporary arrays which is in general a bad idea. 
>
> > 
> > Potential(x::Float64, y::Float64) = -2./sqrt(x^2+1.) -2./sqrt(y^2+1.)  + 
> > 1./sqrt((x-y)^2+1.) 
> > Ko(x::Float64,y::Float64) = x^2+y^2 
> > 
> > function ground() 
> > R = 320. 
> > Ks = 2^11 
> > x = linspace(-R, R, Ks+1) 
>
> As a general advice, don't use this dimension for fft. If you replace 
> you Ks with 2^11 - 1, you will see at least 2x speed up. 
>
> > dx = 2R/Ks 
> > dt = 0.1 
> > pm = 2π/2dx 
> > px = circshift( linspace(-pm, pm, Ks+1),  round(Int64, Ks/2) ) 
> > FFTW.set_num_threads(8) 
> > 
> > V = Float64[Potential(ix, iy)  for ix in x, iy in x] 
> > ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in px] 
> > ko2 = ko.^2 
> > vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x] 
> > ϕ = Array(Complex128,(Ks+1,Ks+1)) 
> > Pl = plan_fft!(ϕ; flags=FFTW.MEASURE) 
> > ϕ = V.*complex(1.) 
> > 
> > normphi = sqrt(sumabs2(ϕ))*dx 
> > ϕ /= normphi 
> > 
> > ϕ = Pl*ϕ 
> > ϕ .*= ko 
> > ϕ = Pl\ϕ 
> > 
> > Δϕ = 1. 
> > nstep = 0 
> > print("start loop") 
> >  while Δϕ > 1.e-15 
> > ϕ′ = ϕ 
>
> You are better off pre-allocate the two buffers and swap them 
>
> > ϕ .*= vo 
>
> Use `scale!` 
>
> > 
> > normphi = sqrt(sumabs2(ϕ))*dx 
> > ϕ /= normphi 
>
> Also use `scale!` or write the loop out 
>
> > 
> > ϕ = Pl*ϕ 
>
> The assignment is not necessary. 
>
> > ϕ .*= ko2 
>
> scale! 
>
> > ϕ = Pl\ϕ 
>
> Unnecessary 
>
> > # if check  Δϕ for every step, 35s is needed. 
> > if nstep>500 
> > Δϕ = maxabs(ϕ-ϕ′) 
> > end 
> > nstep += 1 
> > if mod(nstep,200)==0 
> > print(nstep,"  ") 
> > print("Δϕ=",Δϕ,"   ") 
> > end 
> > 
> > end 
> > ϕ .*= vo 
> > 
> > ϕ = Pl*ϕ 
> > ϕ .*= ko 
> > ϕ = Pl\ϕ 
> > 
> > normphi = sqrt(sumabs2(ϕ))*dx 
> > ϕ /= normphi 
> > end 
> > 
>


Re: [julia-users] Re: A Julia code slower than Matlab, can it be faster?

2015-12-15 Thread
The code got a great performance improvement. My original version cost 133 
s and allocated about 166 GB, while your version costs 61 s and about 423 
MB!



在 2015年12月15日星期二 UTC+8下午7:34:55,Pablo Zubieta写道:
>
> Stefan said that your code is well type (type stable). Have you tried 
> Yichao's suggestions?
>
> Here is the function ground() taking them into account:
>
> function ground()
> R = 320.
> Ks = 2^11 - 1
> x = linspace(-R, R, Ks+1)
> dx = 2R/Ks
> dt = 0.1
> pm = 2π/2dx
> px = circshift( linspace(-pm, pm, Ks+1),  round(Int64, Ks/2) )
> FFTW.set_num_threads(8)
>
>
> V = Float64[Potential(ix, iy)  for ix in x, iy in x]
> ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in px]
> ko2 = ko.^2
> vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x]
> ϕ = Array(Complex128,(Ks+1,Ks+1))
> Pl = plan_fft!(ϕ; flags=FFTW.MEASURE)
> for i in eachindex(ϕ); ϕ[i] = V[i]; end # Conversion is automatic here
>
>
> invnormphi = 1. / (sqrt(sumabs2(ϕ))*dx)
> scale!(ϕ, invnormphi)
>
>
> Pl * ϕ # No need to assign to ϕ
> for i in eachindex(ϕ); ϕ[i] *= ko[i]; end
> Pl \ ϕ
>
> ϕ′ = similar(ϕ)
>
> Δϕ = 1.
> nstep = 0
> println("start loop")
> while Δϕ > 1.e-15
> copy!(ϕ′, ϕ)
> for i in eachindex(ϕ), ϕ[i] *= vo[i]; end
>
> invnormphi = 1. / (sqrt(sumabs2(ϕ))*dx)
> scale!(ϕ, invnormphi)
>
>
> Pl * ϕ
> for i in eachindex(ϕ); ϕ[i] *= ko2[i]; end
> Pl \ ϕ
> # if check  Δϕ for every step, 35s is needed.
> if nstep>500
> Δϕ = maxabs(ϕ-ϕ′)
> end
> nstep += 1
> if mod(nstep,200)==0
> print(nstep,"  ")
> println("Δϕ=",Δϕ)
> end
>
> end
> for i in eachindex(ϕ); ϕ[i] *= vo[i]; end
>
> Pl * ϕ
> for i in eachindex(ϕ); ϕ[i] *= ko[i]; end
> Pl \ ϕ
>
>
> invnormphi = 1. / (sqrt(sumabs2(ϕ))*dx)
> scale!(ϕ, invnormphi)
> end
>
>
>
>

Re: [julia-users] Re: A Julia code slower than Matlab, can it be faster?

2015-12-15 Thread
Oh, thank you very much. I am quite a beginner.  Before following the 
advice from you and Yichao, my code cost 133 s, but now your version only 
costs 61 s!

Although Yichao pointed out almost all the optimization strategies and made 
things clear enough, I didn't really know how to follow his advice. 
Your answer has helped me a lot!  Before asking for help in this 
julia-users group, I felt depressed about the inefficiency of my julia 
code, but now I feel great passion again. 



在 2015年12月15日星期二 UTC+8下午7:34:55,Pablo Zubieta写道:
>
> Stefan said that your code is well type (type stable). Have you tried 
> Yichao's suggestions?
>
> Here is the function ground() taking them into account:
>
> function ground()
> R = 320.
> Ks = 2^11 - 1
> x = linspace(-R, R, Ks+1)
> dx = 2R/Ks
> dt = 0.1
> pm = 2π/2dx
> px = circshift( linspace(-pm, pm, Ks+1),  round(Int64, Ks/2) )
> FFTW.set_num_threads(8)
>
>
> V = Float64[Potential(ix, iy)  for ix in x, iy in x]
> ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in px]
> ko2 = ko.^2
> vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x]
> ϕ = Array(Complex128,(Ks+1,Ks+1))
> Pl = plan_fft!(ϕ; flags=FFTW.MEASURE)
> for i in eachindex(ϕ); ϕ[i] = V[i]; end # Conversion is automatic here
>
>
> invnormphi = 1. / (sqrt(sumabs2(ϕ))*dx)
> scale!(ϕ, invnormphi)
>
>
> Pl * ϕ # No need to assign to ϕ
> for i in eachindex(ϕ); ϕ[i] *= ko[i]; end
> Pl \ ϕ
>
> ϕ′ = similar(ϕ)
>
> Δϕ = 1.
> nstep = 0
> println("start loop")
> while Δϕ > 1.e-15
> copy!(ϕ′, ϕ)
> for i in eachindex(ϕ), ϕ[i] *= vo[i]; end
>
> invnormphi = 1. / (sqrt(sumabs2(ϕ))*dx)
> scale!(ϕ, invnormphi)
>
>
> Pl * ϕ
> for i in eachindex(ϕ); ϕ[i] *= ko2[i]; end
> Pl \ ϕ
> # if check  Δϕ for every step, 35s is needed.
> if nstep>500
> Δϕ = maxabs(ϕ-ϕ′)
> end
> nstep += 1
> if mod(nstep,200)==0
> print(nstep,"  ")
> println("Δϕ=",Δϕ)
> end
>
> end
> for i in eachindex(ϕ); ϕ[i] *= vo[i]; end
>
> Pl * ϕ
> for i in eachindex(ϕ); ϕ[i] *= ko[i]; end
> Pl \ ϕ
>
>
> invnormphi = 1. / (sqrt(sumabs2(ϕ))*dx)
> scale!(ϕ, invnormphi)
> end
>
>
>
>

Re: [julia-users] A Julia code slower than Matlab, can it be faster?

2015-12-14 Thread
Did you mean that I should change all my .* , Pl* and ./ operations into 
'for' loops? I have had a try, but it seems that this takes more running 
time. I also wrote some small scripts to test the use of devectorization, 
and could not observe any performance improvement.



在 2015年12月15日星期二 UTC+8上午2:15:11,Yichao Yu写道:
>
> On Mon, Dec 14, 2015 at 1:12 PM, Yichao Yu <yyc...@gmail.com > 
> wrote: 
> > On Sun, Dec 13, 2015 at 10:35 PM, 博陈 <chenph...@gmail.com > 
> wrote: 
> >> This is my first julia code, I am happy it did the right thing, but 
> compared 
> >> with the Matlab code that did the same thing, it runs so slowly. The 
> Matlab 
> >> code takes about 90s, but the julia below takes 130s. 
>
> Another issue is that the code is not vectorizable (SIMD sense) due to 
> the way we store the complex array. (This is almost identical to the 
> simulation I am doing so I've run into this problem.) I might do some 
> more benchmarks on alternatives and report back later. 
>
> > 
> > There's a few problems with the code. See below. You are still 
> > creating many temporary arrays which is in general a bad idea. 
> > 
> >> 
> >> Potential(x::Float64, y::Float64) = -2./sqrt(x^2+1.) -2./sqrt(y^2+1.) 
>  + 
> >> 1./sqrt((x-y)^2+1.) 
> >> Ko(x::Float64,y::Float64) = x^2+y^2 
> >> 
> >> function ground() 
> >> R = 320. 
> >> Ks = 2^11 
> >> x = linspace(-R, R, Ks+1) 
> > 
> > As a general advice, don't use this dimension for fft. If you replace 
> > you Ks with 2^11 - 1, you will see at least 2x speed up. 
> > 
> >> dx = 2R/Ks 
> >> dt = 0.1 
> >> pm = 2π/2dx 
> >> px = circshift( linspace(-pm, pm, Ks+1),  round(Int64, Ks/2) ) 
> >> FFTW.set_num_threads(8) 
> >> 
> >> V = Float64[Potential(ix, iy)  for ix in x, iy in x] 
> >> ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in px] 
> >> ko2 = ko.^2 
> >> vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x] 
> >> ϕ = Array(Complex128,(Ks+1,Ks+1)) 
> >> Pl = plan_fft!(ϕ; flags=FFTW.MEASURE) 
> >> ϕ = V.*complex(1.) 
> >> 
> >> normphi = sqrt(sumabs2(ϕ))*dx 
> >> ϕ /= normphi 
> >> 
> >> ϕ = Pl*ϕ 
> >> ϕ .*= ko 
> >> ϕ = Pl\ϕ 
> >> 
> >> Δϕ = 1. 
> >> nstep = 0 
> >> print("start loop") 
> >>  while Δϕ > 1.e-15 
> >> ϕ′ = ϕ 
> > 
> > You are better off pre-allocate the two buffers and swap them 
> > 
> >> ϕ .*= vo 
> > 
> > Use `scale!` 
> > 
> >> 
> >> normphi = sqrt(sumabs2(ϕ))*dx 
> >> ϕ /= normphi 
> > 
> > Also use `scale!` or write the loop out 
> > 
> >> 
> >> ϕ = Pl*ϕ 
> > 
> > The assignment is not necessary. 
> > 
> >> ϕ .*= ko2 
> > 
> > scale! 
> > 
> >> ϕ = Pl\ϕ 
> > 
> > Unnecessary 
> > 
> >> # if check  Δϕ for every step, 35s is needed. 
> >> if nstep>500 
> >> Δϕ = maxabs(ϕ-ϕ′) 
> >> end 
> >> nstep += 1 
> >> if mod(nstep,200)==0 
> >> print(nstep,"  ") 
> >> print("Δϕ=",Δϕ,"   ") 
> >> end 
> >> 
> >> end 
> >> ϕ .*= vo 
> >> 
> >> ϕ = Pl*ϕ 
> >> ϕ .*= ko 
> >> ϕ = Pl\ϕ 
> >> 
> >> normphi = sqrt(sumabs2(ϕ))*dx 
> >> ϕ /= normphi 
> >> end 
> >> 
>


[julia-users] Re: A Julia code slower than Matlab, can it be faster?

2015-12-14 Thread
Yes, I have compared the first and second calls of ground(); the second 
call is obviously faster. But the second one is still too slow compared to 
the Matlab version.

在 2015年12月15日星期二 UTC+8上午1:49:16,Jean-François Baffier写道:
>
> Before checking the code, I have a short question. Since Julia compiles 
> each function on its first use, the first execution of a function will 
> include compilation of your code. Do you compare the time for Julia on a 
> second call of your function or on the first call?
>
> You can read the related performance tips here 
> <http://docs.julialang.org/en/stable/manual/performance-tips/>.
>
> Le lundi 14 décembre 2015 12:35:43 UTC+9, 博陈 a écrit :
>>
>> This is my first julia code, I am happy it did the right thing, but 
>> compared with the Matlab code that did the same thing, it runs so slowly. 
>> The Matlab code takes about 90s, but the julia below takes 130s.
>>
>> Potential(x::Float64, y::Float64) = -2./sqrt(x^2+1.) -2./sqrt(y^2+1.)  + 
>> 1./sqrt((x-y)^2+1.)
>> Ko(x::Float64,y::Float64) = x^2+y^2
>>
>> function ground()
>> R = 320.
>> Ks = 2^11
>> x = linspace(-R, R, Ks+1)
>> dx = 2R/Ks
>> dt = 0.1
>> pm = 2π/2dx
>> px = circshift( linspace(-pm, pm, Ks+1),  round(Int64, Ks/2) )
>> FFTW.set_num_threads(8)
>>
>> V = Float64[Potential(ix, iy)  for ix in x, iy in x]
>> ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in px]
>> ko2 = ko.^2
>> vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x]
>> ϕ = Array(Complex128,(Ks+1,Ks+1))
>> Pl = plan_fft!(ϕ; flags=FFTW.MEASURE)
>> ϕ = V.*complex(1.)
>>
>> normphi = sqrt(sumabs2(ϕ))*dx
>> ϕ /= normphi
>>
>> ϕ = Pl*ϕ
>> ϕ .*= ko
>> ϕ = Pl\ϕ
>>
>> Δϕ = 1.
>> nstep = 0
>> print("start loop")
>>  while Δϕ > 1.e-15
>> ϕ′ = ϕ
>> ϕ .*= vo
>>
>> normphi = sqrt(sumabs2(ϕ))*dx
>> ϕ /= normphi
>>
>> ϕ = Pl*ϕ
>> ϕ .*= ko2
>> ϕ = Pl\ϕ
>> # if check  Δϕ for every step, 35s is needed.
>> if nstep>500
>> Δϕ = maxabs(ϕ-ϕ′)
>> end
>> nstep += 1
>> if mod(nstep,200)==0
>> print(nstep,"  ")
>> print("Δϕ=",Δϕ,"   ")
>> end
>>
>> end
>> ϕ .*= vo
>>
>> ϕ = Pl*ϕ
>> ϕ .*= ko
>> ϕ = Pl\ϕ
>>
>> normphi = sqrt(sumabs2(ϕ))*dx
>> ϕ /= normphi
>> end
>>
>>

[julia-users] A Julia code slower than Matlab, can it be faster?

2015-12-13 Thread
This is my first julia code, I am happy it did the right thing, but 
compared with the Matlab code that did the same thing, it runs so slowly. 
The Matlab code takes about 90s, but the julia below takes 130s.

Potential(x::Float64, y::Float64) = -2./sqrt(x^2+1.) -2./sqrt(y^2+1.) + 1./sqrt((x-y)^2+1.)
Ko(x::Float64,y::Float64) = x^2+y^2

function ground()
R = 320.
Ks = 2^11
x = linspace(-R, R, Ks+1)
dx = 2R/Ks
dt = 0.1
pm = 2π/2dx
px = circshift( linspace(-pm, pm, Ks+1),  round(Int64, Ks/2) )
FFTW.set_num_threads(8)

V = Float64[Potential(ix, iy)  for ix in x, iy in x]
ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in px]
ko2 = ko.^2
vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x]
ϕ = Array(Complex128,(Ks+1,Ks+1))
Pl = plan_fft!(ϕ; flags=FFTW.MEASURE)
ϕ = V.*complex(1.)

normphi = sqrt(sumabs2(ϕ))*dx
ϕ /= normphi

ϕ = Pl*ϕ
ϕ .*= ko
ϕ = Pl\ϕ

Δϕ = 1.
nstep = 0
print("start loop")
 while Δϕ > 1.e-15
ϕ′ = ϕ
ϕ .*= vo

normphi = sqrt(sumabs2(ϕ))*dx
ϕ /= normphi

ϕ = Pl*ϕ
ϕ .*= ko2
ϕ = Pl\ϕ
# if check  Δϕ for every step, 35s is needed.
if nstep>500
Δϕ = maxabs(ϕ-ϕ′)
end
nstep += 1
if mod(nstep,200)==0
print(nstep,"  ")
print("Δϕ=",Δϕ,"   ")
end

end
ϕ .*= vo

ϕ = Pl*ϕ
ϕ .*= ko
ϕ = Pl\ϕ

normphi = sqrt(sumabs2(ϕ))*dx
ϕ /= normphi
end



Re: [julia-users] The optimization strategy for fft didn't work

2015-12-09 Thread
Thanks a lot for your help. I am just starting to code with julia, and I 
was surprised to find that my julia code is about 2.5 times slower than the 
matlab one, though they do the same things. I learned about the fft 
optimization strategies from 
http://docs.julialang.org/en/release-0.4/stdlib/math/?highlight=fft#Base.fft. 
The optimization strategies are:
1. use plan_fft!() instead of plan_fft() or fft() to decrease the memory to 
be allocated and preallocate the fft operation.
2. use flags like FFTW.MEASURE or FFTW.EXHAUSTIVE. In my project, I have 
used those flags, and they surely make a difference. But I'm still confused 
about why the fft! and plan_fft strategy didn't work, which you have now 
clearly explained.
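For readers of the archive, the two strategies above combine into something 
like the following sketch (0.4-era API; the function name and sizes are 
illustrative): build one measured in-place plan outside the hot loop, then 
apply it with * and \.

```julia
function run_ffts!(ϕ, n)
    # Plan creation is expensive (especially with MEASURE): do it once,
    # outside the loop, on a buffer of the right size and element type.
    p = plan_fft!(ϕ; flags=FFTW.MEASURE)
    for i in 1:n
        p * ϕ    # in-place forward transform, no per-iteration allocation
        p \ ϕ    # in-place inverse transform
    end
    return ϕ
end

ϕ = rand(Complex128, 256, 256)
run_ffts!(ϕ, 100)
```

Keeping the plan in a function argument (rather than a non-const global) 
also sidesteps the type-stability issue mentioned in the reply below.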





在 2015年12月9日星期三 UTC+8下午9:09:07,Yichao Yu写道:
>
>
>
> On Wed, Dec 9, 2015 at 5:57 AM, 博陈 <chenph...@gmail.com > 
> wrote:
>
>>
>> <https://lh3.googleusercontent.com/-lTsIsN0BaAY/VmgIypsEQ2I/AAk/n-j-ZalGl5I/s1600/QQ%25E6%2588%25AA%25E5%259B%25BE20151209185519.png>
>> the optimization strategy for fft given by the official documentation 
>> seems to fail. Why?
>>
>
> You didn't mention exactly what optimization strategy you are trying so I 
> would need to guess.
>
> 1. You should expect the first one to be no faster than the last one since 
> it's basically doing the same thing and the first one does it all in global 
> scope
> 2. In place op doesn't make too much a difference here since the operation 
> you are doing is already very expensive. (most of the time are spent in 
> FFTW)
> 3. It doesn't really matter for this one (since FFTW determines the 
> performance here) but you should benchmark the loop in a function and hoist 
> the creation of the plan out of the loop. For your actual code, you might 
> want to make the plan a global constant or a parametrized field of a type 
> since it is not particularly type stable.
> 4. You can use `plan_fft(, flags=FFTW.MEASURE)` to let FFTW select the 
> best algorithm by actually measuring the time instead of guessing. It gives 
> me 20% to 30% speed up for your example and IIRC more speed up for small 
> problems.
> 5. You can use `FFTW.flops(p)` to figure out how much floating point 
> operations are needed to perform your transformation. On my computer, a 
> MEASURE'd plan takes 4.3s (100 times) and the naive estimation from 
> assuming one operation per clock cycle is 2s (100 times) so it's the right 
> order of magnitude.
>
>

[julia-users] Why is my code so slow?

2015-12-08 Thread
function ground()
#using PyPlot

print("start")
const R = 320.
const Ks = 2^11
const x = linspace(-R, R, Ks+1)
const dx = 2R/Ks
const y = x
const dt = 0.1
const pm = 2π/2dx
px = circshift( linspace(-pm, pm, Ks+1),  round(Int64, Ks/2) )
py = px
FFTW.set_num_threads(8)
#Potential(x::Float64, y::Float64) = -2/sqrt((x-2.)^2+1) -2/sqrt((y-2)^2+1) -1/sqrt((x+2.)^2+1) -1/sqrt((y+2)^2+1) + 1/sqrt((x-y)^2+1)
Potential(x::Float64, y::Float64) = -2./sqrt(x^2+1.) -2./sqrt(y^2+1.) + 1./sqrt((x-y)^2+1.)
Ko(x::Float64,y::Float64) = x^2+y^2
V = Float64[Potential(ix, iy)  for ix in x, iy in y]
ϕ = V.*(1.+0im)
ϕ /= maximum(abs(ϕ))
ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in py]
ko2 = ko.^2
#ϕ = ground(x,V)

#function ground(x, V)

ϕ = fft(ϕ)
ϕ .*= ko
ϕ = ifft(ϕ)
Δϕ = 1.
nstep = 0
while Δϕ > 1.e-15
ϕ′ = ϕ
ϕ .*= e.^(-dt*V)
ϕ /= maximum(abs(ϕ))
ϕ = fft(ϕ)
ϕ .*= ko2
ϕ = ifft(ϕ)

Δϕ = maximum(abs(ϕ-ϕ′))
nstep += 1
if mod(nstep,200)==0
print(nstep)
print("Δϕ=",Δϕ)
end
end
ϕ .*= exp(-dt*V)
ϕ = fft(ϕ)
ϕ .*= ko
ϕ = ifft(ϕ)
ϕ /= maximum(abs(ϕ))

#imshow(abs(ϕ)^2, interpolation="none", extent=[-R, R ,-R ,R])
#show()
end

@time ground()