Re: [julia-users] Re: julia vs cython benchmark

2014-12-07 Thread Andre Bieler
Yes I am very pleased with the result too!
Really impressed with both the julia language and community.
Keep up the good work!


[julia-users] Re: julia vs cython benchmark

2014-12-07 Thread Andre Bieler
I don't know the result for vectorized numpy.. :)
But I'm not sure it would be a good approach anyway,
as the outer loop can already end after one iteration.
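
A minimal sketch of the early-exit point, written in Julia 1.x syntax. The real outer loop is not shown in this thread, so `is_done` and both function names are hypothetical stand-ins. The loop version stops at the first hit, while the vectorized version evaluates every iteration before it can look for one.

```julia
# Hypothetical per-iteration test; stands in for whatever the real outer loop checks.
is_done(k) = k >= 1

# Loop version: stops at the first iteration that satisfies the test.
function first_done_loop(n)
    evaluated = 0
    for k in 1:n
        evaluated += 1
        is_done(k) && return (k, evaluated)
    end
    return (0, evaluated)
end

# Vectorized version: evaluates the test for all n iterations up front,
# then searches the resulting Bool vector for the first hit.
function first_done_vectorized(n)
    flags = map(is_done, 1:n)
    k = findfirst(flags)
    hit = k === nothing ? 0 : k
    return (hit, n)
end

println(first_done_loop(1000))
println(first_done_vectorized(1000))
```

Both calls find the same iteration, but the second element of each tuple shows how many tests were evaluated to get there.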


[julia-users] Re: julia vs cython benchmark

2014-12-06 Thread Andre Bieler
for completeness:

with the inner loops now going through the first index as suggested by Jeff,
there was another increase in speed. So now I stand at *16.8 s* on average
with julia.

The same thing in python/numpy takes roughly *6800 s* to run
(however not vectorized in numpy, using for loops as in the examples
above)


Re: [julia-users] Re: julia vs cython benchmark

2014-12-06 Thread Stefan Karpinski
Great – thanks for reporting back. It's nice that you could get that kind
of good performance here without too much shenanigans.


[julia-users] Re: julia vs cython benchmark

2014-12-06 Thread Viral Shah
Just curious - how does this example compare to vectorized numpy?

-viral


[julia-users] Re: julia vs cython benchmark

2014-11-27 Thread Ariel Keselman
In the Cython code you turned off bounds checking. This can be done for
Julia with the @inbounds macro. Just use it in your loops like this:

@inbounds for i in whatever
...
end

Also, @simd may help; it seems you can use it in a couple of the innermost loops.
It also seems simple to parallelize with a shared array and a @parallel for.
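
A minimal sketch of how the two macros might be combined on an innermost loop. The kernel `axpy!` is a hypothetical example, not the benchmark code from this thread, and it assumes Julia 1.x.

```julia
# Hypothetical kernel (not the thread's code): y[i] += a * x[i] for every i.
function axpy!(y::Vector{Float64}, a::Float64, x::Vector{Float64})
    # Check the lengths once, up front. @inbounds below is only safe because of this.
    length(x) == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    # @inbounds skips per-access bounds checks inside the loop.
    # @simd tells the compiler it may reorder and vectorize the iterations.
    @inbounds @simd for i in 1:length(y)
        y[i] += a * x[i]
    end
    return y
end

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
axpy!(y, 2.0, x)
println(y)
```

Whether @simd helps depends on the loop body, so it is worth timing each variant separately. In current Julia releases the @parallel macro mentioned above is named @distributed (in the Distributed standard library), and shared arrays are provided by SharedArrays.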


[julia-users] Re: julia vs cython benchmark

2014-11-27 Thread Andre Bieler
alright!
did the @inbounds and @simd and benchmarked again
(for a different light source position, that's why the numbers don't match
exactly with the ones above)

cython original code:            27.5 s
julia original code:             28.3 s
julia with @inbounds:            21.3 s
julia with @inbounds and @simd:  19.0 s

Looks like a nice speed up to me! :)
I'll look into the column-major issue later.

Thanks for the input guys!
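
For reference, the speedups implied by these timings can be worked out with a few lines of Julia (1.x syntax assumed). The numbers are copied from the message above; the variable names are only illustrative.

```julia
# Timings in seconds, copied from the benchmark message.
timings = [
    ("cython original", 27.5),
    ("julia original", 28.3),
    ("julia @inbounds", 21.3),
    ("julia @inbounds and @simd", 19.0),
]

baseline = 28.3  # the original Julia run

# Speedup factor of each run relative to the original Julia run.
speedups = [(name, baseline / t) for (name, t) in timings]

for (name, s) in speedups
    println(rpad(name, 28), round(s, digits=2), "x")
end
```

Relative to the original Julia run this gives roughly 1.33x with @inbounds alone and roughly 1.49x with @inbounds and @simd together.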







[julia-users] Re: julia vs cython benchmark

2014-11-26 Thread Jeff Waller
There is one thing I see as a potential issue.

The outer loop *i* is incrementing the first index, and Julia stores things
in column-major order, so any speed gain from the CPU cache is potentially lost
since you are using elements that are not contiguous in the inner loops.
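
A minimal sketch of the loop-order point, assuming a plain dense `Matrix{Float64}` and Julia 1.x. The function names are hypothetical and this is not the benchmark code from the thread. Both functions compute the same sum; only the order in which memory is visited differs.

```julia
# First index in the innermost loop: consecutive iterations touch
# neighbouring memory, because Julia arrays are column-major.
function sum_first_index_inner(A::Matrix{Float64})
    total = 0.0
    for j in 1:size(A, 2)        # outer loop over the second index (columns)
        for i in 1:size(A, 1)    # inner loop over the first index (rows)
            total += A[i, j]
        end
    end
    return total
end

# First index in the outer loop: consecutive iterations jump by a whole
# column length in memory, which tends to be less cache friendly.
function sum_first_index_outer(A::Matrix{Float64})
    total = 0.0
    for i in 1:size(A, 1)
        for j in 1:size(A, 2)
            total += A[i, j]
        end
    end
    return total
end

# 2x3 matrix whose columns are [1, 2], [3, 4] and [5, 6].
A = reshape(collect(1.0:6.0), 2, 3)
println(sum_first_index_inner(A))
println(sum_first_index_outer(A))
```

On a matrix this small the two orders are indistinguishable in speed; the difference only becomes measurable on arrays much larger than the CPU cache.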