Re: [julia-users] why sum(abs(A)) is very slow

2014-08-25 Thread Dahua Lin
If A is not a global variable (i.e. inside a function), @devec would be much
faster (comparable to sumabs).

Dahua


On Monday, August 25, 2014 4:26:22 AM UTC+8, Adam Smith wrote:

 I've run into this a few times (and a few hundred times in Python), so I
 made an @iterize macro. Not sure how useful it is, but you can put it in
 front of a bunch of chained function calls and it will make iterators
 automatically to avoid creating any temporary arrays:

 A = randn(1<<26)
 @time sum(abs(A))
 @time @iterize sum(abs(A))
 @time sumabs(A)

 println(sum(abs(A)))
 println(@iterize sum(abs(A)))
 println(sumabs(A))

 println(sum(A))
 println(@iterize sum(A))

 println(sum(ceil(floor(abs(A)))))
 println(@iterize sum(ceil(floor(abs(A)))))

 Output:
 elapsed time: 0.367873796 seconds (537878296 bytes allocated, 2.48% gc time)
 elapsed time: 0.107278414 seconds (577616 bytes allocated)
 elapsed time: 0.045590637 seconds (639580 bytes allocated)
 5.3551932868680775e7
 5.3551932868672036e7
 5.3551932868678436e7
 658.6904827808266
 658.6904827808266
 2.4537098e7
 2.4537098e7

 The macro is in a gist: Iterize.jl 
 https://gist.github.com/sunetos/f311d5408854e65d7ff9

 I had tried using @devec, but that actually made it about 100x slower.
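Adam's macro itself lives in the gist and is not reproduced here. As a rough illustration of the same idea in Julia 1.x, here is a hypothetical `@fuse` macro (the name and behavior are assumptions, not Iterize.jl's code). Rather than building iterators, it composes the chain of single-argument calls into one anonymous function and hands it to the reducer, relying on `sum(f, A)` applying `f` element by element without a temporary array.

```julia
# Hypothetical `@fuse` macro (NOT the Iterize.jl code from the gist):
# rewrites reducer(f1(f2(...fn(A)))) into reducer(x -> f1(f2(...fn(x))), A),
# so the reducer applies the composed function element by element.
macro fuse(ex)
    (ex isa Expr && ex.head == :call && length(ex.args) == 2) ||
        error("@fuse expects a call of the form reducer(f(g(...(A))))")
    reducer, data = ex.args
    funcs = Any[]                          # collected outermost first
    # Peel off nested single-argument calls until we reach the data expression.
    while data isa Expr && data.head == :call && length(data.args) == 2
        push!(funcs, data.args[1])
        data = data.args[2]
    end
    x = gensym(:x)
    body = x
    for f in reverse(funcs)                # innermost call wraps x first
        body = Expr(:call, f, body)
    end
    return esc(:($reducer($x -> $body, $data)))
end

A = [-1.5, 2.25, -3.0]
r1 = @fuse sum(abs(A))                     # expands to sum(x -> abs(x), A)
r2 = @fuse sum(ceil(floor(abs(A))))        # sum(x -> ceil(floor(abs(x))), A)
```

The sketch assumes every inner call is element-wise; a whole-array call (for example `reverse(A)`) inside the chain would be misinterpreted as a per-element function.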

 On Saturday, August 23, 2014 8:15:44 AM UTC-4, Stefan Karpinski wrote:

 On Sat, Aug 23, 2014 at 7:23 AM, gael@gmail.com wrote:

 To do any of that justice, you end up with a language that looks 
 basically like Haskell. So why not just use Haskell?

 Because I don't know anything about it (yet), except the name and the
 fact that it is often associated with lazy evaluation.

 Second, because this could be a way to make sumabs and the like obsolete
 in *Julia*. :)


 We do really want to get rid of things like sumabs, so it's certainly
 worth considering. I know I've thought about it many times, but I don't
 think it's the right answer – you really want to preserve eager evaluation
 semantics, even if you end up moving around the actual evaluation of things.
  


Re: [julia-users] why sum(abs(A)) is very slow

2014-08-23 Thread Rafael Fourquet

 There's a complicated limit to when you want to fuse loops – at some point
 multiple passes become better than fused loops, and it all depends on
 how much and what kind of work you're doing. In general, doing things lazily
 does not cut down on allocation, since you have to allocate the
 representation of the operations that you're deferring and close over any
 values that they depend on.

This particular example only works out so well because the iterable is so
 simple that the compiler can eliminate the laziness and do the eager loop
 fused version for you. This will not generally be the case.


Thank you for taking so much time to explain and for your patience!


 You're welcome to experiment (and Julia's type system makes it pretty easy
 to do so), but I think that you'll quickly find that more laziness is not a
 panacea for performance problems.


My question came partly from Python 3 having lazy map and filter. But having
the choice is good, and in Julia all laziness can already be provided by imap
etc. If someone has a (self-contained) example where lazy element-wise
computation is worse than eager, please post! (I'm interested in better
understanding the limit mentioned above.)


Re: [julia-users] why sum(abs(A)) is very slow

2014-08-23 Thread gael . mcdon
(I was also thinking about element-wise operations)


Re: [julia-users] why sum(abs(A)) is very slow

2014-08-22 Thread Stefan Karpinski
There is a sumabs function in Base for this reason. We'd like to eventually
be able to do stream fusion to make the vectorized version as efficient as
the manually fused version, but for now there's a performance gap. Note
that the vectorized version is the same speed you would get in other
languages where you express this in vectorized form – it's just that you
can get much faster with manual loop fusion.
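For readers on Julia 1.x: the vectorized `abs(A)` on arrays used in this thread has since been replaced by the broadcast form `abs.(A)`, and `sumabs(A)` by `sum(abs, A)`. Nested dot calls fuse into a single loop, but a reduction wrapped around them still receives a materialized array. A small sketch of the options:

```julia
A = randn(10^6)

s1 = sum(abs.(A))              # broadcast builds a temporary array abs.(A), then sums it
s2 = sum(abs, A)               # abs applied inside the reduction: no temporary array
s3 = sum(abs(x) for x in A)    # generator: lazy, single pass, no temporary array

# Nested dot calls fuse into one loop that fills a single output array
# (no separate intermediate arrays for abs.(A) or floor.(abs.(A))).
B = ceil.(floor.(abs.(A)))
```

The three sums can differ in the last bits because the summation order differs (the array-based versions use pairwise summation, the generator sums sequentially), which is also why the printed results earlier in the thread disagree slightly.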


On Thu, Aug 21, 2014 at 11:03 PM, John Myles White johnmyleswh...@gmail.com
 wrote:

 Please read http://julialang.org/blog/2013/09/fast-numeric/

  — John

 On Aug 21, 2014, at 8:02 PM, K Leo cnbiz...@gmail.com wrote:

  A is a 1-dimensional array. I used to compute sum(abs(A)). But when I
 changed to the following, the speed increased nearly tenfold. Why is that?
 
 sumA=0
 for i=1:length(A)
 sumA = sumA + abs(A[i])
 end




Re: [julia-users] why sum(abs(A)) is very slow

2014-08-22 Thread Stefan Karpinski
Yes, that works nicely. Obviously it would be even nicer not to have to do
that :-)


On Fri, Aug 22, 2014 at 10:53 AM, Rafael Fourquet fourquet.raf...@gmail.com
 wrote:

 We'd like to eventually be able to do stream fusion to make the vectorized
 version as efficient as the manually fused version, but for now there's a
 performance gap.


 It is also not too difficult to implement a fused version via iterators,
 e.g.:

 immutable iabs{X}
 x::X
 end

 Base.start(i::iabs) = start(i.x)
 Base.next(i::iabs, s) = ((v, s) = next(i.x, s); (abs(v), s))
 Base.done(i::iabs, s) = done(i.x, s)

 Then sum(iabs(A)) is way faster than sum(abs(A)) (but still slightly
 slower than sumabs(A)).
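The `start`/`next`/`done` protocol in this snippet is from Julia 0.3; Julia 1.0 replaced it with a single `Base.iterate` method. A sketch of the same wrapper under the newer protocol (renamed `IAbs` here to follow the type-naming convention; otherwise it mirrors `iabs` above):

```julia
# Lazy wrapper: yields abs(v) for each v in the wrapped iterable, one at a time.
struct IAbs{X}
    x::X
end

# iterate(it) starts the wrapped iteration; iterate(it, s) continues it.
function Base.iterate(it::IAbs, state...)
    r = iterate(it.x, state...)
    r === nothing && return nothing   # wrapped iterable exhausted
    v, s = r
    return (abs(v), s)
end

Base.length(it::IAbs) = length(it.x)

total = sum(IAbs([-1.0, 2.5, -3.5]))  # single pass, no temporary array
```

In Julia 1.x the generator `sum(abs(x) for x in A)` gives the same single-pass behavior without defining a new type.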




Re: [julia-users] why sum(abs(A)) is very slow

2014-08-22 Thread Peter Simon
Could you please explain why the iterator version is so much faster?  Is it 
simply from avoiding temporary array allocation?

Thanks,
--Peter

On Friday, August 22, 2014 7:53:59 AM UTC-7, Rafael Fourquet wrote:

 We'd like to eventually be able to do stream fusion to make the vectorized 
 version as efficient as the manually fused version, but for now there's a 
 performance gap. 


 It is also not too difficult to implement a fused version via iterators,
 e.g.:

 immutable iabs{X}
 x::X
 end

 Base.start(i::iabs) = start(i.x)
 Base.next(i::iabs, s) = ((v, s) = next(i.x, s); (abs(v), s))
 Base.done(i::iabs, s) = done(i.x, s)

 Then sum(iabs(A)) is way faster than sum(abs(A)) (but still slightly
 slower than sumabs(A)).



Re: [julia-users] why sum(abs(A)) is very slow

2014-08-22 Thread Rafael Fourquet

 Obviously it would be even nicer not to have to do that :-)


My naive question, then: why not make vectorized functions lazy by default
(like iabs above, plus dimension information)? Do you have links to
relevant discussions?


Re: [julia-users] why sum(abs(A)) is very slow

2014-08-22 Thread Stefan Karpinski
On Fri, Aug 22, 2014 at 11:32 AM, Rafael Fourquet fourquet.raf...@gmail.com
 wrote:

  My naive question, then: why not make vectorized functions lazy by default
 (like iabs above, plus dimension information)? Do you have links to
 relevant discussions?


If that were the way things worked, would sum(abs(A)) do the computation
right away or just wait until you ask for the result? In other words,
should sum also be lazy if we're doing all vectorized computations that
way? What about sum(abs(A),1)? Lazy or eager? What about A*B when A and B
are matrices? Should that be an eager matrix product or just a lazy
representation that hangs onto A and B and answers queries about their
product on demand? If you're computing trace(A*B) then you can save a huge
amount of work that way. But if you need all or most of the values in A*B
then computing each one as a vector-vector product on demand is very
inefficient.
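Stefan's `trace(A*B)` case can be made concrete (in Julia 1.x the function is `tr` from the `LinearAlgebra` standard library). Each diagonal entry is (A*B)[i,i] = sum over j of A[i,j]*B[j,i], so the trace needs only O(n^2) multiplications instead of the O(n^3) needed to form the full product. A hand-written sketch, with the hypothetical name `tr_of_product`:

```julia
using LinearAlgebra

# tr(A*B) = sum over i, j of A[i,j] * B[j,i]; the product matrix is never formed.
function tr_of_product(A::AbstractMatrix, B::AbstractMatrix)
    (size(A, 2) == size(B, 1) && size(A, 1) == size(B, 2)) ||
        throw(DimensionMismatch("A*B must be square"))
    s = zero(promote_type(eltype(A), eltype(B)))
    for j in axes(A, 2), i in axes(A, 1)   # inner loop walks A column-wise
        s += A[i, j] * B[j, i]
    end
    return s
end

A = rand(3, 4)
B = rand(4, 3)
t = tr_of_product(A, B)
```

This is the favorable case for deferring work; as Stefan notes, if most entries of A*B are needed, computing them one at a time on demand is far slower than one eager matrix product.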


Re: [julia-users] why sum(abs(A)) is very slow

2014-08-22 Thread gael . mcdon
I'm not familiar with lazy evaluation (I've not used any language implementing 
it). But I was wondering...

Why not have a 'calculate_now' function to let the programmer choose when a 
result is guaranteed to be calculated? Otherwise, resort to lazy 
representations.

There could also be a heuristic: if at least one of the original objects is
about to be freed by the GC, perform all the calculations that depend on it.

A simpler option: defer actual calculations until the end of the
current block.
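Julia 1.x has a mechanism close to the explicit `calculate_now` idea (though not the GC-driven heuristic): `Broadcast.broadcasted` builds an unevaluated `Broadcasted` object, and `Broadcast.materialize` evaluates it into an array. A dot expression such as `abs.(A) .+ 1` lowers to exactly this pair of calls. A minimal sketch:

```julia
A = [-1.5, 2.0, -3.25]

# Lazy representation of abs.(A) .+ 1: nothing is computed yet.
bc = Broadcast.broadcasted(+, Broadcast.broadcasted(abs, A), 1)

# "calculate_now": force evaluation into a new array.
B = Broadcast.materialize(bc)

# Or consume the lazy object directly, without allocating the result array.
s = sum(Broadcast.instantiate(bc))
```

The deferred object still has to be allocated and holds a reference to `A`, which is the cost Stefan points to later in the thread.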


Re: [julia-users] why sum(abs(A)) is very slow

2014-08-22 Thread Rafael Fourquet
  If that was the way things worked, would sum(abs(A)) do the computation
 right away or just wait until you ask for the result? In other words,
 should sum also be lazy if we're doing all vectorized computations that
 way?


sum(abs(A)) returns a scalar, so laziness would buy nothing here (in most cases
at least; let's not be Haskell!)


  What about sum(abs(A),1)? Lazy or eager?


If dim A > 1, the result is an array, so lazy.
In short, be lazy when it gives an opportunity for loop fusion and saves
allocations.
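In Julia 1.x the `sum(abs(A),1)` case has an eager but allocation-light form: passing the function together with the `dims` keyword lets the reduction apply `abs` as it goes, so only the result array is allocated and no `abs.(A)` temporary is built. A sketch:

```julia
A = [1.0 -2.0;
     -3.0 4.0]

# Column sums of |A| without materializing abs.(A).
colsums = sum(abs, A; dims=1)
```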


 What about A*B when A and B are matrices?


I was thinking more of element-wise operations (of the form map(f,
A1, ...), like abs and +). Optimizing a product A*B is less trivial (C++
expression templates...), so I'd rather not answer!


Re: [julia-users] why sum(abs(A)) is very slow

2014-08-22 Thread Rafael Fourquet
 Could you please explain why the iterator version is so much faster?  Is
 it simply from avoiding temporary array allocation?


That's my understanding, plus (maybe marginally) the fact that there is only one
pass over the data.
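One way to check the temporary-allocation explanation, assuming Julia 1.x and the hypothetical helper names below, is to compare the bytes allocated by the two forms after a warm-up call, so that compilation is not counted:

```julia
vectorized(A) = sum(abs.(A))   # builds abs.(A) as a temporary array
fused(A) = sum(abs, A)         # applies abs inside the reduction

function allocations(A)
    vectorized(A); fused(A)    # warm up so compilation is not measured
    return (@allocated(vectorized(A)), @allocated(fused(A)))
end

a_vec, a_fused = allocations(randn(10^6))
```

For a vector of 10^6 `Float64` values, the vectorized form allocates at least 8 MB for the temporary, while the fused form should allocate little or nothing.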


Re: [julia-users] why sum(abs(A)) is very slow

2014-08-22 Thread Stefan Karpinski
On Aug 22, 2014, at 1:45 PM, Rafael Fourquet fourquet.raf...@gmail.com wrote:
 
 In short, be lazy when it gives an opportunity for loop fusion and saves
 allocations.

There's a complicated limit to when you want to fuse loops – at some point
multiple passes become better than fused loops, and it all depends on how
much and what kind of work you're doing. In general, doing things lazily does
not cut down on allocation, since you have to allocate the representation of the
operations that you're deferring and close over any values that they depend on.

This particular example only works out so well because the iterable is so 
simple that the compiler can eliminate the laziness and do the eager loop fused 
version for you. This will not generally be the case. You're welcome to 
experiment (and Julia's type system makes it pretty easy to do so), but I think 
that you'll quickly find that more laziness is not a panacea for performance 
problems.
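Rafael's request earlier in the thread for a self-contained case where lazy element-wise computation loses to eager has at least one simple answer (an illustration, not taken from the thread): a lazy result that is consumed more than once recomputes its elements on every pass. The sketch counts calls to a hypothetical `slowabs`, which stands in for an expensive element-wise function:

```julia
const calls = Ref(0)
slowabs(x) = (calls[] += 1; abs(x))   # stand-in for an expensive element-wise function

A = randn(1000)

# Lazy: the generator recomputes slowabs on every traversal.
calls[] = 0
lazy = (slowabs(x) for x in A)
total_lazy = sum(lazy) + maximum(lazy)
lazy_calls = calls[]                  # 2000: two traversals

# Eager: compute once into an array, then reuse it.
calls[] = 0
B = map(slowabs, A)
total_eager = sum(B) + maximum(B)
eager_calls = calls[]                 # 1000
```

The eager version pays for one array of 1000 elements but halves the expensive calls; which side wins depends on the cost of the function versus the cost of the memory, which is the trade-off Stefan describes.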

[julia-users] why sum(abs(A)) is very slow

2014-08-21 Thread K Leo
A is a 1-dimensional array. I used to compute sum(abs(A)). But when I
changed to the following, the speed increased nearly tenfold. Why is that?


sumA=0
for i=1:length(A)
sumA = sumA + abs(A[i])
end
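The loop as posted runs at global scope in the REPL and starts `sumA` as the integer `0`, so for a `Float64` array the accumulator changes type on the first iteration. A sketch that avoids both, assuming `A` is a real-valued vector (the function name `sumabs_loop` is hypothetical):

```julia
# Manual fused loop from the question, wrapped in a function so the compiler
# can specialize on the type of A (non-constant globals prevent that).
function sumabs_loop(A::AbstractVector)
    sumA = zero(eltype(A))   # type-stable start; `sumA = 0` would begin as an Int
    for i in eachindex(A)
        sumA += abs(A[i])
    end
    return sumA
end

s = sumabs_loop([-1.0, 2.0, -3.5])
```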