Re: [Caml-list] Optimizing Float Ref's
Dmitry Bely dmitry.b...@gmail.com writes: rewrite the function to get rid of it? My best achievement so far is let max_val (a:float array) = let m = [|-.min_float|] in for i = 0 to (Array.length a) - 1 do if a.(i) m.(0) then m.(0) - a.(i) done; m.(0) but it looks ugly... - Dmitry Bely Why not store the index of the max element instead of the float itself? Recursively, to avoid the ref as well, this could look like this: let max_val (a:float array) = let len = Array.length a in let rec loop max i = if i = len then a.(max) else if a.(max) a.(i) then loop max (i + 1) else loop i (i + 1) in if len = 0 then -.min_float else loop 0 1 MfG Goswin ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Optimizing Float Ref's
On Thu, Sep 3, 2009 at 1:44 PM, Xavier Leroy xavier.le...@inria.fr wrote: (...) But, I thought that float ref's were automatically unboxed by the compiler when they didn't escape the local context. Yes, if all uses of the float ref are unboxed, which is the case in your code. let max_val (a:float array) = let m = ref min_float in for i = 0 to (Array.length a) - 1 do if a.(i) !m then m := a.(i) done; !m !m is not unboxed (Ocaml 3.11). Should it? - Dmitry Bely ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Optimizing Float Ref's
On Wed, Mar 31, 2010 at 9:55 PM, Jake Donham j...@donham.org wrote: On Wed, Mar 31, 2010 at 10:21 AM, Dmitry Bely dmitry.b...@gmail.com wrote: let max_val (a:float array) = let m = ref min_float in for i = 0 to (Array.length a) - 1 do if a.(i) !m then m := a.(i) done; !m !m is not unboxed (Ocaml 3.11). Should it? Why do you say that? It looks unboxed in the lambda code: (let (max_val/58 (function a/59 (let (m/60 (field 13 (global Pervasives!))) (seq (for i/61 0 to (- (array.length a/59) 1) (if (. (array.get a/59 i/61) m/60) (assign m/60 (array.get a/59 i/61)) 0a)) m/60 No. assign is translated into memory allocation inside the loop. My x86 assembler is not very good but it looks to me like m is kept in %esi. Ok, probably returning float makes allocation place not obvious. Try this variant: let max_val (a:float array) = let m = ref min_float in for i = 0 to (Array.length a) - 1 do if a.(i) !m then m := a.(i) done; truncate !m It's translated into the following C-- code (-unsafe flag): (function camlT__max_val_58 (a/59: addr) (let m/60 (load (+a camlPervasives 52)) (let (i/61 1 bound/65 (+ (or (u (load (+a a/59 -4)) 10) 1) -2)) (catch (if ( i/61 bound/65) (exit 3) (loop (if (f (load float64u (+a (+a a/59 ( i/61 2)) -4)) (load float64u m/60)) (assign m/60 (alloc 2301 (load float64u (+a (+a a/59 ( i/61 2)) -4 []) (let i/64 i/61 (assign i/61 (+ i/61 2)) (if (== i/64 bound/65) (exit 3) [] with(3) [])) (+ ( (intoffloat (load float64u m/60)) 1) 1))) See alloc 2301 inside the loop? That's boxed float. - Dmitry Bely ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Optimizing Float Ref's
On 31/03/2010 19:21, Dmitry Bely wrote: On Thu, Sep 3, 2009 at 1:44 PM, Xavier Leroyxavier.le...@inria.fr wrote: (...) But, I thought that float ref's were automatically unboxed by the compiler when they didn't escape the local context. Yes, if all uses of the float ref are unboxed, which is the case in your code. let max_val (a:float array) = let m = ref min_float in for i = 0 to (Array.length a) - 1 do if a.(i) !m then m := a.(i) done; !m !m is not unboxed (Ocaml 3.11). Should it? As far as I can tell, the boxing for the reference cell is removed by the compiler (that is, the reference is implemented as a mutable local variable), but not the boxing for the float contained in the reference. Since a is a float array, it is needed to box the float before putting it in the reference, hence the allocation. -- Alain ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Optimizing Float Ref's
On Wed, Mar 31, 2010 at 11:52 PM, Ted Sandler ted.sand...@gmail.com wrote: rpretation of Xavier's words was that the compiler would completely optimize out memory allocation and keep !m in register. I can live with the allocation but not at every iteration... How to rewrite the function to get rid of it? My best achievement so far is Try tail-recursion. Won't work. Show the code. - Dmitry Bely ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Optimizing Float Ref's
I'm running OCaml 3.11.1, and I noticed something strange in some native code for matrix multiply today. The code was [...] [Local float ref being unboxed or not? ] You omitted the definition of dims, but after adding the obvious definition, the float ref sum is indeed completely unboxed and is kept in a float register (on x86-64 bits) or stack location (on x86-32 bits). No modify takes place in the inner loop. So, I don't understand the problem you observed. Feel free to post a report on the BTS with a *complete* piece of code that reproduces the problem. But, I thought that float ref's were automatically unboxed by the compiler when they didn't escape the local context. Yes, if all uses of the float ref are unboxed, which is the case in your code. - Xavier Leroy ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Optimizing Float Ref's
Xavier, Thanks for the comments. I thought that float ref's were unboxed by default! In fact, I find that breaking out the code into a stand- alone example which loops through matrix multiplies only indeed does not have any calls to caml_modify; everything is unboxed and stored on the stack (I'm on x86) as you say. It must be the remainder of my application that is producing the calls to caml_modify (the profile I was using was embedded in a larger application with lots of inlining going on, so maybe something that calls caml_modify is getting inlined around the matrix multiply and confusing the profiler). Thanks, Will On Sep 3, 2009, at 5:44 AM, Xavier Leroy wrote: I'm running OCaml 3.11.1, and I noticed something strange in some native code for matrix multiply today. The code was [...] [Local float ref being unboxed or not? ] You omitted the definition of dims, but after adding the obvious definition, the float ref sum is indeed completely unboxed and is kept in a float register (on x86-64 bits) or stack location (on x86-32 bits). No modify takes place in the inner loop. So, I don't understand the problem you observed. Feel free to post a report on the BTS with a *complete* piece of code that reproduces the problem. But, I thought that float ref's were automatically unboxed by the compiler when they didn't escape the local context. Yes, if all uses of the float ref are unboxed, which is the case in your code. - Xavier Leroy PGP.sig Description: This is a digitally signed message part ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Optimizing Float Ref's
True. All float records and float arrays are unboxed so modifications don't have to pass through caml_modify. A single cell array will still have dynamic bound checking so I would recommend going for the record. Till On Sun, Aug 30, 2009 at 3:43 PM, Yaron Minskyymin...@gmail.com wrote: Float refs are not unboxed automatically, because refs Are polymorphic containers. If you create your own pseudo-ref, i.e., a record with a single mutable float field, then I believe you should get the behaviour you expect. Come to think of it, I wonder if it would be better to implement ref on top of a single-cell array, since then everyone would get the float unboxing whenever applicable. I imagine there is some runtime overhead to this, though. y On Aug 28, 2009, at 4:32 PM, Will M Farr f...@mit.edu wrote: Hello all, I'm running OCaml 3.11.1, and I noticed something strange in some native code for matrix multiply today. The code was let mmmul store m1 m2 = let (ni,nk) = dims m1 and (nk2,nj) = dims m2 and (sni,snj) = dims store in assert(nk=nk2); assert(ni=sni); assert(nj=snj); for i = 0 to ni - 1 do let row1 = m1.(i) and srow = store.(i) in for j = 0 to nj - 1 do let sum = ref 0.0 in (* Un-boxed float ref? *) for k = 0 to nk - 1 do let row2 = Array.unsafe_get m2 k in let x = Array.unsafe_get row1 k and y = Array.unsafe_get row2 j in sum := !sum +. x*.y done; Array.unsafe_set srow j !sum done done; store (I compiled with ocamlopt.) It multiplies the matrices (represented as arrays of arrays of floats) m1 and m2 together and puts the result into the matrix store. Profiling the code, I noticed a call to caml_modify during the execution of this function! Turns out that the culprit was the float ref sum. Changing to the following code (which eliminates the float ref, and uses the - and .( ) operators instead of unsafe_set and unsafe_get) eliminated that call, and sped things up tremendously: let mmmul store m1 m2 = let (ni,nk) = dims m1 and (nk2,nj) = dims m2 in for i = 0 to ni - 1 do let row1 = m1.(i) and srow = store.(i) in for j = 0 to nj - 1 do srow.(j) - 0.0; for k = 0 to nk - 1 do let row2 = Array.unsafe_get m2 k in let x = row1.(k) and y = row2.(j) in srow.(j) - srow.(j) +. x*.y done done done; store But, I thought that float ref's were automatically unboxed by the compiler when they didn't escape the local context. Is this a complier bug, is there a bad interaction with unsafe_get and unsafe_set, or is there something else going on that I don't understand? Any enlightenment would be appreciated. Thanks! Will ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Optimizing Float Ref's
Thanks to all who have replied. I didn't realize that references were always boxed (I thought that, internally, a reference was implemented as a one-element array, and therefore float refs would benefit from the automatic unboxing of float arrays). That's good to know. Will On Aug 31, 2009, at 10:09 AM, Till Varoquaux wrote: True. All float records and float arrays are unboxed so modifications don't have to pass through caml_modify. A single cell array will still have dynamic bound checking so I would recommend going for the record. Till On Sun, Aug 30, 2009 at 3:43 PM, Yaron Minskyymin...@gmail.com wrote: Float refs are not unboxed automatically, because refs Are polymorphic containers. If you create your own pseudo-ref, i.e., a record with a single mutable float field, then I believe you should get the behaviour you expect. Come to think of it, I wonder if it would be better to implement ref on top of a single-cell array, since then everyone would get the float unboxing whenever applicable. I imagine there is some runtime overhead to this, though. y On Aug 28, 2009, at 4:32 PM, Will M Farr f...@mit.edu wrote: Hello all, I'm running OCaml 3.11.1, and I noticed something strange in some native code for matrix multiply today. The code was let mmmul store m1 m2 = let (ni,nk) = dims m1 and (nk2,nj) = dims m2 and (sni,snj) = dims store in assert(nk=nk2); assert(ni=sni); assert(nj=snj); for i = 0 to ni - 1 do let row1 = m1.(i) and srow = store.(i) in for j = 0 to nj - 1 do let sum = ref 0.0 in (* Un-boxed float ref? *) for k = 0 to nk - 1 do let row2 = Array.unsafe_get m2 k in let x = Array.unsafe_get row1 k and y = Array.unsafe_get row2 j in sum := !sum +. x*.y done; Array.unsafe_set srow j !sum done done; store (I compiled with ocamlopt.) It multiplies the matrices (represented as arrays of arrays of floats) m1 and m2 together and puts the result into the matrix store. Profiling the code, I noticed a call to caml_modify during the execution of this function! Turns out that the culprit was the float ref sum. Changing to the following code (which eliminates the float ref, and uses the - and .( ) operators instead of unsafe_set and unsafe_get) eliminated that call, and sped things up tremendously: let mmmul store m1 m2 = let (ni,nk) = dims m1 and (nk2,nj) = dims m2 in for i = 0 to ni - 1 do let row1 = m1.(i) and srow = store.(i) in for j = 0 to nj - 1 do srow.(j) - 0.0; for k = 0 to nk - 1 do let row2 = Array.unsafe_get m2 k in let x = row1.(k) and y = row2.(j) in srow.(j) - srow.(j) +. x*.y done done done; store But, I thought that float ref's were automatically unboxed by the compiler when they didn't escape the local context. Is this a complier bug, is there a bad interaction with unsafe_get and unsafe_set, or is there something else going on that I don't understand? Any enlightenment would be appreciated. Thanks! Will ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs PGP.sig Description: This is a digitally signed message part ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Optimizing Float Ref's
On Friday 28 August 2009 21:32:55 Will M Farr wrote: Hello all, I'm running OCaml 3.11.1, and I noticed something strange in some native code for matrix multiply today. I'm running OCaml 3.11.0 and I get the opposite result on both x86 and x64: $ ./will Took 31.445965s Took 39.942496s $ ./will Took 25.457591s Took 40.414526s But, I thought that float ref's were automatically unboxed by the compiler when they didn't escape the local context. Is this a complier bug, is there a bad interaction with unsafe_get and unsafe_set, or is there something else going on that I don't understand? Any enlightenment would be appreciated. That appears to occur in this case on my machine but it can be hampered sometimes if the accumulator is returned immediately and the problem worked around by performing a redundant floating point operation (such as 1.0 *. x) on the result. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Optimizing Float Ref's
On Sunday 30 August 2009 20:43:17 Yaron Minsky wrote: Float refs are not unboxed automatically, because refs Are polymorphic containers. If you create your own pseudo-ref, i.e., a record with a single mutable float field, then I believe you should get the behaviour you expect. I believe you are talking at cross purposes. Will wants his accumulator in a register. You are referring to float references in data structures being boxed because records are boxed. Look at the compiled forms of the inner loops of these dot products, for example: let dot a b = let x = ref 0.0 in for i=0 to Array.length a - 1 do x := !x +. a.(i) *. b.(i) done; !x .L101: .L103: movlcaml_young_ptr, %eax subl$12, %eax movl%eax, caml_young_ptr cmplcaml_young_limit, %eax jb .L104 leal4(%eax), %eax movl$2301, -4(%eax) fldl-4(%edi, %ecx, 4) fmull -4(%ebx, %ecx, 4) faddl (%esi) fstpl (%eax) movl%eax, %esi movl%ecx, %eax addl$2, %ecx cmpl%edx, %eax jne .L101 let dot2 a b = let x = ref 0.0 in for i=0 to Array.length a - 1 do x := !x +. a.(i) *. b.(i) done; 1.0 *. !x .L107: fldl-4(%eax, %ecx, 4) fmull -4(%ebx, %ecx, 4) faddl 0(%esp) fstpl 0(%esp) movl%ecx, %edx addl$2, %ecx cmpl%esi, %edx jne .L107 In the latter case, x is unboxed into a register. Come to think of it, I wonder if it would be better to implement ref on top of a single-cell array, since then everyone would get the float unboxing whenever applicable. I imagine there is some runtime overhead to this, though. All-float (including one-float) records are unboxed anyway. Boxing was discussed in the book OCaml for Scientists and the OCaml Journal articles about optimization and the SciMark2 benchmark. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs