Hi, thanks for the pointers on the FFI.

"I strongly suspect that SML functions like IntArray.update take less time than the FFI overhead, so no improvement is possible by using C functions."
Yes, it seems so. I was hoping that I could somehow achieve the extremely low overheads that I get when using the Haskell FFI (I think the GHC Haskell compiler also uses libffi; see https://github.com/ghc/ghc/commit/39e206a7badd18792a7c8159cff732c63a9b19e7), but it is not obvious how I can do that.

It seems that "val previous = IntArray.sub(data,i-1)" and "IntArray.update(data,i,use+1);" are the bottlenecks. For example, the following program:

**********************************************************
**********************************************************
**********************************************************

val size:int = 50000;
val loops:int = 30000;
val cap:int = 50000;

val data = IntArray.array(size,0);

fun loop () =
  let
    fun loopI i =
      if i = size then
        let val _ = () in
          IntArray.update(data,0,IntArray.sub(data,size-1));
          ()
        end
      else
        let val previous = IntArray.sub(data,i-1)
            val use = if previous > cap then 0 else previous in
          IntArray.update(data,i,use+1);
          loopI (i+1)
        end
  in loopI 1 end

fun benchmarkRun () =
  let
    fun bench i =
      if i = loops then ()
      else let val _ = () in
        loop ();
        bench (i+1)
      end
  in bench 1 end

fun sum (i,value) =
  if i = size then value
  else sum(i+1,value+Array.sub(data,i))

fun main () = let val _ = () in
  benchmarkRun();
  print (Int.toString (sum (0,0)));
  print "\n"
end

**********************************************************
**********************************************************
**********************************************************

takes about 52 seconds.

Now if I modify "loop" (by commenting out "val previous = IntArray.sub(data,i-1)" and "IntArray.update(data,i,use+1);") to:

**********************************************************
**********************************************************
**********************************************************

fun loop () =
  let
    fun loopI i =
      if i = size then
        let val _ = () in
          IntArray.update(data,0,IntArray.sub(data,size-1));
          ()
        end
      else
        let (*val previous = IntArray.sub(data,i-1)*)
            val previous = 0
            val use = if previous > cap then 0 else previous in
          (*IntArray.update(data,i,use+1);*)
          loopI (i+1)
        end
  in loopI 1 end

**********************************************************
**********************************************************
**********************************************************

then the program takes 8 seconds.

On Sun, Sep 20, 2015 at 5:33 PM, Phil Clayton <[email protected]> wrote:

> Poly/ML is using libffi to call C functions. To determine the FFI
> overhead, you could create an example that just makes the same number of
> calls to empty C functions with the same number of arguments.
>
> To see what happens when a C function is called, look at call_sym function
> in foreign.cpp:
> https://github.com/polyml/polyml/blob/master/libpolyml/foreign.cpp#L874
> For a C function to improve efficiency, the C function needs to be faster
> by more than the FFI overhead. I strongly suspect that SML functions like
> IntArray.update take less time than the FFI overhead, so no improvement is
> possible by using C functions. (I suspect such simple SML functions take
> much less time, so I would expect use of C functions to be much slower.)
>
> Phil
>
> P.S. I think that there is scope for efficiency improvement in Poly/ML but
> with some upheaval. For example, if call_sym first took parameters that
> indicate the types of arguments and the return value, then, for each call
> site, arg_values and arg_types could be created once and the call to
> ffi_prep_cif made once. Still, the arguments have to be filled in on each
> call:
>     PolyWord p = arg_list;
>     for (POLYUNSIGNED i=0; i<num_args; i++,p=Tail(p))
>     {
>         arg_values[i] = DEREFVOL(taskData, Head(p).AsObjPtr()->Get(1));
>         arg_types[i] = ctypeToFfiType(taskData,
>             Head(p).AsObjPtr()->Get(0));
>     }
>
>
> 19/09/15 16:17, Artella Coding wrote:
>
>> Hi, I thought that I would try to speed up the SML code at
>>
>> http://stackoverflow.com/questions/32425267/how-to-improving-array-benchmark-performance-in-polyml
>> by using the FFI, but this results in significant slowdown.
>>
>> Non ffi code :
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> val size:int = 50000;
>> val loops:int = 30;
>> val cap:int = 50000;
>>
>> val data = IntArray.array(size,0);
>>
>> fun loop () =
>>   let
>>     fun loopI i =
>>       if i = size then
>>         let val _ = () in
>>           IntArray.update(data,0,IntArray.sub(data,size-1));
>>           ()
>>         end
>>       else
>>         let val previous = IntArray.sub(data,i-1)
>>             val use = if previous > cap then 0 else previous in
>>           IntArray.update(data,i,use+1);
>>           loopI (i+1)
>>         end
>>   in loopI 1 end
>>
>> fun benchmarkRun () =
>>   let
>>     fun bench i =
>>       if i = loops then ()
>>       else let val _ = () in
>>         loop ();
>>         bench (i+1)
>>       end
>>   in bench 1 end
>>
>> fun sum (i,value) =
>>   if i = size then value
>>   else sum(i+1,value+Array.sub(data,i))
>>
>> fun main () = let val _ = () in
>>   benchmarkRun();
>>   print (Int.toString (sum (0,0)));
>>   print "\n"
>> end
>>
>> (*val _ = main ()*)
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> FFI code :
>>
>> c code :
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> //intArray.c
>> #include <stdlib.h>
>> #include <stdio.h>
>>
>> typedef struct _intArray {
>>   int size;
>>   int* arr;
>> } intArray;
>>
>> intArray* createIntArray(int size){
>>   int i;
>>   intArray* p = (intArray*) malloc (sizeof(intArray));
>>   p->arr = (int*) malloc (size*sizeof(int));
>>   for(i=0; i<size; i++){
>>     p->arr[i] = 0;
>>   }
>>   p->size = size;
>>   return p;
>> }
>>
>> void destroyIntArray(intArray* p){
>>   free (p->arr);
>>   free (p);
>> }
>>
>> void setIntArray(intArray* p, int elem, int val){
>>   p->arr[elem] = val;
>> }
>>
>> int getIntArray(intArray *p, int elem){
>>   return p->arr[elem];
>> }
>>
>> int getSumIntArray(intArray* p){
>>   int sum = 0;
>>   int i;
>>   int size = p->size;
>>   for(i=0; i<size; i++){
>>     sum += p->arr[i];
>>   }
>>   return sum;
>> }
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> ml code :
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> open CInterface;
>>
>> val lib = load_lib "./intArray.so";
>> val get = get_sym "./intArray.so";
>>
>> val PINTARR = POINTER;
>>
>> val c1 = call1 (get "createIntArray") INT PINTARR
>> val c2 = call3 (get "setIntArray") (PINTARR,INT,INT) VOID
>> val c3 = call2 (get "getIntArray") (PINTARR,INT) INT
>> val c4 = call1 (get "getSumIntArray") (PINTARR) INT
>>
>> fun c_createIntArray (size) = c1 (size);
>> fun c_setIntArray (p,elem,value) = c2 (p,elem,value);
>> fun c_getIntArray (p,elem) = c3 (p,elem);
>> fun c_getSumIntArray (p) = c4 (p);
>>
>>
>> val size:int = 50000;
>> val loops:int = 30;
>> val cap:int = 50000;
>>
>> fun loop (pData2) =
>>   let
>>     fun loopI i =
>>       if i = size then
>>         let val _ = () in
>>           c_setIntArray(pData2,0,c_getIntArray(pData2,size-1));
>>           ()
>>         end
>>       else
>>         let
>>           val previous = c_getIntArray(pData2,i-1);
>>           val use = if previous > cap then 0 else previous in
>>           c_setIntArray(pData2,i,use+1);
>>           loopI (i+1)
>>         end
>>   in loopI 1 end
>>
>> fun benchmarkRun (pData2) =
>>   let
>>     fun bench i =
>>       if i = loops then ()
>>       else let val _ = () in
>>         loop (pData2);
>>         bench (i+1)
>>       end
>>   in bench 1 end
>>
>> fun main () =
>>   let
>>     val pData = c_createIntArray(size);
>>     val final = load_sym lib "destroyIntArray";
>>   in
>>     setFinal final pData;
>>     benchmarkRun(pData);
>>     print (Int.toString (c_getSumIntArray (pData)));
>>     print "\n"
>>   end
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> The times are :
>>
>> a) for non ffi sml : 0.09s
>> b) for ffi sml : 11.8s
>>
>> Is there any way I can improve the speeds on the ffi code? Thanks
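
Phil's suggestion above, timing the same number of calls to empty C functions with the same number of arguments, could be sketched roughly as follows. This is only an illustration: the names noopSetIntArray and noopGetIntArray are hypothetical and appear nowhere in the thread, and the struct is copied from intArray.c only so that the signatures match setIntArray and getIntArray.

```c
/* noopArray.c: hypothetical no-op counterparts of setIntArray/getIntArray.
   Assumption: these would be compiled into the same shared library as
   intArray.c. They do no array work, so timing calls to them measures
   (roughly) only the cost of crossing the FFI. */

typedef struct _intArray {
  int size;
  int* arr;
} intArray;

/* Same argument list as setIntArray, but the body is empty. */
void noopSetIntArray(intArray* p, int elem, int val){
  (void)p; (void)elem; (void)val;   /* silence unused-parameter warnings */
}

/* Same argument list as getIntArray; returns a constant instead of reading. */
int noopGetIntArray(intArray* p, int elem){
  (void)p; (void)elem;
  return 0;
}
```

If these were bound with the same call3/call2 declarations as c2 and c3 and the unchanged loop were run against them, the resulting time should be almost entirely FFI overhead, which could then be compared with the 11.8s figure.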
_______________________________________________
polyml mailing list
[email protected]
http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
