Hi, thanks for the pointers on the FFI.

"I strongly suspect that SML functions like IntArray.update take less time
than the FFI overhead, so no improvement is possible by using C functions."

Yes, it seems so. I was hoping to achieve the extremely low overhead that I
get with the Haskell FFI (I think GHC, the Haskell compiler, also uses
libffi;
https://github.com/ghc/ghc/commit/39e206a7badd18792a7c8159cff732c63a9b19e7),
but it is not obvious how I can do that.
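
One way to check where the time goes is the experiment suggested in the quoted reply below: call empty C functions with the same argument types as setIntArray and getIntArray, so that whatever time is measured is FFI overhead alone. A minimal sketch, assuming a hypothetical file emptyCalls.c; the names emptySet and emptyGet are made up for illustration and are not part of intArray.c:

```c
/* emptyCalls.c: hypothetical file, not part of the original intArray.c */

/* Same argument and return types as setIntArray, but does no work. */
void emptySet(void* p, int elem, int val){
  (void)p; (void)elem; (void)val;   /* silence unused-parameter warnings */
}

/* Same argument and return types as getIntArray; returns a constant. */
int emptyGet(void* p, int elem){
  (void)p; (void)elem;
  return 0;
}
```

Built into a shared library and bound with call3/call2 in the same way as c2 and c3, these could stand in for c_setIntArray and c_getIntArray in the benchmark loop. The resulting time would then approximate the pure FFI cost, assuming the call path is otherwise unchanged.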

It seems that "val previous = IntArray.sub(data,i-1)" and
"IntArray.update(data,i,use+1);" are the bottlenecks. For example, the
following program:


**********************************************************
**********************************************************
**********************************************************

val size:int = 50000;
val loops:int = 30000;
val cap:int = 50000;

val data = IntArray.array(size,0);

fun loop () =
  let
    fun loopI i =
      if i = size then
        let val _ = () in
          IntArray.update(data,0,IntArray.sub(data,size-1));
          ()
        end
      else
        let val previous = IntArray.sub(data,i-1)
            val use = if previous > cap then 0 else previous in
          IntArray.update(data,i,use+1);
          loopI (i+1)
        end
  in loopI 1 end

fun benchmarkRun () =
  let
    fun bench i =
      if i = loops then ()
      else let val _ = () in
             loop ();
             bench (i+1)
           end
  in bench 1 end

fun sum (i,value) =
  if i = size then value
  else sum(i+1,value+IntArray.sub(data,i))

fun main () = let val _ = () in
  benchmarkRun();
  print (Int.toString (sum (0,0)));
  print "\n"
  end

**********************************************************
**********************************************************
**********************************************************


takes about 52 seconds. If I now modify "loop" by commenting out "val
previous = IntArray.sub(data,i-1)" and "IntArray.update(data,i,use+1);":


**********************************************************
**********************************************************
**********************************************************

fun loop () =
  let
    fun loopI i =
      if i = size then
        let val _ = () in
          IntArray.update(data,0,IntArray.sub(data,size-1));
          ()
        end
      else
        let (*val previous = IntArray.sub(data,i-1)*)
            val previous = 0
            val use = if previous > cap then 0 else previous in
          (*IntArray.update(data,i,use+1);*)
          loopI (i+1)
        end
  in loopI 1 end

**********************************************************
**********************************************************
**********************************************************


then the program takes 8 seconds.
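
Separately from the pure-SML timings above, one option for the FFI version that is not measured here is to move the whole inner loop into C, so that the FFI is crossed once per pass instead of twice per element. A minimal sketch, assuming a hypothetical function loopIntArray added to intArray.c (the struct is repeated only so that the block stands alone):

```c
/* Hypothetical addition to intArray.c; loopIntArray is not in the original code. */
#include <stdlib.h>

typedef struct _intArray {
  int size;
  int* arr;
} intArray;

/* One full pass over the array, mirroring the SML loopI:
   for i = 1 .. size-1, arr[i] = (arr[i-1] > cap ? 0 : arr[i-1]) + 1,
   then arr[0] = arr[size-1]. */
void loopIntArray(intArray* p, int cap){
  int i;
  int size = p->size;
  if (size < 1) return;            /* nothing to do for an empty array */
  for(i=1; i<size; i++){
    int previous = p->arr[i-1];
    int use = previous > cap ? 0 : previous;
    p->arr[i] = use + 1;
  }
  p->arr[0] = p->arr[size-1];
}
```

It could be bound like the other functions, for example with call2 (get "loopIntArray") (PINTARR,INT) VOID, and called once per iteration of bench. Whether this beats the pure SML version has not been measured.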





On Sun, Sep 20, 2015 at 5:33 PM, Phil Clayton <[email protected]>
wrote:

> Poly/ML is using libffi to call C functions.  To determine the FFI
> overhead, you could create an example that just makes the same number of
> calls to empty C functions with the same number of arguments.
>
> To see what happens when a C function is called, look at call_sym function
> in foreign.cpp:
> https://github.com/polyml/polyml/blob/master/libpolyml/foreign.cpp#L874
> For a C function to improve efficiency, the C function needs to be faster
> by more than the FFI overhead.  I strongly suspect that SML functions like
> IntArray.update take less time than the FFI overhead, so no improvement is
> possible by using C functions.  (I suspect such simple SML functions take
> much less time, so I would expect use of C functions to be much slower.)
>
> Phil
>
> P.S. I think that there is scope for efficiency improvement in Poly/ML but
> with some upheaval.  For example, if call_sym first took parameters that
> indicate the types of arguments and the return value, then, for each call
> site, arg_values and arg_types could be created once and the call to
> ffi_prep_cif made once.  Still, the arguments have to be filled in on each
> call:
>     PolyWord p = arg_list;
>     for (POLYUNSIGNED i=0; i<num_args; i++,p=Tail(p))
>     {
>         arg_values[i] = DEREFVOL(taskData, Head(p).AsObjPtr()->Get(1));
>         arg_types[i] = ctypeToFfiType(taskData, Head(p).AsObjPtr()->Get(0));
>
>     }
>
>
> 19/09/15 16:17, Artella Coding wrote:
>
>> Hi, I thought that I would try to speed up the SML code at
>>
>> http://stackoverflow.com/questions/32425267/how-to-improving-array-benchmark-performance-in-polyml
>> by using the FFI, but this results in significant slowdown.
>>
>> Non ffi code :
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> val size:int = 50000;
>> val loops:int = 30;
>> val cap:int = 50000;
>>
>> val data = IntArray.array(size,0);
>>
>> fun loop () =
>>    let
>>      fun loopI i =
>>        if i = size then
>>          let val _ = () in
>>            IntArray.update(data,0,IntArray.sub(data,size-1));
>>            ()
>>          end
>>        else
>>          let val previous = IntArray.sub(data,i-1)
>>              val use = if previous > cap then 0 else previous in
>>            IntArray.update(data,i,use+1);
>>            loopI (i+1)
>>        end
>>    in loopI 1 end
>>
>> fun benchmarkRun () =
>>    let
>>      fun bench i =
>>        if i = loops then ()
>>        else let val _ = () in
>>               loop ();
>>               bench (i+1)
>>             end
>>    in bench 1 end
>>
>> fun sum (i,value) =
>>    if i = size then value
>>    else sum(i+1,value+Array.sub(data,i))
>>
>> fun main () = let val _ = () in
>>    benchmarkRun();
>>    print (Int.toString (sum (0,0)));
>>    print "\n"
>>    end
>>
>> (*val _ = main ()*)
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> FFI code :
>>
>> c code :
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> //intArray.c
>> #include <stdlib.h>
>> #include <stdio.h>
>>
>> typedef struct _intArray {
>>    int size;
>>    int* arr;
>> } intArray;
>>
>> intArray* createIntArray(int size){
>>    int i;
>>    intArray* p = (intArray*) malloc (sizeof(intArray));
>>    p->arr = (int*) malloc (size*sizeof(int));
>>    for(i=0; i<size; i++){
>>      p->arr[i] = 0;
>>    }
>>    p->size = size;
>>    return p;
>> }
>>
>> void destroyIntArray(intArray* p){
>>    free (p->arr);
>>    free (p);
>> }
>>
>> void setIntArray(intArray* p, int elem, int val){
>>    p->arr[elem] = val;
>> }
>>
>> int getIntArray(intArray *p, int elem){
>>    return p->arr[elem];
>> }
>>
>> int getSumIntArray(intArray* p){
>>    int sum = 0;
>>    int i;
>>    int size = p->size;
>>    for(i=0; i<size; i++){
>>      sum += p->arr[i];
>>    }
>>    return sum;
>> }
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> ml code :
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> open CInterface;
>>
>> val lib = load_lib "./intArray.so";
>> val get = get_sym "./intArray.so";
>>
>> val PINTARR = POINTER;
>>
>> val c1 = call1 (get "createIntArray") INT PINTARR
>> val c2 = call3 (get "setIntArray") (PINTARR,INT,INT) VOID
>> val c3 = call2 (get "getIntArray") (PINTARR,INT) INT
>> val c4 = call1 (get "getSumIntArray") (PINTARR) INT
>>
>> fun c_createIntArray (size) = c1 (size);
>> fun c_setIntArray (p,elem,value) = c2 (p,elem,value);
>> fun c_getIntArray (p,elem) = c3 (p,elem);
>> fun c_getSumIntArray (p) = c4 (p);
>>
>>
>> val size:int = 50000;
>> val loops:int = 30;
>> val cap:int = 50000;
>>
>> fun loop (pData2) =
>>    let
>>      fun loopI i =
>>        if i = size then
>>          let val _ = () in
>>            c_setIntArray(pData2,0,c_getIntArray(pData2,size-1));
>>            ()
>>          end
>>        else
>>          let
>>              val previous = c_getIntArray(pData2,i-1);
>>              val use = if previous > cap then 0 else previous in
>>            c_setIntArray(pData2,i,use+1);
>>            loopI (i+1)
>>        end
>>    in loopI 1 end
>>
>> fun benchmarkRun (pData2) =
>>    let
>>      fun bench i =
>>        if i = loops then ()
>>        else let val _ = () in
>>               loop (pData2);
>>               bench (i+1)
>>             end
>>    in bench 1 end
>>
>> fun main () =
>>    let
>>      val pData = c_createIntArray(size);
>>      val final = load_sym lib "destroyIntArray";
>>    in
>>    setFinal final pData;
>>    benchmarkRun(pData);
>>    print (Int.toString (c_getSumIntArray (pData)));
>>    print "\n"
>>    end
>>
>> *************************************************************************
>> *************************************************************************
>> *************************************************************************
>>
>> The times are :
>>
>> a)for non ffi sml : 0.09s
>> b)for ffi sml : 11.8s
>>
>> Is there any way I can improve the speeds on the ffi code? Thanks
>>
>>
>> _______________________________________________
>> polyml mailing list
>> [email protected]
>> http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
>>
>>
