Re: [GSLL-devel] [Gsll-devel] Efficient access to externally generated double-float arrays?

2010-11-14 Thread Liam Healy
OK, I've now committed what I think is my last attempt at this.  This
latest version includes optimization of matrices and setting values in
foreign arrays.  This latter comes with a caveat however.  The
compiler will macroexpand setf, and for SBCL at least, that means
making temporary variables to bind the actual values.  Those variables
of course don't have any declarations, so no compiler macro expansion
is done.  They way around this is to funcall #'(setf gref*) instead of
using the setf macro, something that I bet most people won't want to
do.  If I get some more energy to pursue this, I could define a setf
macro to shadow cl:setf, but I think I'll let this be for now.

I've written a "timing test" function loosely based on yours.  It is
in foreign-array/tests/timing.lisp.  I hope in the next few days to
run a few of these tests to see the benefit of the optimizations that
have been put in place.  You can see I have commented out the funcall
#'(setf grid:gref*) form at the end; in this example there are so few
sets that the speed difference is undetectable.

Liam

___
GSLL-devel mailing list
GSLL-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel


Re: [GSLL-devel] [Gsll-devel] Efficient access to externally generated double-float arrays?

2010-11-14 Thread Liam Healy
On Sun, Nov 14, 2010 at 10:24 AM, Sebastian Sturm
 wrote:
> My apologies for taking so long to answer. I have verified that the new gref* 
> can now be used without consing and comes relatively close to the 
> unconvenient cffi:mem-aref solution in terms of computation time. Many thanks 
> for that! I also tried the functions on matrices, which still results in some 
> consing, but this doesn't seem to be time-critical here. In my experiments, 
> most of the consing could be eliminated by storing the linearized index in an 
> auxiliary variable, but this even resulted in a slight runtime increase.
> For my original application, the GSLL solution is now comparable in speed to 
> Mathematica for small dimensionality (dim < about 50); for a fair comparison 
> of both solutions for larger dim, I'll have to rework my very naive 
> computation of the Hessian matrix.
>
> Judging from the git commits, you have also changed something related to 
> complex-valued arrays. Did you implement similar optimizations as in the 
> real-valued case? I'll try that once I have repaired my Hessian.
>
> Best regards,
> Sebastian

The work is still ongoing.  I don't know when you last pulled, but
I've made a few commits in the last day or so, and I'm working on some
more improvements.  My previous email was incorrect in stating that
the 'declare form doesn't work; it does at least in newer versions of
SBCL.  Also, I've expanded use to CCL because that supports
variable-information as well (this hasn't been committed yet).   The
other recent change is that inputs and outputs of gref* and (setf
gref*) are now wrapped with the appropriate 'the forms to help the
compiler optimize.   So far these mostly apply only to vectors (1
dimensional arrays) but I am working on compile-time linearization of
indices for higher dimensional arrays, so that at compile time the
forms will also expand directly into mem-aref calls; that should help
with your matrices if they are declared with their dimensions.  I'm
working on this now.   Everything should apply to all foreign array
types, including complex.

Liam

___
GSLL-devel mailing list
GSLL-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel


Re: [GSLL-devel] [Gsll-devel] Efficient access to externally generated double-float arrays?

2010-11-14 Thread Sebastian Sturm
My apologies for taking so long to answer. I have verified that the new gref* 
can now be used without consing and comes relatively close to the unconvenient 
cffi:mem-aref solution in terms of computation time. Many thanks for that! I 
also tried the functions on matrices, which still results in some consing, but 
this doesn't seem to be time-critical here. In my experiments, most of the 
consing could be eliminated by storing the linearized index in an auxiliary 
variable, but this even resulted in a slight runtime increase.
For my original application, the GSLL solution is now comparable in speed to 
Mathematica for small dimensionality (dim < about 50); for a fair comparison of 
both solutions for larger dim, I'll have to rework my very naive computation of 
the Hessian matrix.

Judging from the git commits, you have also changed something related to 
complex-valued arrays. Did you implement similar optimizations as in the 
real-valued case? I'll try that once I have repaired my Hessian.

Best regards,
Sebastian
___
GSLL-devel mailing list
GSLL-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel


Re: [GSLL-devel] [Gsll-devel] Efficient access to externally generated double-float arrays?

2010-11-07 Thread Liam Healy
OK, this turned out to be a lot harder than I thought.  I have done
three things:
1) I have defined gref* and (setf gref*) methods specific to each of
the foreign-array types that call cffi:mem-aref.  This gives about 20x
speedup because I am now passing the literal type to cffi:mem-aref, so
it can work that fact in at compile time.
2) There is now a compiler macro to turn a grid:gref into a grid:gref*
if there's only one index.  This gives about 2x speed up when gref is
used.
3) There is another compiler macro that turns a grid:gref* into a
cffi:mem-aref directly if the foreign array is declared.  This gives
about a 400x speedup overall, similar to your "hardwired" result.  It
is a bit slower because I'm not able to precompute the pointer, it has
to be recomputed each time on the gref* call.

On the last point, there is a caveat.  I tried to make it work when
the foreign array has been declared with a standard (declare ...)
form.  This has a chance of working on SBCL because of its support for
the CLtL2 function variable-information, which was removed from CL
before it was sent to ANSI standardization.  However, it did not work
for me; I will continue to try to get this working.  In the meantime,
the only way to do a declaration to take advantage of 3 is with a 'the
form, e.g. (grid:gref (the vector-double-float zvector) i).  This is
kind of annoying, but it is portable, and allows you to avoid going to
lower level functions (i.e., cffi:mem-aref).

So for example see my rewrite of your function (in
foreign-array/tests/fast-array-access.lisp)
(defun gref-access (dim)
  "Given an integer dim, this constructs a function that, when supplied with a
   N-dimensional vector Z and some output vector (-> pointer?), yields the
   corresponding forces"
  (let ((temp-values (make-array 2 :element-type 'double-float
:initial-element 0.0d0)))
(lambda (zvector output)
  (declare (fixnum dim)
   (optimize (speed 3) (safety 0) (debug 0))
   (type vector-double-float zvector)) ;;; <--- this is useless,
but ought not to be!
  (do ((i 0 (1+ i))) ((= i dim)) (declare (fixnum i))
(setf (aref temp-values 0) 0.0d0)
(do ((m 0 (1+ m))) ((> m i)) (declare (fixnum m))
  (do ((n i (1+ n))) ((= n dim)) (declare (fixnum n))
(setf (aref temp-values 1) 0.0d0)
(do ((k m (1+ k))) ((> k n)) (declare (fixnum k))
  (incf (aref temp-values 1) (grid:gref (the vector-double-float
zvector) k))) ; This declaration does the work!
(incf (aref temp-values 0) (expt (aref temp-values 1) -2
(setf (grid:gref output i)
  (- (grid:gref (the vector-double-float zvector) i) ; This one 
does too!
 (aref temp-values 0)))
is now (almost) as fast as your cffi-access.

There is still a bunch of stuff to be done --- the optimizations only
work for vectors, not higher dimensional arrays, and I haven't defined
a compiler macro for setf yet on 3).  But it's a start; try it in your
problem and let me know how it performs.

Liam

___
GSLL-devel mailing list
GSLL-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel


Re: [GSLL-devel] [Gsll-devel] Efficient access to externally generated double-float arrays?

2010-10-29 Thread Sebastian Sturm
Liam,

I'm sorry, I now see a difference between the original gref* and your modified 
version of it. I guess I forgot to reload the Lisp image last time, so both 
timings had been obtained using the new gref*. There is still consing, but it's 
already reduced by a factor of three. Can be reduced by a further ~60% by using 
(the double-float (grid:gref* ...)) instead of just gref*; I assume this is due 
to to the float->pointer coercion done to gref*'s .(?) Here's the 
new timing data (dim = 15):

"gref" 
Evaluation took:
 0.063 seconds of real time
 0.056171 seconds of total run time (0.044926 user, 0.011245 system)
 [ Run times consist of 0.006 seconds GC time, and 0.051 seconds non-GC time. ]
 88.89% CPU
 28 lambdas converted
 137,624,278 processor cycles
 4,032,960 bytes consed


"gref*" 
Evaluation took:
 0.012 seconds of real time
 0.011932 seconds of total run time (0.011929 user, 0.03 system)
 100.00% CPU
 26,178,823 processor cycles
 785,360 bytes consed


"modified gref*" 
Evaluation took:
 0.001 seconds of real time
 0.001202 seconds of total run time (0.000884 user, 0.000318 system)
 100.00% CPU
 3,126,090 processor cycles
 278,512 bytes consed


"hardwired cffi:mem-aref" 
Evaluation took:
 0.000 seconds of real time
 0.33 seconds of total run time (0.32 user, 0.01 system)
 100.00% CPU
 66,462 processor cycles
 0 bytes consed

I have attached the lisp file used to obtain these timings.

thanks,
Sebastian



fast-array-access.lisp
Description: Binary data
___
GSLL-devel mailing list
GSLL-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel


Re: [GSLL-devel] [Gsll-devel] Efficient access to externally generated double-float arrays?

2010-10-28 Thread Liam Healy
Sebastian,

Trying to walk code correctly is a nasty business, and generally
non-portable, so I'd stay away from it.   We could do this with a
macro/macrolet pair but I'd like to optimize via the CLOS route if we
can.

Liam

On Thu, Oct 28, 2010 at 1:01 PM, Sebastian Sturm
 wrote:
> Here is something similar to what I suggested, although still very 
> unfinished. It's probably not the proper way to write a macro, but I guess 
> other people on the list will know how to do it correctly. Efficient 
> linearization can be added in the same manner; also, the macro should be 
> modified s.t. it accepts several array specifications at once, i.e. 
> with-foreign-array ((array-1 :double) (array-2 :int) (array-3 :double) ...), 
> etc.
> Of course, if you can figure out a way to incorporate the optimizations into 
> gref without such a clumsy workaround, I'd be all for it.
> best regards,
> Sebastian
>
> (defun mapcons (fn x)
>  (if (atom x)
>     x
>     (funcall fn (let ((a (mapcons fn (car x)))
>                        (d (mapcons fn (cdr x
>                    (if (and (eql a (car x)) (eql d (cdr x)))
>                        x
>                        (cons a d))
>
> (defmacro with-fast-access-to-single-foreign-array ((array element-type) 
> &body body)
>  (alexandria:with-unique-names (array-fptr)
>   `(let ((,array-fptr (grid::foreign-pointer ,array)))
>      ,@(mapcons
>          (lambda (expr)
>            (if (and (consp expr)
>                     (eq (first expr) 'grid:gref*)
>                     (eq (second expr) array))
>                (list 'cffi:mem-aref array-fptr element-type (elt expr 2))
>                expr)) body
>
> (defun macro-force-function (dim)
>  "Given an integer dim, this constructs a function that, when supplied with a
>  N-dimensional vector Z and some output vector (-> pointer?), yields the
>  corresponding forces"
>  (declare (fixnum dim))
>  (let ((temp-values (make-array 2 :element-type 'double-float 
> :initial-element 0.0d0)))
>   (lambda (zvector output)
>     (with-fast-access-to-single-foreign-array (output :double)
>        (with-fast-access-to-single-foreign-array (zvector :double)
>          (do ((i 0 (1+ i))) ((= i dim)) (declare (fixnum i))
>            (setf (aref temp-values 0) 0.0d0)
>            (do ((m 0 (1+ m))) ((> m i)) (declare (fixnum m))
>              (do ((n i (1+ n))) ((= n dim)) (declare (fixnum n))
>                (setf (aref temp-values 1) 0.0d0)
>                (do ((k m (1+ k))) ((> k n)) (declare (fixnum k))
>                  (incf (aref temp-values 1) (grid:gref* zvector k)))
>                (incf (aref temp-values 0) (expt (aref temp-values 1) -2
>            (setf (grid:gref* output i)
>                  (- (grid:gref* zvector i)
>                     (aref temp-values 0)
>
>
> ___
> GSLL-devel mailing list
> GSLL-devel@common-lisp.net
> http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel
>

___
GSLL-devel mailing list
GSLL-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel


Re: [GSLL-devel] [Gsll-devel] Efficient access to externally generated double-float arrays?

2010-10-28 Thread Liam Healy
Sebastian,

I'm very puzzled by this.  You should now have the type :double
hard-wired in the CFFI call.  Can you run your big test with and
without this function defined, instead of just the grid:gref* test?
If it still shows no improvement, that says that the problem is not in
the binding of type at runtime in the cffi:mem-aref call.  Or if you'd
like, send me the test (I know you posted pieces before but just to be
sure, send the whole file again) and I will try it.

Liam

On Thu, Oct 28, 2010 at 10:51 AM, Sebastian Sturm
 wrote:
> Liam,
>
> I have tried this and (without using (the double-float (...)) since this is 
> unnecessary for cffi:mem-aref with hardcoded :double data type), I get the 
> following timing data:
>
> using the standard grid:gref routine:
> Evaluation took:
> 0.542 seconds of real time
> 0.536671 seconds of total run time (0.525229 user, 0.011442 system)
> [ Run times consist of 0.037 seconds GC time, and 0.500 seconds non-GC time. ]
> 99.08% CPU
> 1,189,729,211 processor cycles
> 153,334,416 bytes consed
>
> using the standard grid:gref* routine:
> Evaluation took:
> 0.067 seconds of real time
> 0.066006 seconds of total run time (0.063179 user, 0.002827 system)
> [ Run times consist of 0.008 seconds GC time, and 0.059 seconds non-GC time. ]
> 98.51% CPU
> 146,192,222 processor cycles
> 27,060,416 bytes consed
>
> using the modified grid:gref* routine (specialized to 
> grid:vector-double-float)
> Evaluation took:
> 0.070 seconds of real time
> 0.068041 seconds of total run time (0.065151 user, 0.002890 system)
> [ Run times consist of 0.011 seconds GC time, and 0.058 seconds non-GC time. ]
> 97.14% CPU
> 152,642,611 processor cycles
> 27,061,456 bytes consed
>
> Apparently my system didn't notice the difference. Also, SBCL complains about 
> the argument (grid::foreign-pointer object) to cffi:mem-aref being of type 
> NUMBER instead of integer or fixnum. Furthermore, it says it has to do float 
> to pointer coercion to . I checked that the redefined method is 
> actually used by doing a second run with a print statement added to the 
> defmethod.
>
> using cffi:mem-aref directly:
> Evaluation took:
> 0.002 seconds of real time
> 0.002483 seconds of total run time (0.002482 user, 0.01 system)
> 100.00% CPU
> 5,442,756 processor cycles
> 0 bytes consed
>
> I guess that even using compiler macros or other trickery one would have to 
> remove the allocation of linearized indices and foreign ptr addresses from 
> the inner loops as I have done in my example by using auxiliary variables 
> zvector-fptr and output-fptr. Maybe one can define something like
> (with-foreign-array (name-of-array :double) ...) that locally redefines 
> (grid:gref name-of-array ...) and (grid:gref* name-of-array ...) as macros 
> evaluating to cffi:mem-aref and storing the respective linearized indices and 
> memory pointers at the level of with-foreign-array? Although not as 
> convenient as some 'self-optimizing' grid:gref, I would consider this a 
> satisfactory solution. Don't know how to do that without getting lost in a 
> forest of commas and backquotes, though.
>
> best regards,
> Sebastian
>
> On 27.10.2010, at 05:22, Liam Healy wrote:
>
>> Sebastian,
>>
>> Can you temporarily define this and find the timing/consing for your
>> test case:
>>
>> (defmethod gref* ((object vector-double-float) linearized-index)
>> (cffi:mem-aref
>> (foreign-pointer object)
>> :double
>> linearized-index))
>>
>> (I think you don't use any matrices but if you do, define an analogous
>> function for matrix-double-float.)
>>
>> As you can see, it has the literal type declaration, and I'm hopeful
>> that CFFI will pick that up and make this competitive in speed with
>> the best that you saw.  If that's so, it should be fairly easy for me
>> to make this generic and incorporate it into GSD.  I'm still
>> interested in making the linearization more efficient if that's still
>> significant, but let's try this for now to see how much speed we can
>> squeeze out of gref*.
>>
>> Thanks,
>>
>> Liam
>
> ___
> GSLL-devel mailing list
> GSLL-devel@common-lisp.net
> http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel
>

___
GSLL-devel mailing list
GSLL-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel


Re: [GSLL-devel] [Gsll-devel] Efficient access to externally generated double-float arrays?

2010-10-28 Thread Sebastian Sturm
Here is something similar to what I suggested, although still very unfinished. 
It's probably not the proper way to write a macro, but I guess other people on 
the list will know how to do it correctly. Efficient linearization can be added 
in the same manner; also, the macro should be modified s.t. it accepts several 
array specifications at once, i.e. with-foreign-array ((array-1 :double) 
(array-2 :int) (array-3 :double) ...), etc.
Of course, if you can figure out a way to incorporate the optimizations into 
gref without such a clumsy workaround, I'd be all for it.
best regards,
Sebastian

(defun mapcons (fn x) 
 (if (atom x) 
 x 
 (funcall fn (let ((a (mapcons fn (car x))) 
(d (mapcons fn (cdr x 
(if (and (eql a (car x)) (eql d (cdr x))) 
x 
(cons a d))

(defmacro with-fast-access-to-single-foreign-array ((array element-type) &body 
body)
 (alexandria:with-unique-names (array-fptr)
   `(let ((,array-fptr (grid::foreign-pointer ,array)))
  ,@(mapcons
  (lambda (expr)
(if (and (consp expr)
 (eq (first expr) 'grid:gref*)
 (eq (second expr) array))
(list 'cffi:mem-aref array-fptr element-type (elt expr 2))
expr)) body

(defun macro-force-function (dim)
 "Given an integer dim, this constructs a function that, when supplied with a 
  N-dimensional vector Z and some output vector (-> pointer?), yields the
  corresponding forces"
 (declare (fixnum dim))
 (let ((temp-values (make-array 2 :element-type 'double-float :initial-element 
0.0d0)))
   (lambda (zvector output)
 (with-fast-access-to-single-foreign-array (output :double)
(with-fast-access-to-single-foreign-array (zvector :double) 
  (do ((i 0 (1+ i))) ((= i dim)) (declare (fixnum i))
(setf (aref temp-values 0) 0.0d0)
(do ((m 0 (1+ m))) ((> m i)) (declare (fixnum m))
  (do ((n i (1+ n))) ((= n dim)) (declare (fixnum n))
(setf (aref temp-values 1) 0.0d0)
(do ((k m (1+ k))) ((> k n)) (declare (fixnum k))
  (incf (aref temp-values 1) (grid:gref* zvector k)))
(incf (aref temp-values 0) (expt (aref temp-values 1) -2
(setf (grid:gref* output i) 
  (- (grid:gref* zvector i)
 (aref temp-values 0)


___
GSLL-devel mailing list
GSLL-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel


Re: [GSLL-devel] [Gsll-devel] Efficient access to externally generated double-float arrays?

2010-10-28 Thread Sebastian Sturm
Liam,

I have tried this and (without using (the double-float (...)) since this is 
unnecessary for cffi:mem-aref with hardcoded :double data type), I get the 
following timing data:

using the standard grid:gref routine:
Evaluation took:
0.542 seconds of real time
0.536671 seconds of total run time (0.525229 user, 0.011442 system)
[ Run times consist of 0.037 seconds GC time, and 0.500 seconds non-GC time. ]
99.08% CPU
1,189,729,211 processor cycles
153,334,416 bytes consed

using the standard grid:gref* routine:
Evaluation took:
0.067 seconds of real time
0.066006 seconds of total run time (0.063179 user, 0.002827 system)
[ Run times consist of 0.008 seconds GC time, and 0.059 seconds non-GC time. ]
98.51% CPU
146,192,222 processor cycles
27,060,416 bytes consed

using the modified grid:gref* routine (specialized to grid:vector-double-float)
Evaluation took:
0.070 seconds of real time
0.068041 seconds of total run time (0.065151 user, 0.002890 system)
[ Run times consist of 0.011 seconds GC time, and 0.058 seconds non-GC time. ]
97.14% CPU
152,642,611 processor cycles
27,061,456 bytes consed

Apparently my system didn't notice the difference. Also, SBCL complains about 
the argument (grid::foreign-pointer object) to cffi:mem-aref being of type 
NUMBER instead of integer or fixnum. Furthermore, it says it has to do float to 
pointer coercion to . I checked that the redefined method is 
actually used by doing a second run with a print statement added to the 
defmethod.

using cffi:mem-aref directly:
Evaluation took:
0.002 seconds of real time
0.002483 seconds of total run time (0.002482 user, 0.01 system)
100.00% CPU
5,442,756 processor cycles
0 bytes consed

I guess that even using compiler macros or other trickery one would have to 
remove the allocation of linearized indices and foreign ptr addresses from the 
inner loops as I have done in my example by using auxiliary variables 
zvector-fptr and output-fptr. Maybe one can define something like
(with-foreign-array (name-of-array :double) ...) that locally redefines 
(grid:gref name-of-array ...) and (grid:gref* name-of-array ...) as macros 
evaluating to cffi:mem-aref and storing the respective linearized indices and 
memory pointers at the level of with-foreign-array? Although not as convenient 
as some 'self-optimizing' grid:gref, I would consider this a satisfactory 
solution. Don't know how to do that without getting lost in a forest of commas 
and backquotes, though.

best regards,
Sebastian

On 27.10.2010, at 05:22, Liam Healy wrote:

> Sebastian,
> 
> Can you temporarily define this and find the timing/consing for your
> test case:
> 
> (defmethod gref* ((object vector-double-float) linearized-index)
> (cffi:mem-aref
> (foreign-pointer object)
> :double
> linearized-index))
> 
> (I think you don't use any matrices but if you do, define an analogous
> function for matrix-double-float.)
> 
> As you can see, it has the literal type declaration, and I'm hopeful
> that CFFI will pick that up and make this competitive in speed with
> the best that you saw.  If that's so, it should be fairly easy for me
> to make this generic and incorporate it into GSD.  I'm still
> interested in making the linearization more efficient if that's still
> significant, but let's try this for now to see how much speed we can
> squeeze out of gref*.
> 
> Thanks,
> 
> Liam

___
GSLL-devel mailing list
GSLL-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel


Re: [GSLL-devel] [Gsll-devel] Efficient access to externally generated double-float arrays?

2010-10-26 Thread Liam Healy
Sebastian,

Can you temporarily define this and find the timing/consing for your
test case:

(defmethod gref* ((object vector-double-float) linearized-index)
  (cffi:mem-aref
   (foreign-pointer object)
   :double
   linearized-index))

(I think you don't use any matrices but if you do, define an analogous
function for matrix-double-float.)

As you can see, it has the literal type declaration, and I'm hopeful
that CFFI will pick that up and make this competitive in speed with
the best that you saw.  If that's so, it should be fairly easy for me
to make this generic and incorporate it into GSD.  I'm still
interested in making the linearization more efficient if that's still
significant, but let's try this for now to see how much speed we can
squeeze out of gref*.

Thanks,

Liam


On Tue, Oct 26, 2010 at 10:25 AM, Sebastian Sturm
 wrote:
> It seems that CFFI includes some compiler macros that use type information
> supplied at compile time to generate more efficient code (got that from the
> cffi mailing
> list, http://www.mail-archive.com/cffi-de...@common-lisp.net/msg01154.html).
> In my case, I'm using this optimization by supplying :double to
> cffi:mem-aref. If I replace this by (cl-cffi (element-type zvector)), as is
> done internally by gref, then (again with dim = 50), better-force-function
> uses around 1.8 GCycles and conses 80 MB in the process, whereas the :double
> version needs ~ 8.6 MCycles, not consing anything. The slow-but-flexible
> version of better-force-function reads as follows:
> (defun better-force-function (dim)
>   "Given an integer dim, this constructs a function that, when supplied with
> a
>    N-dimensional vector Z and some output vector (-> pointer?), yields the
>    corresponding forces"
>   (declare (fixnum dim))
>   (let ((temp-values (make-array 2 :element-type 'double-float
> :initial-element 0.0d0)))
>     (lambda (zvector output)
>       (let ((zvector-fptr (grid::foreign-pointer zvector))
>    (output-fptr (grid::foreign-pointer output))
>    ;; this makes it worse
>    (elt-type (grid:cl-cffi (grid:element-type zvector)))
>    )
> (macrolet ((quick-ref (the-vector n)
>     `(cffi:mem-aref
>       ,(case the-vector
>  (zvector 'zvector-fptr)
>  (output 'output-fptr))
>      ;;  :double
>       elt-type ;; replace this by :double
>       ,n)))
>  (do ((i 0 (1+ i))) ((= i dim)) (declare (fixnum i))
>    (setf (aref temp-values 0) 0.0d0)
>    (do ((m 0 (1+ m))) ((> m i)) (declare (fixnum m))
>      (do ((n i (1+ n))) ((= n dim)) (declare (fixnum n))
> (setf (aref temp-values 1) 0.0d0)
> (do ((k m (1+ k))) ((> k n)) (declare (fixnum k))
>  (incf (aref temp-values 1) (quick-ref zvector k))) ;; generates efficiency
> warnings when using elt-type
> (incf (aref temp-values 0) (expt (aref temp-values 1) -2
>    (setf (quick-ref output i)
>  (- (quick-ref zvector i)
>     (aref temp-values 0)
> Also, with the variable type left unspecified at compile time, the innermost
> loop generates efficiency warnings telling me that generic-+ needs to be
> used. Writing (the double-float (quick-ref zvector k)) removes these and
> slightly reduces the consing amount of the slow variant to ~ 63 MB. I still
> have to try the SLIME profiler though.
> thanks,
> Sebastian

___
GSLL-devel mailing list
GSLL-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel


Re: [GSLL-devel] [Gsll-devel] Efficient access to externally generated double-float arrays?

2010-10-25 Thread Liam Healy
On Fri, Oct 22, 2010 at 2:13 PM, Sebastian Sturm
 wrote:
>> I'm surprised about the grid:gref result.   Please run a profiler and
>> confirm that the time consumed is within grid:gref and its callees,
>
> I tried using sb-sprof, but as a complete CL newbie I had a hard time making 
> sense of its output and thus ended up ignoring the profiler and using 
> trial-and-error again. It seems that a fair amount of consing is due to the 
> index linearization. Simply replacing grid:gref by grid:gref* for 
> one-dimensional arrays reduces consing by a factor of 3 in my experiments. It 
> seems that the rest can be eliminated by specifying the data type (here it's 
> :double) at compile-time; incorporating this as an option to grid:gref via 
> some macro-trickery would probably uglify the GSLL framework somewhat (I 
> guess?), but I'd be willing to pay that price in return for an 
> order-of-magnitude speedup. Also, I haven't tried if similar problems arise 
> (and if similar workarounds are available) for the case of complex-valued 
> arrays.

Interesting.  It's plausible to me that linearization is causing the
problem; I think that there is room for improvement there.   Your
example with the mem-aref show that it's got to be that and/or generic
function dispatch, because as you can see on line 35 of
http://repo.or.cz/w/gsd.git/blob/e537bd9be551b90c9fbfb6020a6119f3df00d650:/foreign-array/methods.lisp,
gref* essentially just calls mem-aref.  So maybe if we can add
declarations and inlining, we can get the same effect with the current
code.

If you can't make sense of sb-prof and you use slime, you might try
the slime profiler:
http://lhealy.livejournal.com/8495.html
I find its output much clearer.

I'm not sure where you are putting a :double declaration to get that
speedup; can you post some example code?  It's possible that there is
some loss of effiicency in using CLOS (generic function dispatch) that
could be remedied by declarations or compiler macros.  That's beyond
my realm of expertise, but I'm willing to dig into it.

Liam

___
GSLL-devel mailing list
GSLL-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel