Re: [racket-dev] [plt] Push #27909: master branch updated
Removing the return value checking is in the works. It actually removes all of the checks that would blame typed code, so higher-order functions and data structures get improvements too. It was functional the last time I checked, but it lacks documentation, which is what is holding up merging with mainline.

https://github.com/plt/racket/pull/453

On Wed, Dec 11, 2013 at 7:57 PM, Robby Findler wrote:
> I see that TR's type->contract returns
>
>   (-> (flat-named-contract (quote Float) flonum?)
>       (flat-named-contract (quote Float) flonum?))
>
> for the type (Float -> Float), but it could return
>
>   (-> (flat-named-contract (quote Float) flonum?) any)
>
> which wouldn't do any result value checking (this being different from
> any/c as the range of the arrow contract).
>
> Robby

_________________________
  Racket Developers list:
  http://lists.racket-lang.org/dev
Re: [racket-dev] [plt] Push #27909: master branch updated
I see that TR's type->contract returns

  (-> (flat-named-contract (quote Float) flonum?)
      (flat-named-contract (quote Float) flonum?))

for the type (Float -> Float), but it could return

  (-> (flat-named-contract (quote Float) flonum?) any)

which wouldn't do any result value checking (this being different from any/c as the range of the arrow contract).

Robby

On Wed, Dec 11, 2013 at 6:18 PM, Neil Toronto wrote:
> I'll bet checking the return value contract (which is unnecessary) is the
> main slowdown. It has to check for number of values.
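To make the difference concrete, here is a minimal sketch (not TR's actual generated code) that attaches both contracts to the same function by hand. The function `double` and the blame labels 'typed and 'untyped are made up for illustration; `contract`, `->`, `flat-named-contract` and `any` come from racket/contract.

```racket
#lang racket
(require racket/flonum)

;; Hypothetical stand-in for a function exported from a typed module.
(define (double x) (fl* 2.0 x))

;; Argument and result are both checked, as in the contract quoted above.
(define checked
  (contract (-> (flat-named-contract 'Float flonum?)
                (flat-named-contract 'Float flonum?))
            double 'typed 'untyped))

;; `any` as the range: the argument is still checked, the result is not.
(define unchecked-result
  (contract (-> (flat-named-contract 'Float flonum?) any)
            double 'typed 'untyped))

(checked 1.5)
(unchecked-result 1.5)
```

Both calls return the same value. Passing a non-flonum such as the exact integer 1 to either wrapper still raises a contract violation, because only the range check was dropped. Using `any/c` as the range would not give the same saving, since it still has to confirm that exactly one value came back.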
Re: [racket-dev] TR performance (was: Re: [plt] Push #27909: master branch updated)
On Wed, Dec 11, 2013 at 7:25 PM, John Clements wrote:
>
> Wow! I had no idea TR was that fast.

In fairness, much of this is that Racket is that fast -- Matthew's put a lot of work into the JIT over the last few years.

> Related question: how hard is it to reason about the GC behavior of TR code?
> These numbers suggest to me that it might be possible to write TR code that
> could be pretty much guaranteed not to collect, and therefore potentially
> appropriate for use in audio callback functions, where the #1 rule is: NO GC
> PAUSES.

First, contracts allocate (less after Robby's changes, but still), so typed/untyped boundaries are bad for GC.

Second, TR doesn't change GC or allocation behavior except if you're using floating point or complex numbers (where it reduces allocation). So you can reason about it similarly to how you'd reason about untyped Racket code.

Sam
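One rough way to see how a piece of code interacts with the collector is `time-apply`, which reports GC time alongside CPU and real time. The sketch below is untyped Racket, and `sum-squares` is a hypothetical example function, not something from this thread. A gc time of 0 in one run is only a hint: it does not prove that no collection happened, and it guarantees nothing about later runs.

```racket
#lang racket/base
(require racket/flonum)

;; Hypothetical workload: sum of squares computed with flonum-specific ops.
(define (sum-squares n)
  (let loop ([i 0] [acc 0.0])
    (if (= i n)
        acc
        (loop (add1 i) (fl+ acc (fl* (->fl i) (->fl i)))))))

;; Start from a freshly collected heap so earlier garbage is not charged here.
(collect-garbage)

;; time-apply returns the result list plus cpu, real and gc milliseconds.
(define-values (results cpu real gc)
  (time-apply sum-squares '(1000000)))

(printf "cpu: ~a ms, real: ~a ms, gc: ~a ms~n" cpu real gc)
```

Running the same measurement on a function called across a typed/untyped boundary would be one way to observe the contract allocation mentioned above.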
[racket-dev] TR performance (was: Re: [plt] Push #27909: master branch updated)
On Dec 11, 2013, at 4:18 PM, Neil Toronto wrote:

> On 12/11/2013 02:49 PM, Neil Toronto wrote:
>> On 12/11/2013 01:55 PM, Stephen Bloch wrote:
>>> On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
>>>> numeric primitives implemented in Typed Racket are faster than the
>>>> same primitives implemented in C.
>>>
>>> Whoa! How did that happen?
>>
>> Whoa! That's not what I meant! O_o
>>
>> I said "we might be getting close" to that. I haven't tried porting a
>> numeric C primitive to TR yet, but I have a hunch that it'll still be
>> slower. I'll try one now and report what I find.
>>
>> Neil ⊥
>
> ...
>
> For comparison, here are the timings for running the benchmarks in TR with
> #:no-optimize:
>
> Function    Flonum  Rational  Fixnum  Integer  Float-Complex
> ------------------------------------------------------------
> magnitude*      45        70*     37      102*           318
> magnitude       61        45      39       91*           394
>
> * = unexpectedly high
>
> Here's what I understand from comparing the numbers:
>
>  * Except for non-fixnum integers, calling `magnitude' in TR is just as fast
>    as in untyped Racket. I have no idea why it would be slower on big
>    integers. That's just weird.
>
>  * Calling `abs' in Racket is faster than calling `scheme_abs' in C, except
>    on rationals and big integers.
>
>  * Operating on flonums in Typed Racket, using generic numeric functions, is
>    faster than doing the same in C.
>
> Overall, it looks like the TR code is within the same order of magnitude (pun
> not intended) as the C code. I would love to try this benchmark with either
> 1) a `magnitude*' with an `AnyValues' return type; or 2) a contract boundary
> that doesn't check TR's return types for first-order functions.

Wow! I had no idea TR was that fast.

Related question: how hard is it to reason about the GC behavior of TR code? These numbers suggest to me that it might be possible to write TR code that could be pretty much guaranteed not to collect, and therefore potentially appropriate for use in audio callback functions, where the #1 rule is: NO GC PAUSES.

John
Re: [racket-dev] [plt] Push #27909: master branch updated
On 12/11/2013 02:49 PM, Neil Toronto wrote:
> On 12/11/2013 01:55 PM, Stephen Bloch wrote:
>> On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
>>> numeric primitives implemented in Typed Racket are faster than the
>>> same primitives implemented in C.
>>
>> Whoa! How did that happen?
>
> Whoa! That's not what I meant! O_o
>
> I said "we might be getting close" to that. I haven't tried porting a
> numeric C primitive to TR yet, but I have a hunch that it'll still be
> slower. I'll try one now and report what I find.
>
> Neil ⊥

I can't figure out why `flsinh' is faster to call from untyped Racket than `sinh'. All my tests with a Typed Racket `magnitude' show that calls from untyped code are significantly slower, except in the one case where it computes Euclidean distance. That case is only about twice as slow.

I've attached the benchmark program. The `magnitude*' function is more or less a direct translation of `magnitude' from "number.c" into Typed Racket. Here's a summary of the results I get on my computer, in milliseconds, for 5 million calls from untyped Racket, by data type.

Function    Flonum  Rational  Fixnum  Integer  Float-Complex
------------------------------------------------------------
magnitude*     385       419     378      414            686
magnitude       59        44      40       40            390

The only one that's close in relative terms is Float-Complex. The others just call `abs'. The decompiled code doesn't show any inlining of `magnitude', so this comparison should be good.

I'll bet checking the return value contract (which is unnecessary) is the main slowdown. It has to check for number of values.

For comparison, here are the timings for running the benchmarks in TR with #:no-optimize:

Function    Flonum  Rational  Fixnum  Integer  Float-Complex
------------------------------------------------------------
magnitude*      45        70*     37      102*           318
magnitude       61        45      39       91*           394

* = unexpectedly high

Here's what I understand from comparing the numbers:

 * Except for non-fixnum integers, calling `magnitude' in TR is just as fast as in untyped Racket. I have no idea why it would be slower on big integers. That's just weird.

 * Calling `abs' in Racket is faster than calling `scheme_abs' in C, except on rationals and big integers.

 * Operating on flonums in Typed Racket, using generic numeric functions, is faster than doing the same in C.

Overall, it looks like the TR code is within the same order of magnitude (pun not intended) as the C code. I would love to try this benchmark with either 1) a `magnitude*' with an `AnyValues' return type; or 2) a contract boundary that doesn't check TR's return types for first-order functions.

(I managed to make a `magnitude*' with type Number -> AnyValues, but TR couldn't make a contract for it.)

Neil ⊥

#lang racket

;; Typed Racket translation of `magnitude' from "number.c"
(module typed-defs typed/racket
  (require math/base)
  (provide magnitude*)

  (: magnitude* (Number -> Any))
  (define (magnitude* z)
    (cond [(real? z)  (abs z)]
          [else
           (define r (abs (real-part z)))
           (define i (abs (imag-part z)))
           (cond [(eq? r 0)  i]
                 [else
                  ;; Swap so that r <= i, then scale by i
                  (let-values ([(r i)  (if (i . < . r) (values i r) (values r i))])
                    (cond [(zero? r)  (exact->inexact i)]
                          [(= i +inf.0)  (if (eqv? r +nan.0) +nan.0 +inf.0)]
                          [else
                           (define q (/ r i))
                           (* i (sqrt (+ 1 (* q q))))]))])]))
  )

;; Swap the two lines below to run the benchmarks from Typed Racket instead
;(module test typed/racket #:no-optimize
(module test racket
  (require math/base
           typed/racket/base
           (submod ".." typed-defs))

  ;; One test value per data type in the tables above
  (define x (random))                                       ; flonum
  (define y (/ (random 10000) (+ 1 (random 10000))))        ; exact rational
  (define i (random-integer (- (expt 2 20)) (expt 2 20)))   ; fixnum
  (define n (let: loop : Integer ()                         ; non-fixnum integer
              (define n (random-integer (- (expt 2 128)) (expt 2 128)))
              (if (fixnum? n) (loop) n)))
  (define z (make-rectangular (random) (random)))           ; float-complex

  ;; Time 5 runs of 5 million calls each
  (define-syntax-rule (test-one-arg-fun f x)
    (begin
      (printf "(~a ~a)~n" 'f 'x)
      (for ([_  (in-range 5)])
        (time (for ([_  (in-range 5000000)])
                (f x))))
      (newline)))

  (test-one-arg-fun magnitude* x)
  (test-one-arg-fun magnitude x)

  (test-one-arg-fun magnitude* y)
  (test-one-arg-fun magnitude y)

  (test-one-arg-fun magnitude* i)
  (test-one-arg-fun magnitude i)

  (test-one-arg-fun magnitude* n)
  (test-one-arg-fun magnitude n)

  (test-one-arg-fun magnitude* z)
  (test-one-arg-fun magnitude z)
  )

(require 'test)
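The relative cost of crossing the typed/untyped boundary can be read off the first table by dividing each `magnitude*` timing by the matching `magnitude` timing. The sketch below does that using only the numbers quoted in the message; the variable and function names are mine, not from the benchmark program.

```racket
#lang racket/base

;; Milliseconds for 5 million calls from untyped Racket,
;; copied from the first table in the message above.
(define columns       '("Flonum" "Rational" "Fixnum" "Integer" "Float-Complex"))
(define magnitude*-ms '(385 419 378 414 686))
(define magnitude-ms  '(59 44 40 40 390))

;; Slowdown factor of the Typed Racket version relative to the C primitive.
(define (slowdowns typed untyped)
  (for/list ([t (in-list typed)]
             [u (in-list untyped)])
    (exact->inexact (/ t u))))

;; Print one factor per data type, rounded to one decimal place.
(for ([name (in-list columns)]
      [s    (in-list (slowdowns magnitude*-ms magnitude-ms))])
  (printf "~a: ~ax~n" name (/ (round (* 10 s)) 10)))
```

The factors come out at roughly 6.5x to a little over 10x for the real-number cases and under 2x for Float-Complex, which matches the observation that only the Float-Complex case is close in relative terms.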
Re: [racket-dev] [plt] Push #27909: master branch updated
On 12/11/2013 01:55 PM, Stephen Bloch wrote:
> On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
>> numeric primitives implemented in Typed Racket are faster than the
>> same primitives implemented in C.
>
> Whoa! How did that happen?

Whoa! That's not what I meant! O_o

I said "we might be getting close" to that. I haven't tried porting a numeric C primitive to TR yet, but I have a hunch that it'll still be slower. I'll try one now and report what I find.

Neil ⊥
Re: [racket-dev] [plt] Push #27909: master branch updated
On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
> numeric primitives implemented in Typed Racket are faster than the same
> primitives implemented in C.

Whoa! How did that happen?

Stephen Bloch
sbl...@adelphi.edu
GPG key at http://home.adelphi.edu/sbloch/sbloch.pubkey.asc
Re: [racket-dev] [plt] Push #27909: master branch updated
On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
> numeric primitives implemented in Typed Racket are faster than the same
> primitives implemented in C.

Hallelujah!
Re: [racket-dev] [plt] Push #27909: master branch updated
On 12/11/2013 11:07 AM, ro...@racket-lang.org wrote:
> robby has updated `master' from 542e256206 to c321f6dd0c.
>   http://git.racket-lang.org/plt/542e256206..c321f6dd0c
>
> =[ One Commit ]=
> Directory summary:
>   37.6% pkgs/racket-pkgs/racket-test/tests/racket/contract/
>    5.5% pkgs/
>   46.3% racket/collects/racket/contract/private/
>   10.0% racket/collects/racket/private/
>
> ~~
>
> c321f6d Robby Findler 2013-12-04 22:35
> :
> | Change contract system so that projections are more first-order friendly

Awesome.

I've attached some more benchmarks, for `flrational?', `flsinh', `fllog1p', `lg+', and `flgamma'. These functions are pretty representative, and have a range of complexity from trivial to complicated. (For example, `flrational?' is implemented using two flops, and `flgamma' usually does ~50 flops in the range I tested.)

Approximate average times in milliseconds for 1 million calls:

Function      TR   Untyped pre-push   Untyped post-push
-------------------------------------------------------
flrational?    5                322                  98
flsinh        55                343                 121
fllog1p       47                351                 117
lg+           61                384                 154
flgamma      165                521                 262

There's also less variance in the timings, probably because there are fewer minor GC pauses during the tests.

Not shown in the table: untyped `sinh' calls take 140ms in the same test, so it's now faster to use `flsinh' from `math/flonum' in untyped code, if operating on flonums. Cool. We might be getting close to where numeric primitives implemented in Typed Racket are faster than the same primitives implemented in C.

The `flrational?' test is still amazing in TR. The function's two flops get inlined (I checked the decompiled module), which I suppose allows more JIT-level optimizations.

The only things I can think of to account for the extra time over TR's now are range/domain checking and boxing flonum return values. I think I remember hearing something from someone (maybe Eric?) at RacketCon about inlining contract checks. Is that in the works?

Neil ⊥

#lang racket

(require math/flonum
         math/special-functions
         racket/unsafe/ops
         (only-in typed/racket/base :))

(define x (random))
(: bx Boolean)
(define bx #f)
(define vec (make-flvector 1))
(define n 1000000)

(printf "flrational?~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (set! bx (flrational? x)))))
(newline)

(printf "flsinh~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (flsinh x)))))
(newline)

(printf "fllog1p~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (fllog1p x)))))
(newline)

(printf "lg+~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (lg+ x x)))))
(newline)

(printf "flgamma~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (flgamma x)))))
(newline)

flrational?
cpu time: 4 real time: 5 gc time: 0
cpu time: 4 real time: 6 gc time: 0
cpu time: 8 real time: 5 gc time: 0
cpu time: 4 real time: 6 gc time: 0
cpu time: 8 real time: 5 gc time: 0

flsinh
cpu time: 52 real time: 55 gc time: 4
cpu time: 56 real time: 54 gc time: 0
cpu time: 52 real time: 54 gc time: 0
cpu time: 56 real time: 54 gc time: 4
cpu time: 56 real time: 56 gc time: 0

fllog1p
cpu time: 48 real time: 46 gc time: 0
cpu time: 44 real time: 48 gc time: 0
cpu time: 48 real time: 47 gc time: 4
cpu time: 48 real time: 47 gc time: 0
cpu time: 48 real time: 47 gc time: 0

lg+
cpu time: 60 real time: 61 gc time: 0
cpu time: 60 real time: 61 gc time: 0
cpu time: 60 real time: 61 gc time: 4
cpu time: 64 real time: 61 gc time: 0
cpu time: 60 real time: 63 gc time: 0

flgamma
cpu time: 168 real time: 167 gc time: 4
cpu time: 164 real time: 165 gc time: 0
cpu time: 164 real time: 164 gc time: 0
cpu time: 164 real time: 164 gc time: 4
cpu time: 168 real time: 165 gc time: 0

flrational?
cpu time: 316 real time: 315 gc time: 0
cpu time: 316 real time: 314 gc time: 0
cpu time: 328 real time: 328 gc time: 4
cpu time: 324 real time: 326 gc time: 0
cpu time: 328 real time: 327 gc time: 0

flsinh
cpu time: 348 real time: 350 gc time: 12
cpu time: 336 real time: 336 gc time: 0
cpu time: 340 real time: 338 gc time: 0
cpu time: 348 real time: 349 gc time: 16
cpu time: 344 real time: 343 gc time: 4

fllog1p
cpu time: 348 real time: 347 gc time: 4
cpu time: 352 real time: 354 gc time: 4
cpu time: 348 real time: 349 gc time: 8
cpu time: 348 real time: 347 gc time: 4
cpu time: 360 real time: 359 gc time: 4

lg+
cpu time: 376 real time: 379 gc time: 0
cpu time: 384 real time: 384 gc time: 8
cpu time: 388 real time: 387 gc time: 4
cpu time: 380 real time: 381 gc time: 0
cpu time: