Re: [racket-dev] [plt] Push #27909: master branch updated

2013-12-11 Thread Eric Dobson
Removing the return value checking is in the works. It actually removes
all of the checks that would blame typed code, so higher-order functions
and data structures get improvements too. It was functional the last time
I checked, but it lacks documentation, which is what is holding up merging
it into mainline.

https://github.com/plt/racket/pull/453
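Eric's description (drop every check that could only blame typed code, which also helps higher-order values) can be illustrated with a toy translator that tracks which side produced each value and flips that side for function arguments. This is a minimal sketch under stated assumptions: the tiny type language, `toy-type->contract`, and its `typed-side?` flag are hypothetical, and this is not TR's `type->contract` or the code in the pull request.

```racket
#lang racket/base
;; Toy sketch only; NOT Typed Racket's implementation.
;; Hypothetical type language: 'Float | 'Integer | (list '-> dom rng)
;; (one-argument function types only).
(require racket/contract racket/match)

(define (base-type? ty) (memq ty '(Float Integer)))

;; typed-side? is #t when the value is produced by typed code. The type
;; checker already guarantees such values, so a failing check there could
;; only blame typed code, and the check can be dropped.
(define (toy-type->contract ty typed-side?)
  (match ty
    ['Float   (if typed-side? any/c (flat-named-contract 'Float flonum?))]
    ['Integer (if typed-side? any/c (flat-named-contract 'Integer exact-integer?))]
    [(list '-> dom rng)
     ;; Arguments flow the opposite way from results, so flip the side.
     (define dom-ctc (toy-type->contract dom (not typed-side?)))
     (if (and typed-side? (base-type? rng))
         (-> dom-ctc any)   ; base result produced by typed code: no check at all
         (-> dom-ctc (toy-type->contract rng typed-side?)))]))

;; Usage: a typed (-> (-> Float Float) Float) function exported to untyped code
(define typed-apply
  (contract (toy-type->contract '(-> (-> Float Float) Float) #t)
            (lambda (g) (g 1.0)) 'typed 'untyped))
(typed-apply (lambda (x) (+ x 1.0)))
;; (typed-apply (lambda (x) 'oops)) raises a contract error: the untyped
;; callback's result is still checked, since only untyped code can be
;; blamed for it.
```

In the usage above, the outer result of `typed-apply` is passed through unchecked, while the callback supplied by untyped code keeps its result check; the side flip for arguments is what lets the same rule cover higher-order values.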

On Wed, Dec 11, 2013 at 7:57 PM, Robby Findler
 wrote:
> I see that TR's type->contract returns
>
>  (-> (flat-named-contract (quote Float) flonum?)
>      (flat-named-contract (quote Float) flonum?))
>
> for the type (Float -> Float), but it could return
>
>  (-> (flat-named-contract (quote Float) flonum?) any)
>
> which wouldn't do any result value checking (this being different from any/c
> as the range of the arrow contract).
>
> Robby
>
> ...

_
  Racket Developers list:
  http://lists.racket-lang.org/dev


Re: [racket-dev] [plt] Push #27909: master branch updated

2013-12-11 Thread Robby Findler
I see that TR's type->contract returns

 (-> (flat-named-contract (quote Float) flonum?)
     (flat-named-contract (quote Float) flonum?))

for the type (Float -> Float), but it could return

 (-> (flat-named-contract (quote Float) flonum?) any)

which wouldn't do any result value checking (this being different from
any/c as the range of the arrow contract).
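The difference between `any` and `any/c` in the range of `->` can be seen directly with `racket/contract`. A small sketch; `two-results` is a hypothetical stand-in that returns two values purely to expose the difference:

```racket
#lang racket/base
(require racket/contract)

;; Hypothetical stand-in for a typed function; it returns two values
;; only to make the two range contracts behave differently.
(define (two-results x) (values x x))

;; Range `any`: results are passed through unchecked, including their count.
(define f/any (contract (-> flonum? any) two-results 'server 'client))

;; Range `any/c`: any single value is fine, but the wrapper still checks
;; that exactly one value comes back, so calling this raises an error.
(define f/any/c (contract (-> flonum? any/c) two-results 'server 'client))

(call-with-values (lambda () (f/any 1.0)) list)   ; => '(1.0 1.0)
(with-handlers ([exn:fail:contract? (lambda (e) 'rejected)])
  (f/any/c 1.0))                                   ; => 'rejected
```

Because `any` performs no result check, the wrapper does no work after the call returns, which is the saving Robby is pointing at.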

Robby

On Wed, Dec 11, 2013 at 6:18 PM, Neil Toronto 
wrote:
> ...


Re: [racket-dev] TR performance (was: Re: [plt] Push #27909: master branch updated)

2013-12-11 Thread Sam Tobin-Hochstadt
On Wed, Dec 11, 2013 at 7:25 PM, John Clements
 wrote:
>
> Wow! I had no idea TR was that fast.

In fairness, much of this is that Racket is that fast -- Matthew's put
a lot of work into the JIT over the last few years.

> Related question: how hard is it to reason about the GC behavior of TR code? 
> These numbers suggest to me that it might be possible to write TR code that 
> could be pretty much guaranteed not to collect, and therefore potentially 
> appropriate for use in audio callback functions, where the #1 rule is: NO GC 
> PAUSES.

First, contracts allocate (less after Robby's changes, but still), so
typed/untyped boundaries are bad for GC.

Second, TR doesn't change GC or allocation behavior except if you're
using floating point or complex numbers (where it reduces allocation).
 So you can reason about it similarly to how you'd reason about
untyped Racket code.

Sam



[racket-dev] TR performance (was: Re: [plt] Push #27909: master branch updated)

2013-12-11 Thread John Clements

On Dec 11, 2013, at 4:18 PM, Neil Toronto wrote:

> On 12/11/2013 02:49 PM, Neil Toronto wrote:
>> On 12/11/2013 01:55 PM, Stephen Bloch wrote:
>>>> On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
>>>> 
>>>>> numeric primitives implemented in Typed Racket are faster than the
>>>>> same primitives implemented in C.
>>> 
>>> Whoa!  How did that happen?
>> 
>> Whoa! That's not what I meant! O_o
>> 
>> I said "we might be getting close" to that. I haven't tried porting a
>> numeric C primitive to TR yet, but I have a hunch that it'll still be
>> slower. I'll try one now and report what I find.
>> 
>> Neil ⊥
> 

...

> For comparison, here are the timings for running the benchmarks in TR with 
> #:no-optimize:
> 
> 
> Function    Flonum  Rational  Fixnum  Integer  Float-Complex
> ------------------------------------------------------------
> magnitude*      45        70*     37      102*           318
> magnitude       61        45      39       91*           394
> 
>  * = unexpectedly high
> 
> 
> Here's what I understand from comparing the numbers:
> 
> * Except for non-fixnum integers, calling `magnitude' in TR is just as fast 
> as in untyped Racket. I have no idea why it would be slower on big integers. 
> That's just weird.
> 
> * Calling `abs' in Racket is faster than calling `scheme_abs' in C, except on 
> rationals and big integers.
> 
> * Operating on flonums in Typed Racket, using generic numeric functions, is 
> faster than doing the same in C.
> 
> Overall, it looks like the TR code is within the same order of magnitude (pun 
> not intended) as the C code. I would love to try this benchmark with either 
> 1) a `magnitude*' with an `AnyValues' return type; or 2) a contract boundary 
> that doesn't check TR's return types for first-order functions.

Wow! I had no idea TR was that fast.

Related question: how hard is it to reason about the GC behavior of TR code? 
These numbers suggest to me that it might be possible to write TR code that 
could be pretty much guaranteed not to collect, and therefore potentially 
appropriate for use in audio callback functions, where the #1 rule is: NO GC 
PAUSES.

John




Re: [racket-dev] [plt] Push #27909: master branch updated

2013-12-11 Thread Neil Toronto

On 12/11/2013 02:49 PM, Neil Toronto wrote:
> On 12/11/2013 01:55 PM, Stephen Bloch wrote:
>>> On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
>>>
>>>> numeric primitives implemented in Typed Racket are faster than the
>>>> same primitives implemented in C.
>>
>> Whoa!  How did that happen?
>
> Whoa! That's not what I meant! O_o
>
> I said "we might be getting close" to that. I haven't tried porting a
> numeric C primitive to TR yet, but I have a hunch that it'll still be
> slower. I'll try one now and report what I find.
>
> Neil ⊥


I can't figure out why `flsinh' is faster to call from untyped Racket
than `sinh'. All my tests with a Typed Racket `magnitude' show calls
from untyped code are significantly slower, except in the one case where
it computes Euclidean distance. That case is only twice as slow.


I've attached the benchmark program. The `magnitude*' function is more 
or less a direct translation of `magnitude' from "number.c" into Typed 
Racket. Here's a summary of the results I get on my computer, in 
milliseconds, for 5 million calls from untyped Racket, by data type.



Function    Flonum  Rational  Fixnum  Integer  Float-Complex
------------------------------------------------------------
magnitude*     385       419     378      414            686
magnitude       59        44      40       40            390


The only one that's close in relative terms is Float-Complex. The others 
just call `abs'. The decompiled code doesn't show any inlining of 
`magnitude', so this comparison should be good.


I'll bet checking the return value contract (which is unnecessary) is
the main slowdown. It has to check the number of values.


For comparison, here are the timings for running the benchmarks in TR 
with #:no-optimize:



Function    Flonum  Rational  Fixnum  Integer  Float-Complex
------------------------------------------------------------
magnitude*      45        70*     37      102*           318
magnitude       61        45      39       91*           394

  * = unexpectedly high


Here's what I understand from comparing the numbers:

 * Except for non-fixnum integers, calling `magnitude' in TR is just as 
fast as in untyped Racket. I have no idea why it would be slower on big 
integers. That's just weird.


 * Calling `abs' in Racket is faster than calling `scheme_abs' in C, 
except on rationals and big integers.


 * Operating on flonums in Typed Racket, using generic numeric 
functions, is faster than doing the same in C.


Overall, it looks like the TR code is within the same order of magnitude 
(pun not intended) as the C code. I would love to try this benchmark 
with either 1) a `magnitude*' with an `AnyValues' return type; or 2) a 
contract boundary that doesn't check TR's return types for first-order 
functions.


(I managed to make a `magnitude*' with type Number -> AnyValues, but TR 
couldn't make a contract for it.)
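Option 2 can be approximated outside TR by putting the same untyped function behind two hand-written boundaries, one that checks the result and one whose range is `any`. This is only a sketch under assumptions: `f`, `run`, and the call count are illustrative, the contracts are written by hand rather than generated by TR, and no timings are claimed here since they depend on the machine.

```racket
#lang racket/base
;; Sketch: compare call overhead with and without a result check.
;; `f`, `run`, and `n` are illustrative assumptions, not TR output.
(require racket/contract racket/flonum)

;; Stand-in for a typed Float -> Float function
(define (f x) (flabs x))

;; Result checked (like TR's current contract) vs. result not checked at all
(define f/checked   (contract (-> flonum? flonum?) f 'typed 'untyped))
(define f/unchecked (contract (-> flonum? any)     f 'typed 'untyped))

;; Call g on x n times and return the last result
(define (run g x n)
  (for/fold ([acc 0.0]) ([_ (in-range n)])
    (g x)))

(define n 1000000)
(for ([label (in-list '(uncontracted checked unchecked))]
      [g     (in-list (list f f/checked f/unchecked))])
  (printf "~a: " label)
  (time (run g -2.0 n)))
```

Both wrapped versions still check the argument, so any gap between the `checked` and `unchecked` rows isolates the cost of the result check alone.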


Neil ⊥

#lang racket

;; `magnitude*': a more or less direct translation of `magnitude' from
;; "number.c" into Typed Racket
(module typed-defs typed/racket
  (require math/base)

  (provide magnitude*)

  (: magnitude* (Number -> Any))
  (define (magnitude* z)
    (cond [(real? z)  (abs z)]
          [else
           (define r (abs (real-part z)))
           (define i (abs (imag-part z)))
           (cond [(eq? r 0)  i]
                 [else
                  ;; Order the parts so that r <= i
                  (let-values ([(r i)  (if (i . < . r) (values i r) (values r i))])
                    (cond [(zero? r)  (exact->inexact i)]
                          [(= i +inf.0)  (if (eqv? r +nan.0) +nan.0 +inf.0)]
                          [else
                           ;; i * sqrt(1 + (r/i)^2) avoids overflow in r^2 + i^2
                           (define q (/ r i))
                           (* i (sqrt (+ 1 (* q q))))]))])]))
  )

;; To time the calls inside TR with #:no-optimize, swap which of the next
;; two lines is commented out
;(module test typed/racket #:no-optimize
(module test racket

  (require math/base
           typed/racket/base
           (submod ".." typed-defs))

  (define x (random))                                      ; Flonum
  ;; Rational; (random 1) would always return 0, so this bound is illustrative
  (define y (/ (random 1000000) (+ 1 (random 1000000))))
  (define i (random-integer (- (expt 2 20)) (expt 2 20)))   ; Fixnum
  (define n (let: loop : Integer ()                         ; non-fixnum Integer
              (define n (random-integer (- (expt 2 128)) (expt 2 128)))
              (if (fixnum? n) (loop) n)))
  (define z (make-rectangular (random) (random)))           ; Float-Complex

  ;; Time 5 million calls of (f x), five times over
  (define-syntax-rule (test-one-arg-fun f x)
    (begin
      (printf "(~a ~a)~n" 'f 'x)
      (for ([_  (in-range 5)])
        (time (for ([_  (in-range 5000000)])
                (f x))))
      (newline)))

  (test-one-arg-fun magnitude* x)
  (test-one-arg-fun magnitude x)
  (test-one-arg-fun magnitude* y)
  (test-one-arg-fun magnitude y)
  (test-one-arg-fun magnitude* i)
  (test-one-arg-fun magnitude i)
  (test-one-arg-fun magnitude* n)
  (test-one-arg-fun magnitude n)
  (test-one-arg-fun magnitude* z)
  (test-one-arg-fun magnitude z)
  )

(require 'test)
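For non-real z, `magnitude*' orders the absolute parts so that r <= i and then computes i * sqrt(1 + (r/i)^2) instead of sqrt(r^2 + i^2). Since (r/i)^2 <= 1, no intermediate value overflows when the result itself is representable. A minimal sketch of just that step; `naive-hypot` and `scaled-hypot` are hypothetical names, and the infinity/NaN cases that `magnitude*' handles separately are omitted:

```racket
#lang racket/base
;; Naive Euclidean norm: the squares overflow to +inf.0 for large inputs.
(define (naive-hypot a b) (sqrt (+ (* a a) (* b b))))

;; Scaled form (finite inputs only): with r <= i, |z| = i * sqrt(1 + (r/i)^2),
;; and q = r/i satisfies q^2 <= 1, so 1 + q^2 cannot overflow.
(define (scaled-hypot a b)
  (define r (min (abs a) (abs b)))   ; smaller component
  (define i (max (abs a) (abs b)))   ; larger component
  (if (zero? r)
      i
      (let ([q (/ r i)])
        (* i (sqrt (+ 1.0 (* q q)))))))

(scaled-hypot 3.0 4.0)       ; => 5.0
(naive-hypot 1e200 1e200)    ; => +inf.0
(scaled-hypot 1e200 1e200)   ; finite, about 1.414e200
```

The extra division is the price paid for avoiding overflow, which is why the Float-Complex column costs more than the other columns, which only call `abs'.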


Re: [racket-dev] [plt] Push #27909: master branch updated

2013-12-11 Thread Neil Toronto

On 12/11/2013 01:55 PM, Stephen Bloch wrote:
>> On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
>>
>>> numeric primitives implemented in Typed Racket are faster than the same
>>> primitives implemented in C.
>
> Whoa!  How did that happen?


Whoa! That's not what I meant! O_o

I said "we might be getting close" to that. I haven't tried porting a 
numeric C primitive to TR yet, but I have a hunch that it'll still be 
slower. I'll try one now and report what I find.


Neil ⊥



Re: [racket-dev] [plt] Push #27909: master branch updated

2013-12-11 Thread Stephen Bloch
> On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
> 
>> numeric primitives implemented in Typed Racket are faster than the same 
>> primitives implemented in C.

Whoa!  How did that happen?



Stephen Bloch
sbl...@adelphi.edu
GPG key at http://home.adelphi.edu/sbloch/sbloch.pubkey.asc





Re: [racket-dev] [plt] Push #27909: master branch updated

2013-12-11 Thread Matthias Felleisen

On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:

> numeric primitives implemented in Typed Racket are faster than the same 
> primitives implemented in C.


Hallelujah!


Re: [racket-dev] [plt] Push #27909: master branch updated

2013-12-11 Thread Neil Toronto

On 12/11/2013 11:07 AM, ro...@racket-lang.org wrote:

robby has updated `master' from 542e256206 to c321f6dd0c.
   http://git.racket-lang.org/plt/542e256206..c321f6dd0c

=[ One Commit ]=
Directory summary:
   37.6% pkgs/racket-pkgs/racket-test/tests/racket/contract/
    5.5% pkgs/
   46.3% racket/collects/racket/contract/private/
   10.0% racket/collects/racket/private/

~~

c321f6d Robby Findler  2013-12-04 22:35
:
| Change contract system so that projections are more first-order friendly


Awesome. I've attached some more benchmarks, for `flrational?', 
`flsinh', `fllog1p', `lg+', and `flgamma'. These functions are pretty 
representative, and have a range of complexity from trivial to 
complicated. (For example, `flrational?' is implemented using two flops, 
and `flgamma' usually does ~50 flops in the range I tested.)


Approximate average times in milliseconds for 1 million calls:

Function      TR  Untyped pre-push  Untyped post-push
-----------------------------------------------------
flrational?    5               322                 98
flsinh        55               343                121
fllog1p       47               351                117
lg+           61               384                154
flgamma      165               521                262

There's also less variance in the timings, probably because there are 
fewer minor GC pauses during the tests.
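Because every figure in the table is milliseconds per million calls, the difference between an untyped column and the TR column reads directly as nanoseconds of extra cost per call, the cost attributed later in this message to range/domain checking and flonum boxing. A small sketch that only does that subtraction on the numbers already in the table (nothing here is newly measured; `rows` and `extra-ns-per-call` are illustrative names):

```racket
#lang racket/base
;; Rows copied from the table: name, TR, untyped pre-push, untyped post-push
;; (milliseconds per 1,000,000 calls).
(define rows
  '((flrational?   5 322  98)
    (flsinh       55 343 121)
    (fllog1p      47 351 117)
    (lg+          61 384 154)
    (flgamma     165 521 262)))

;; 1 ms / 1,000,000 calls = 1 ns per call, so the millisecond difference
;; between an untyped column and the TR column is the extra ns per call.
(define (extra-ns-per-call tr-ms untyped-ms) (- untyped-ms tr-ms))

(for ([row (in-list rows)])
  (define-values (name tr pre post) (apply values row))
  (printf "~a: ~a ns/call extra before the push, ~a after\n"
          name (extra-ns-per-call tr pre) (extra-ns-per-call tr post)))
```

By this reading, the per-call boundary cost falls from roughly 288 to 356 ns before the push to roughly 66 to 97 ns after it.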


Not shown on the table: untyped `sinh' calls take 140ms in the same 
test, so it's now faster to use `flsinh' from `math/flonum' in untyped 
code, if operating on flonums. Cool. We might be getting close to where 
numeric primitives implemented in Typed Racket are faster than the same 
primitives implemented in C.


The `flrational?' test is still amazing in TR. The function's two flops 
get inlined (I checked the decompiled module), which I suppose allows 
more JIT-level optimizations.
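As an illustration of how little work a two-flop `flrational?' can be, here is one plausible definition in Typed Racket. It is a hypothetical re-implementation (`my-flrational?'), not necessarily the code in `math/flonum':

```racket
#lang typed/racket/base
(require racket/flonum)

;; Hypothetical two-comparison version: x is rational iff -inf.0 < x < +inf.0.
;; Every comparison involving +nan.0 is false, so NaN is rejected as well.
(: my-flrational? (Flonum -> Boolean))
(define (my-flrational? x)
  (and (fl< -inf.0 x) (fl< x +inf.0)))
```

A body this small is a natural candidate for inlining at a typed call site, which fits the observation that the two flops show up inlined in the decompiled TR module.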


The only things I can think of that account for the remaining extra time
over TR are range/domain checking and boxing flonum return values. I think I
remember hearing something from someone (maybe Eric?) at RacketCon about
inlining contract checks. Is that in the works?


Neil ⊥

#lang racket

(require math/flonum
         math/special-functions
         racket/unsafe/ops
         (only-in typed/racket/base :))

(define x (random))

(: bx Boolean)
(define bx #f)

;; Each result is stored (in `bx' or `vec') so the calls are not discarded
(define vec (make-flvector 1))

(define n 1000000)  ; 1 million calls per timing

(printf "flrational?~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (set! bx (flrational? x)))))
(newline)

(printf "flsinh~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (flsinh x)))))
(newline)

(printf "fllog1p~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (fllog1p x)))))
(newline)

(printf "lg+~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (lg+ x x)))))
(newline)

(printf "flgamma~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (flgamma x)))))
(newline)

flrational?
cpu time: 4 real time: 5 gc time: 0
cpu time: 4 real time: 6 gc time: 0
cpu time: 8 real time: 5 gc time: 0
cpu time: 4 real time: 6 gc time: 0
cpu time: 8 real time: 5 gc time: 0

flsinh
cpu time: 52 real time: 55 gc time: 4
cpu time: 56 real time: 54 gc time: 0
cpu time: 52 real time: 54 gc time: 0
cpu time: 56 real time: 54 gc time: 4
cpu time: 56 real time: 56 gc time: 0

fllog1p
cpu time: 48 real time: 46 gc time: 0
cpu time: 44 real time: 48 gc time: 0
cpu time: 48 real time: 47 gc time: 4
cpu time: 48 real time: 47 gc time: 0
cpu time: 48 real time: 47 gc time: 0

lg+
cpu time: 60 real time: 61 gc time: 0
cpu time: 60 real time: 61 gc time: 0
cpu time: 60 real time: 61 gc time: 4
cpu time: 64 real time: 61 gc time: 0
cpu time: 60 real time: 63 gc time: 0

flgamma
cpu time: 168 real time: 167 gc time: 4
cpu time: 164 real time: 165 gc time: 0
cpu time: 164 real time: 164 gc time: 0
cpu time: 164 real time: 164 gc time: 4
cpu time: 168 real time: 165 gc time: 0

flrational?
cpu time: 316 real time: 315 gc time: 0
cpu time: 316 real time: 314 gc time: 0
cpu time: 328 real time: 328 gc time: 4
cpu time: 324 real time: 326 gc time: 0
cpu time: 328 real time: 327 gc time: 0

flsinh
cpu time: 348 real time: 350 gc time: 12
cpu time: 336 real time: 336 gc time: 0
cpu time: 340 real time: 338 gc time: 0
cpu time: 348 real time: 349 gc time: 16
cpu time: 344 real time: 343 gc time: 4

fllog1p
cpu time: 348 real time: 347 gc time: 4
cpu time: 352 real time: 354 gc time: 4
cpu time: 348 real time: 349 gc time: 8
cpu time: 348 real time: 347 gc time: 4
cpu time: 360 real time: 359 gc time: 4

lg+
cpu time: 376 real time: 379 gc time: 0
cpu time: 384 real time: 384 gc time: 8
cpu time: 388 real time: 387 gc time: 4
cpu time: 380 real time: 381 gc time: 0
cpu time: