Re: Help speed up an inner loop?

2010-08-31 Thread Robert McIntyre
Ah, I see that I was mistaken about the timing. Sorry about that.

After a lot of fiddling around, I cam up with this faster form:

(defn countnl-lite
  [#^bytes buf]
  (areduce buf idx count (int 0)
   (if (= (clojure.lang.RT/aget buf idx) 10)
 (unchecked-add count 1)
 count)))

Key points are initializing count to a primitive integer and directly
calling clojure's aget to avoid an unnecessary integer cast.

On my system:

The unmodified countnl function takes ~ 180 msecs

Without AOT compilation countnl-lite takes around 66 msecs

With AOT compilation countnl-lite takes ~46 msecs

The java method takes ~19 msecs.

I've lost a factor of 2.25 somewhere and it makes me sad that I can't find it.
I would be very interested if anyone could improve countnl-lite.

--Robert McIntyre



On Mon, Aug 30, 2010 at 8:41 PM, Alan a...@malloys.org wrote:
 I think this misses the point. Of course java, c, and clojure will all
 have roughly the same wall-clock time for this program, since it is
 dominated by the I/O. You can even see that in the output from $ time
 java Iterate: less than 0.5s was spent in user space, the rest was
 spent in system code - that is, mostly doing I/O.

 The java version is a second faster as counted by the wall clock, and
 this is unlikely to be a coincidence: tsuraan's timing data suggests
 that the clojure program takes 80ms longer in each loop, and loops 10
 times. That comes out to 0.8 seconds, which is quite close to the
 differential you observed when timing from the command line.

 On Aug 30, 1:38 pm, Robert McIntyre r...@mit.edu wrote:
 I don't know what the heck is going here, but ignore the time the
 program is reporting and just
 pay attention to how long it actually takes wall-clock style and
 you'll see that your clojure and
 java programs already take the same time.

 Here are my findings:

 I saved Iterate.java into my rlm package and ran:
 time java -server rlm.Iterate

 results:
 time java -server rlm.Iterate
 Wanted 16777216 got 16777216 bytes
 counted 65341 nls in 27 msec
 Wanted 16777216 got 16777216 bytes
 counted 65310 nls in 27 msec
 Wanted 16777216 got 16777216 bytes
 counted 66026 nls in 21 msec
 Wanted 16777216 got 16777216 bytes
 counted 65473 nls in 19 msec
 Wanted 16777216 got 16777216 bytes
 counted 65679 nls in 19 msec
 Wanted 16777216 got 16777216 bytes
 counted 65739 nls in 19 msec
 Wanted 16777216 got 16777216 bytes
 counted 65310 nls in 21 msec
 Wanted 16777216 got 16777216 bytes
 counted 65810 nls in 18 msec
 Wanted 16777216 got 16777216 bytes
 counted 65531 nls in 21 msec
 Wanted 16777216 got 16777216 bytes
 counted 65418 nls in 21 msec

 real    0m27.469s
 user    0m0.472s
 sys     0m26.638s

 I wrapped the last bunch of commands in your clojure script into a
 (run) function:
 (defn run []
   (let [ifs (FileInputStream. /dev/urandom)
         buf (make-array Byte/TYPE *numbytes*)]
     (dotimes [_ 10]
       (let [sz (.read ifs buf)]
         (println Wanted *numbytes* got sz bytes)
         (let [count (time (countnl buf))]
           (println Got count nls))

 and ran
 (time (run)) at the repl:

 (time (run))
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.081975 msecs
 Got 65894 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.001814 msecs
 Got 65949 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.061934 msecs
 Got 65603 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.031131 msecs
 Got 65563 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.122567 msecs
 Got 65696 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 182.968066 msecs
 Got 65546 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.058508 msecs
 Got 65468 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 182.932395 msecs
 Got 65872 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.074646 msecs
 Got 65498 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 187.733636 msecs
 Got 65434 nls
 Elapsed time: 28510.331507 msecs
 nil

 Total running time for both programs is around 28 seconds.
 The java program seems to be incorrectly reporting it's time.

 --Robert McIntyre

 On Mon, Aug 30, 2010 at 4:03 PM, tsuraan tsur...@gmail.com wrote:
  Just to try to see if clojure is a practical language for doing
  byte-level work (parsing files, network streams, etc), I wrote a
  trivial function to iterate through a buffer of bytes and count all
  the newlines that it sees.  For my testing, I've written a C version,
  a Java version, and a Clojure version.  I'm running each routine 10
  times over a 16MB buffer read from /dev/urandom (the buffer is
  refreshed between each call to the newline counting function).  With
  gcc -O0, I get about 80ms per 16MB buffer.  With gcc -O3, I get ~14ms
  per buffer.  With javac (and java -server) I get 20ms per 16MB buffer.
   With clojure, I get 105ms per buffer (after the jvm warms up).  I'm
  guessing that the huge boost that java and gcc -O3 get is from
  

Re: Help speed up an inner loop?

2010-08-31 Thread John Fingerhut
Consider trying to use == in place of where you have =, which can be
faster when comparing numbers for equality.  Source for this and a few other
performance tips:

http://gnuvince.wordpress.com/2009/05/11/clojure-performance-tips/

Andy

On Mon, Aug 30, 2010 at 11:46 PM, Robert McIntyre r...@mit.edu wrote:

 Ah, I see that I was mistaken about the timing. Sorry about that.

 After a lot of fiddling around, I cam up with this faster form:

 (defn countnl-lite
  [#^bytes buf]
  (areduce buf idx count (int 0)
   (if (= (clojure.lang.RT/aget buf idx) 10)
 (unchecked-add count 1)
 count)))

 Key points are initializing count to a primitive integer and directly
 calling clojure's aget to avoid an unnecessary integer cast.

 On my system:

 The unmodified countnl function takes ~ 180 msecs

 Without AOT compilation countnl-lite takes around 66 msecs

 With AOT compilation countnl-lite takes ~46 msecs

 The java method takes ~19 msecs.

 I've lost a factor of 2.25 somewhere and it makes me sad that I can't find
 it.
 I would be very interested if anyone could improve countnl-lite.

 --Robert McIntyre



 On Mon, Aug 30, 2010 at 8:41 PM, Alan a...@malloys.org wrote:
  I think this misses the point. Of course java, c, and clojure will all
  have roughly the same wall-clock time for this program, since it is
  dominated by the I/O. You can even see that in the output from $ time
  java Iterate: less than 0.5s was spent in user space, the rest was
  spent in system code - that is, mostly doing I/O.
 
  The java version is a second faster as counted by the wall clock, and
  this is unlikely to be a coincidence: tsuraan's timing data suggests
  that the clojure program takes 80ms longer in each loop, and loops 10
  times. That comes out to 0.8 seconds, which is quite close to the
  differential you observed when timing from the command line.
 
  On Aug 30, 1:38 pm, Robert McIntyre r...@mit.edu wrote:
  I don't know what the heck is going here, but ignore the time the
  program is reporting and just
  pay attention to how long it actually takes wall-clock style and
  you'll see that your clojure and
  java programs already take the same time.
 
  Here are my findings:
 
  I saved Iterate.java into my rlm package and ran:
  time java -server rlm.Iterate
 
  results:
  time java -server rlm.Iterate
  Wanted 16777216 got 16777216 bytes
  counted 65341 nls in 27 msec
  Wanted 16777216 got 16777216 bytes
  counted 65310 nls in 27 msec
  Wanted 16777216 got 16777216 bytes
  counted 66026 nls in 21 msec
  Wanted 16777216 got 16777216 bytes
  counted 65473 nls in 19 msec
  Wanted 16777216 got 16777216 bytes
  counted 65679 nls in 19 msec
  Wanted 16777216 got 16777216 bytes
  counted 65739 nls in 19 msec
  Wanted 16777216 got 16777216 bytes
  counted 65310 nls in 21 msec
  Wanted 16777216 got 16777216 bytes
  counted 65810 nls in 18 msec
  Wanted 16777216 got 16777216 bytes
  counted 65531 nls in 21 msec
  Wanted 16777216 got 16777216 bytes
  counted 65418 nls in 21 msec
 
  real0m27.469s
  user0m0.472s
  sys 0m26.638s
 
  I wrapped the last bunch of commands in your clojure script into a
  (run) function:
  (defn run []
(let [ifs (FileInputStream. /dev/urandom)
  buf (make-array Byte/TYPE *numbytes*)]
  (dotimes [_ 10]
(let [sz (.read ifs buf)]
  (println Wanted *numbytes* got sz bytes)
  (let [count (time (countnl buf))]
(println Got count nls))
 
  and ran
  (time (run)) at the repl:
 
  (time (run))
  Wanted 16777216 got 16777216 bytes
  Elapsed time: 183.081975 msecs
  Got 65894 nls
  Wanted 16777216 got 16777216 bytes
  Elapsed time: 183.001814 msecs
  Got 65949 nls
  Wanted 16777216 got 16777216 bytes
  Elapsed time: 183.061934 msecs
  Got 65603 nls
  Wanted 16777216 got 16777216 bytes
  Elapsed time: 183.031131 msecs
  Got 65563 nls
  Wanted 16777216 got 16777216 bytes
  Elapsed time: 183.122567 msecs
  Got 65696 nls
  Wanted 16777216 got 16777216 bytes
  Elapsed time: 182.968066 msecs
  Got 65546 nls
  Wanted 16777216 got 16777216 bytes
  Elapsed time: 183.058508 msecs
  Got 65468 nls
  Wanted 16777216 got 16777216 bytes
  Elapsed time: 182.932395 msecs
  Got 65872 nls
  Wanted 16777216 got 16777216 bytes
  Elapsed time: 183.074646 msecs
  Got 65498 nls
  Wanted 16777216 got 16777216 bytes
  Elapsed time: 187.733636 msecs
  Got 65434 nls
  Elapsed time: 28510.331507 msecs
  nil
 
  Total running time for both programs is around 28 seconds.
  The java program seems to be incorrectly reporting it's time.
 
  --Robert McIntyre
 
  On Mon, Aug 30, 2010 at 4:03 PM, tsuraan tsur...@gmail.com wrote:
   Just to try to see if clojure is a practical language for doing
   byte-level work (parsing files, network streams, etc), I wrote a
   trivial function to iterate through a buffer of bytes and count all
   the newlines that it sees.  For my testing, I've written a C version,
   a Java version, and a Clojure version.  I'm running 

Re: Help speed up an inner loop?

2010-08-31 Thread Meikel Brandmeyer
Hi,

On 31 Aug., 08:46, Robert McIntyre r...@mit.edu wrote:

 Without AOT compilation countnl-lite takes around 66 msecs
 With AOT compilation countnl-lite takes ~46 msecs

Did you measure start-up time in your runs? AOT compilation should
have no impact on the runtime speed.

Sincerely
Meikel

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Help speed up an inner loop?

2010-08-31 Thread Nicolas Oury
I am not convince that (make-array Byte/TYPE *numbytes*) creates an
array of primitives.
And I think byte [] is an array of primitives.

That would make a difference.

I don't know if clojure has a byte-array. It seems that there is no
byte-array as there is int-array.

Could you try your code with int-array, or could you just write two
static methods in a Java class to read and write an int from a byte
array?

Best,

Nicolas.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Help speed up an inner loop?

2010-08-31 Thread Nicolas Oury
On Tue, Aug 31, 2010 at 1:53 PM, Nicolas Oury nicolas.o...@gmail.com wrote:
 I am not convince that (make-array Byte/TYPE *numbytes*) creates an
 array of primitives.
Actually it does, sorry for the noise.

Should check before sending emails.

Best,

Nicolas.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Help speed up an inner loop?

2010-08-31 Thread tsuraan
 (defn countnl-lite
  [#^bytes buf]
  (areduce buf idx count (int 0)
           (if (= (clojure.lang.RT/aget buf idx) 10)
             (unchecked-add count 1)
             count)))

 Key points are initializing count to a primitive integer and directly
 calling clojure's aget to avoid an unnecessary integer cast.

Are you using clojure 1.2?  If I try to set count to be (int 0) rather
than 0, I get this error:

Exception in thread main java.lang.RuntimeException:
java.lang.IllegalArgumentException: recur arg for primitive local:
count must be matching primitive

Even if I replace inc with the unchecked-add, or cast the result
of the inc to be int (replace (inc count) with (int (inc count)) )
it gives me that error.  Sort of strange...

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Help speed up an inner loop?

2010-08-31 Thread Nicolas Oury
Replace also  (unchecked-add count 1) with  (unchecked-add count (int 1))

(this should get easier in 1.3)

On Tue, Aug 31, 2010 at 4:20 PM, tsuraan tsur...@gmail.com wrote:
 (defn countnl-lite
  [#^bytes buf]
  (areduce buf idx count (int 0)
           (if (= (clojure.lang.RT/aget buf idx) 10)
             (unchecked-add count 1)
             count)))

 Key points are initializing count to a primitive integer and directly
 calling clojure's aget to avoid an unnecessary integer cast.

 Are you using clojure 1.2?  If I try to set count to be (int 0) rather
 than 0, I get this error:

 Exception in thread main java.lang.RuntimeException:
 java.lang.IllegalArgumentException: recur arg for primitive local:
 count must be matching primitive

 Even if I replace inc with the unchecked-add, or cast the result
 of the inc to be int (replace (inc count) with (int (inc count)) )
 it gives me that error.  Sort of strange...

 --
 You received this message because you are subscribed to the Google
 Groups Clojure group.
 To post to this group, send email to clojure@googlegroups.com
 Note that posts from new members are moderated - please be patient with your 
 first post.
 To unsubscribe from this group, send email to
 clojure+unsubscr...@googlegroups.com
 For more options, visit this group at
 http://groups.google.com/group/clojure?hl=en

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Help speed up an inner loop?

2010-08-31 Thread tsuraan
 Replace also  (unchecked-add count 1) with  (unchecked-add count (int 1))

 (this should get easier in 1.3)

That didn't change anything for my tests, but this code:

(defn countnl
  [#^bytes buf]
  (areduce buf idx count (int 0)
   (if (= (aget buf idx) 10)
 (unchecked-add count 1)
 count)))

does 34ms runs (the .java code does 20ms runs on the same machine).
The biggest things that make a difference are replacing (inc count)
with (unchecked-add count 1), initializing count to (int 0) rather
than 0, and getting rid of the (let [nl (byte 0)]...).  I was a bit
disappointed that the let statement was so expensive, since I really
prefer seeing a variable to a magic number.  Has somebody written a
macro that can be used to rewrite code replacing variable references
with their constant values?  It would be of limited use (constants
only), but it would make the code prettier :)  For posterity, here are
some variations on countnl and their timings on my machine:

all runs are invoked with this command: java -server -cp clojure.jar
clojure.main iterate.clj; clojure.jar is from the clojure-1.2.zip
distribution


(defn countnl
  [#^bytes buf]
  (let [nl (byte 10)]
(areduce buf idx count 0
 (if (= (aget buf idx) nl)
   (inc count)
   count

Wanted 16777216 got 16777216 bytes
Elapsed time: 184.172834 msecs
Got 65875 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 193.546018 msecs
Got 64995 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 142.546453 msecs
Got 65677 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 135.843933 msecs
Got 66117 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 134.882156 msecs
Got 65449 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 134.864881 msecs
Got 65393 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 134.854118 msecs
Got 65065 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 134.82848 msecs
Got 65346 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 134.803702 msecs
Got 65783 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 134.832489 msecs
Got 65566 nls

(defn countnl
  [#^bytes buf]
  (let [nl (byte 10)]
(areduce buf idx count (int 0)
 (if (= (aget buf idx) nl)
   (inc count)
   count

Wanted 16777216 got 16777216 bytes
Elapsed time: 238.409697 msecs
Got 65698 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 208.872818 msecs
Got 65381 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 88.786986 msecs
Got 65358 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 88.76816 msecs
Got 65686 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 88.899431 msecs
Got 65683 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 88.783275 msecs
Got 65491 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 88.735416 msecs
Got 65749 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 88.762987 msecs
Got 65837 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 88.749591 msecs
Got 65759 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 88.72971 msecs
Got 65366 nls

(defn countnl
  [#^bytes buf]
  (areduce buf idx count (int 0)
   (if (= (aget buf idx) 10)
 (inc count)
 count)))

Wanted 16777216 got 16777216 bytes
Elapsed time: 185.465613 msecs
Got 65344 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 186.856418 msecs
Got 65529 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 73.778183 msecs
Got 65662 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 73.675509 msecs
Got 65034 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 73.674777 msecs
Got 65459 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 73.630553 msecs
Got 65172 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 73.668685 msecs
Got 64939 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 73.74801 msecs
Got 65462 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 73.699187 msecs
Got 65602 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 73.611555 msecs
Got 64937 nls

(defn countnl
  [#^bytes buf]
  (areduce buf idx count (int 0)
   (if (= (aget buf idx) 10)
 (unchecked-add count 1)
 count)))

Wanted 16777216 got 16777216 bytes
Elapsed time: 82.865761 msecs
Got 65266 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 34.228646 msecs
Got 65522 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 33.989741 msecs
Got 65537 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 33.947045 msecs
Got 65688 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 33.941481 msecs
Got 65395 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 33.952149 msecs
Got 65822 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 33.972258 msecs
Got 65495 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 33.903989 msecs
Got 65244 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 33.927276 msecs
Got 65331 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 33.917789 msecs
Got 65470 nls

(defn 

Re: Help speed up an inner loop?

2010-08-31 Thread Nicolas Oury
This one is quite good for me.
(defn countnl
 [#^bytes buf]
 (let [nl (int 10)]
   (areduce buf idx count (int 0)
(if (== (int (aget buf idx)) nl)
  (unchecked-inc count)
  count


It appears that == is not resolved for bytes. So converting to int works fine.

In this situation, inlining (int 10) does not buy much.

This one, though, is even faster and should be not far from the java
one, if you want to try it.

(defn countnl
 [#^bytes buf]
 (let [nl (int 10)
n (int (alength buf))]
   (loop [idx (int n) count (int 0)]
 (if (zero? idx)
 count
 (let [idx (unchecked-dec idx)]
   (if (== (int (aget buf idx)) nl)
 (recur  idx (unchecked-inc count))
 (recur  idx count)))

It would be interesting to know why it is nearly twice as fast as the
areduce version on my computer.

Best,

Nicolas.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Help speed up an inner loop?

2010-08-31 Thread tsuraan
 This one is quite good for me.
 (defn countnl
  [#^bytes buf]
  (let [nl (int 10)]
   (areduce buf idx count (int 0)
            (if (== (int (aget buf idx)) nl)
              (unchecked-inc count)
              count


 It appears that == is not resolved for bytes. So converting to int works fine.

aha, converting to int works for me as well.  With bytes it triggers
the reflection warning and takes minutes to run.  with the int
conversion, it runs in an average of 28ms (just a little over java).
Here's something really strange: I de-inlined the newline int, so my
function looks like this:

(defn countnl
  [#^bytes buf]
  (let [nl (int 10)]
(areduce buf idx count (int 0)
 (if (== (int (aget buf idx)) nl)
   (unchecked-add count (int 1))
   count

On my machine, this is running in 19.6ms (after the warmup it's really
constant).  That's the exact speed as the java code, which is really
cool.  it actually works the same way if you just replace the inlined
10 with (int 10), and it makes it just as fast as the explicit
looping code below.  Somehow on my earlier tests the let binding was
really slowing things down, but in this latest function it seems to
have no effect.  weird...

 In this situation, inlining (int 10) does not buy much.

interesting; for me replacing the 10 with (int 10) brings my times
from 28.7ms to 19.6ms.

 This one, though, is even faster and should be not far from the java
 one, if you want to try it.

 (defn countnl
  [#^bytes buf]
  (let [nl (int 10)
        n (int (alength buf))]
   (loop [idx (int n) count (int 0)]
         (if (zero? idx)
             count
             (let [idx (unchecked-dec idx)]
               (if (== (int (aget buf idx)) nl)
                 (recur  idx (unchecked-inc count))
                 (recur  idx count)))

 It would be interesting to know why it is nearly twice as fast as the
 areduce version on my computer.

That runs in 21.3ms for me, which actually makes it a tiny bit slower
than the code above (and slightly slower than the java code).

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Help speed up an inner loop?

2010-08-31 Thread Nicolas Oury
On Tue, Aug 31, 2010 at 8:43 PM, tsuraan tsur...@gmail.com wrote:

 In this situation, inlining (int 10) does not buy much.

 interesting; for me replacing the 10 with (int 10) brings my times
 from 28.7ms to 19.6ms.

I meant putting (int 10) instead of nl and a let.

Anyway, it seems that we can get to java speed, but with some hard work.

I have read somewhere than 1.3 will help a lot to achieve this
performance with less work.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Help speed up an inner loop?

2010-08-30 Thread tsuraan
Just to try to see if clojure is a practical language for doing
byte-level work (parsing files, network streams, etc), I wrote a
trivial function to iterate through a buffer of bytes and count all
the newlines that it sees.  For my testing, I've written a C version,
a Java version, and a Clojure version.  I'm running each routine 10
times over a 16MB buffer read from /dev/urandom (the buffer is
refreshed between each call to the newline counting function).  With
gcc -O0, I get about 80ms per 16MB buffer.  With gcc -O3, I get ~14ms
per buffer.  With javac (and java -server) I get 20ms per 16MB buffer.
 With clojure, I get 105ms per buffer (after the jvm warms up).  I'm
guessing that the huge boost that java and gcc -O3 get is from
converting per-byte operations to per-int ops; at least that ~4x boost
looks like it would come from something like that.  Is that an
optimization that is unavailable to clojure?  The java_interop doc
makes it sound like java and clojure get the exact same bytecode when
using areduce correctly, so maybe there's something I could be doing
better.  Here are my small programs; if somebody could suggest
improvements, I'd appreciate them.

iterate.clj:

(set! *warn-on-reflection* true)
(import java.io.FileInputStream)

(def *numbytes* (* 16 1024 1024))

(defn countnl
  [#^bytes buf]
  (let [nl (byte 10)]
(areduce buf idx count 0
 (if (= (aget buf idx) nl)
   (inc count)
   count

(let [ifs (FileInputStream. /dev/urandom)
  buf (make-array Byte/TYPE *numbytes*)]
  (dotimes [_ 10]
(let [sz (.read ifs buf)]
  (println Wanted *numbytes* got sz bytes)
  (let [count (time (countnl buf))]
(println Got count nls)


Iterate.java:

import java.io.FileInputStream;

class Iterate
{
  static final int NUMBYTES = 16*1024*1024;

  static int countnl(byte[] buf)
  {
int count = 0;
for(int i = 0; i  buf.length; i++) {
  if(buf[i] == '\n') {
count++;
  }
}
return count;
  }

  public static final void main(String[] args)
throws Throwable
  {
FileInputStream input = new FileInputStream(/dev/urandom);
byte[] buf = new byte[NUMBYTES];
int sz;
long start, end;

for(int i = 0; i  10; i++) {
  sz = input.read(buf);
  System.out.println(Wanted  + NUMBYTES +  got  + sz +  bytes);
  start = System.currentTimeMillis();
  int count = countnl(buf);
  end = System.currentTimeMillis();
  System.out.println(counted  + count +  nls in  +
  (end-start) +  msec);
}

input.close();
  }
}

iterate.c:

#includesys/types.h
#includesys/stat.h
#includesys/time.h
#includestdlib.h
#includeunistd.h
#includestdio.h
#includefcntl.h

int countnl(char *buf, int sz)
{
  int i;
  int count = 0;
  for(i = 0; i  sz; i++) {
if(buf[i] == '\n') {
  count++;
}
  }
  return count;
}

int main()
{
  int fd = open(/dev/urandom, O_RDONLY);
  const int NUMBYTES = 16*1024*1024;
  char *buf = (char*)malloc(NUMBYTES);

  int sz;
  struct timeval start, end;

  int i;
  for(i = 0; i  10; i++) {
sz = read(fd, buf, NUMBYTES);
printf(Wanted %d bytes, got %d bytes\n, NUMBYTES, sz);
gettimeofday(start, 0);
int count = countnl(buf, sz);
gettimeofday(end, 0);
printf(counted %d nls in %f msec\n, count,
(float)(end.tv_sec-start.tv_sec)*1e3 + (end.tv_usec-start.tv_usec)/1e3);
  }

  free(buf);
  close(fd);
  return 0;
}

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Help speed up an inner loop?

2010-08-30 Thread Robert McIntyre
I don't know what the heck is going here, but ignore the time the
program is reporting and just
pay attention to how long it actually takes wall-clock style and
you'll see that your clojure and
java programs already take the same time.

Here are my findings:

I saved Iterate.java into my rlm package and ran:
time java -server rlm.Iterate

results:
time java -server rlm.Iterate
Wanted 16777216 got 16777216 bytes
counted 65341 nls in 27 msec
Wanted 16777216 got 16777216 bytes
counted 65310 nls in 27 msec
Wanted 16777216 got 16777216 bytes
counted 66026 nls in 21 msec
Wanted 16777216 got 16777216 bytes
counted 65473 nls in 19 msec
Wanted 16777216 got 16777216 bytes
counted 65679 nls in 19 msec
Wanted 16777216 got 16777216 bytes
counted 65739 nls in 19 msec
Wanted 16777216 got 16777216 bytes
counted 65310 nls in 21 msec
Wanted 16777216 got 16777216 bytes
counted 65810 nls in 18 msec
Wanted 16777216 got 16777216 bytes
counted 65531 nls in 21 msec
Wanted 16777216 got 16777216 bytes
counted 65418 nls in 21 msec

real0m27.469s
user0m0.472s
sys 0m26.638s


I wrapped the last bunch of commands in your clojure script into a
(run) function:
(defn run []
  (let [ifs (FileInputStream. /dev/urandom)
buf (make-array Byte/TYPE *numbytes*)]
(dotimes [_ 10]
  (let [sz (.read ifs buf)]
(println Wanted *numbytes* got sz bytes)
(let [count (time (countnl buf))]
  (println Got count nls))

and ran
(time (run)) at the repl:

(time (run))
Wanted 16777216 got 16777216 bytes
Elapsed time: 183.081975 msecs
Got 65894 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 183.001814 msecs
Got 65949 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 183.061934 msecs
Got 65603 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 183.031131 msecs
Got 65563 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 183.122567 msecs
Got 65696 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 182.968066 msecs
Got 65546 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 183.058508 msecs
Got 65468 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 182.932395 msecs
Got 65872 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 183.074646 msecs
Got 65498 nls
Wanted 16777216 got 16777216 bytes
Elapsed time: 187.733636 msecs
Got 65434 nls
Elapsed time: 28510.331507 msecs
nil

Total running time for both programs is around 28 seconds.
The java program seems to be incorrectly reporting it's time.

--Robert McIntyre









On Mon, Aug 30, 2010 at 4:03 PM, tsuraan tsur...@gmail.com wrote:
 Just to try to see if clojure is a practical language for doing
 byte-level work (parsing files, network streams, etc), I wrote a
 trivial function to iterate through a buffer of bytes and count all
 the newlines that it sees.  For my testing, I've written a C version,
 a Java version, and a Clojure version.  I'm running each routine 10
 times over a 16MB buffer read from /dev/urandom (the buffer is
 refreshed between each call to the newline counting function).  With
 gcc -O0, I get about 80ms per 16MB buffer.  With gcc -O3, I get ~14ms
 per buffer.  With javac (and java -server) I get 20ms per 16MB buffer.
  With clojure, I get 105ms per buffer (after the jvm warms up).  I'm
 guessing that the huge boost that java and gcc -O3 get is from
 converting per-byte operations to per-int ops; at least that ~4x boost
 looks like it would come from something like that.  Is that an
 optimization that is unavailable to clojure?  The java_interop doc
 makes it sound like java and clojure get the exact same bytecode when
 using areduce correctly, so maybe there's something I could be doing
 better.  Here are my small programs; if somebody could suggest
 improvements, I'd appreciate them.

 iterate.clj:

 (set! *warn-on-reflection* true)
 (import java.io.FileInputStream)

 (def *numbytes* (* 16 1024 1024))

 (defn countnl
  [#^bytes buf]
  (let [nl (byte 10)]
    (areduce buf idx count 0
             (if (= (aget buf idx) nl)
               (inc count)
               count

 (let [ifs (FileInputStream. /dev/urandom)
      buf (make-array Byte/TYPE *numbytes*)]
  (dotimes [_ 10]
    (let [sz (.read ifs buf)]
      (println Wanted *numbytes* got sz bytes)
      (let [count (time (countnl buf))]
        (println Got count nls)


 Iterate.java:

 import java.io.FileInputStream;

 class Iterate
 {
  static final int NUMBYTES = 16*1024*1024;

  static int countnl(byte[] buf)
  {
    int count = 0;
    for(int i = 0; i  buf.length; i++) {
      if(buf[i] == '\n') {
        count++;
      }
    }
    return count;
  }

  public static final void main(String[] args)
    throws Throwable
  {
    FileInputStream input = new FileInputStream(/dev/urandom);
    byte[] buf = new byte[NUMBYTES];
    int sz;
    long start, end;

    for(int i = 0; i  10; i++) {
      sz = input.read(buf);
      System.out.println(Wanted  + NUMBYTES +  got  + sz +  bytes);
      start = System.currentTimeMillis();
      int count = countnl(buf);

Re: Help speed up an inner loop?

2010-08-30 Thread Alan
I think this misses the point. Of course java, c, and clojure will all
have roughly the same wall-clock time for this program, since it is
dominated by the I/O. You can even see that in the output from $ time
java Iterate: less than 0.5s was spent in user space, the rest was
spent in system code - that is, mostly doing I/O.

The java version is a second faster as counted by the wall clock, and
this is unlikely to be a coincidence: tsuraan's timing data suggests
that the clojure program takes 80ms longer in each loop, and loops 10
times. That comes out to 0.8 seconds, which is quite close to the
differential you observed when timing from the command line.

On Aug 30, 1:38 pm, Robert McIntyre r...@mit.edu wrote:
 I don't know what the heck is going here, but ignore the time the
 program is reporting and just
 pay attention to how long it actually takes wall-clock style and
 you'll see that your clojure and
 java programs already take the same time.

 Here are my findings:

 I saved Iterate.java into my rlm package and ran:
 time java -server rlm.Iterate

 results:
 time java -server rlm.Iterate
 Wanted 16777216 got 16777216 bytes
 counted 65341 nls in 27 msec
 Wanted 16777216 got 16777216 bytes
 counted 65310 nls in 27 msec
 Wanted 16777216 got 16777216 bytes
 counted 66026 nls in 21 msec
 Wanted 16777216 got 16777216 bytes
 counted 65473 nls in 19 msec
 Wanted 16777216 got 16777216 bytes
 counted 65679 nls in 19 msec
 Wanted 16777216 got 16777216 bytes
 counted 65739 nls in 19 msec
 Wanted 16777216 got 16777216 bytes
 counted 65310 nls in 21 msec
 Wanted 16777216 got 16777216 bytes
 counted 65810 nls in 18 msec
 Wanted 16777216 got 16777216 bytes
 counted 65531 nls in 21 msec
 Wanted 16777216 got 16777216 bytes
 counted 65418 nls in 21 msec

 real    0m27.469s
 user    0m0.472s
 sys     0m26.638s

 I wrapped the last bunch of commands in your clojure script into a
 (run) function:
 (defn run []
   (let [ifs (FileInputStream. /dev/urandom)
         buf (make-array Byte/TYPE *numbytes*)]
     (dotimes [_ 10]
       (let [sz (.read ifs buf)]
         (println Wanted *numbytes* got sz bytes)
         (let [count (time (countnl buf))]
           (println Got count nls))

 and ran
 (time (run)) at the repl:

 (time (run))
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.081975 msecs
 Got 65894 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.001814 msecs
 Got 65949 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.061934 msecs
 Got 65603 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.031131 msecs
 Got 65563 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.122567 msecs
 Got 65696 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 182.968066 msecs
 Got 65546 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.058508 msecs
 Got 65468 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 182.932395 msecs
 Got 65872 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 183.074646 msecs
 Got 65498 nls
 Wanted 16777216 got 16777216 bytes
 Elapsed time: 187.733636 msecs
 Got 65434 nls
 Elapsed time: 28510.331507 msecs
 nil

 Total running time for both programs is around 28 seconds.
 The java program seems to be incorrectly reporting it's time.

 --Robert McIntyre

 On Mon, Aug 30, 2010 at 4:03 PM, tsuraan tsur...@gmail.com wrote:
  Just to try to see if clojure is a practical language for doing
  byte-level work (parsing files, network streams, etc), I wrote a
  trivial function to iterate through a buffer of bytes and count all
  the newlines that it sees.  For my testing, I've written a C version,
  a Java version, and a Clojure version.  I'm running each routine 10
  times over a 16MB buffer read from /dev/urandom (the buffer is
  refreshed between each call to the newline counting function).  With
  gcc -O0, I get about 80ms per 16MB buffer.  With gcc -O3, I get ~14ms
  per buffer.  With javac (and java -server) I get 20ms per 16MB buffer.
   With clojure, I get 105ms per buffer (after the jvm warms up).  I'm
  guessing that the huge boost that java and gcc -O3 get is from
  converting per-byte operations to per-int ops; at least that ~4x boost
  looks like it would come from something like that.  Is that an
  optimization that is unavailable to clojure?  The java_interop doc
  makes it sound like java and clojure get the exact same bytecode when
  using areduce correctly, so maybe there's something I could be doing
  better.  Here are my small programs; if somebody could suggest
  improvements, I'd appreciate them.

  iterate.clj:

  (set! *warn-on-reflection* true)
  (import java.io.FileInputStream)

  (def *numbytes* (* 16 1024 1024))

  (defn countnl
   [#^bytes buf]
   (let [nl (byte 10)]
     (areduce buf idx count 0
              (if (= (aget buf idx) nl)
                (inc count)
                count

  (let [ifs (FileInputStream. /dev/urandom)
       buf (make-array Byte/TYPE *numbytes*)]
   (dotimes [_ 10]
     (let [sz (.read ifs