timespacex 'c1 T' 
0.0547916 6.81632e7 
   timespacex 'c2 T' 
0.0286093 2.62189e7

c2 also uses roughly a third of the space of c1.

What is amazing is that seemingly any change at all to your 4-liner (c2)
makes its timings equal to those of the other approach (c1).

That includes changing the b and c lines to

b =. y ,: 2 + 6 * y
c =. a } b

or putting the shift verb on the same line as c.

This version is even slower:
c6=:  3 : 0 
 a=. 2 | y 
 c=. a } (,: 2 + 6&*) y 
 (_1 & (33 b.)) c 
) 
   timespacex 'c6 T' 
0.0653142 4.29966e7 

Even this is no help:

c7=:  (_1 & (33 b.))@:(3 : 0) 
 a=. 2 | y 
 b =. 2 + 6 * y 
 a } y ,: b 
) 

c3 =: _1 & (33 b.)@:(3 : '(2 | y) } y ,: (2 + 6 * y)')
ia =: 4 : 'x } y'
c4 =: _1 & (33 b.)@:(2&| ia ] ,: 2 + 6&*)
c5 =: (_1 & (33 b.)) @:(3 : '(2 | y) } y ,: 2 + 6 * y')

I can only see two explanations for this:

1. There is special code for the boolist } y ,: alty pattern.
2. An automagical cache or register alignment that can happen with ultra-simple J
expressions, where temp variables somehow get cached while the stack doesn't.

Explanation 2 seems much more likely, because special code would likely apply to
(,: u) as well, and the huge drop in memory use suggests that temp variables are
sometimes optimized away if simple enough.  It might even be an unintentional gcc
optimization.

Though everything gets much more equal with smaller arguments applied at rank:


   timespacex 'c4"1 ] 1000 1000 $ T' 
0.0316204 1.68375e7 
   timespacex 'c3"1 ] 1000 1000 $ T' 
0.0315689 1.68585e7 
   timespacex 'c5"1 ] 1000 1000 $ T' 
0.0338553 1.68582e7 
   timespacex 'c1"1 ] 1000 1000 $ T' 
0.0342771 1.6868e7 
   timespacex 'c7"1 ] 1000 1000 $ T' 
0.0330486 1.68502e7 
     timespacex 'c2"1 ] 1000 1000 $ T' 
0.031431 1.68257e7 

with no major memory differences.

1M items happens to be a magical number in terms of cache use.  The fact that
applying the verbs 1000 times to 1/1000th of the argument is faster is a strong
suggestion that CPU cache is the factor at play.  This would seem to be the
same effect as in my recent post on benchmarks.

There is still a noticeable advantage for larger chunks, though, including in
memory use; c2 is even faster here than with the whole-array operation.

   20 timespacex 'c2"1 ] 100 10000 $ T' 
0.0243965 1.74545e7 
   20 timespacex 'c1"1 ] 100 10000 $ T' 
0.0271633 1.81111e7 
   20 timespacex 'c7"1 ] 100 10000 $ T' 
0.0270292 1.78476e7 

It gets slower with still larger chunks:
   20 timespacex 'c1"1 ] 10 100000 $ T' 
0.0694971 2.74008e7 
   20 timespacex 'c2"1 ] 10 100000 $ T' 
0.0428581 2.21567e7 

Perhaps the above two show the difference between fitting in L2 vs. L3 cache?
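As a rough sanity check on that guess (assuming 8-byte integers, and counting the input row plus the two rows of the laminated temporary), the per-cell working set for each chunk shape is:

```j
   8 * 3 * 1000 10000 100000   NB. bytes per "1 cell: input row + 2-row temp
24000 240000 2400000
```

24KB and 240KB fit comfortably within a typical L2, while 2.4MB spills into L3, which is at least consistent with the L2-vs-L3 reading of the timings.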

At smaller chunks the mostly-tacit version (c4) comes out slightly ahead:

   20 timespacex 'c4"1 ] 1000 1000 $ T' 
0.0319809 1.68375e7 
   20 timespacex 'c2"1 ] 1000 1000 $ T' 
0.0333608 1.68257e7 
   20 timespacex 'c1"1 ] 1000 1000 $ T' 
0.0346984 1.6868e7 




----- Original Message -----
From: Roger Stokes <[email protected]>
To: [email protected]
Cc: 
Sent: Thursday, August 7, 2014 6:02:52 AM
Subject: Re: [Jprogramming] Memoizing (Project Euler problem 14)

Dear All,

I've come across a puzzle
and would be glad if someone could explain it.

In a few words, I have a variation of collatzv which is a one-liner
  (c1 below) and the very same verb spread over several lines (c2 below).
The second is nearly twice as fast as the first.  How come?

The timings I get are

  collatzv T          0.0592513

  c1 T NB. one-liner  0.0582203

  c2 T NB. spread out 0.0352435

where T is  2 + i. 1e6

What follows is a transcript of the session producing these results.

Starting with Roger Hui's original collatzv:


    collatzv =: 3 : '<. (2|y)} 0 1 + 0.5 3 */y'

    NB. here is my one-liner, an experiment intended to stick to
    NB. integer arithmetic throughout.

    c1 =: 3 : '(_1 & (33 b.)) (2 | y) }  y ,: (2 + 6 * y)'

    NB. here is a spread-out version

    c2 =: 3 : 0
   a =. 2 | y
   b =. 2 + 6 * y
   c =. a } y ,: b
   (_1 & (33 b.)) c
)
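For readers following the idiom, here is each stage of c1/c2 on a tiny argument (a sketch; the outputs below were worked by hand):

```j
   y =. 2 3 4 5
   2 | y                  NB. selector a: 1 where y is odd
0 1 0 1
   2 + 6 * y              NB. b: 2(3y+1), so the final halving gives 3y+1
14 20 26 32
   (2 | y) } y ,: 2 + 6 * y   NB. pick b where odd, y where even
2 20 4 32
   _1 (33 b.) 2 20 4 32   NB. shift right 1 bit = integer halve
1 10 2 16
```

So even items get y%2 and odd items get 3y+1, all in integer arithmetic.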

    (collatzv -: c1) T =: 2 + i. 1e6
1
    (collatzv -: c2) T
1

    NB. comparing timings:

    compare =: (; (6!:2)) @: > "0
    compare 'collatzv T '; 'c1 T NB. one-liner '; 'c2 T NB. spread out'

  collatzv T          0.0587746

  c1 T NB. one-liner  0.0579726

  c2 T NB. spread out 0.0340378


    NB. Do we get the same effect by spreading collatzv?  No.

    collatzvs =: 3 : 0
   a =. 0 1 + 0.5 3 */y
   b =. 2|y
   c =: b } a
   <. c
)

    (collatzv -: collatzvs) T
1

    compare  'collatzv T '; 'c1 T '; 'c2 T '; 'collatzvs T '

  collatzv T   0.0591852

  c1 T         0.0594112

  c2 T         0.034516

  collatzvs T  0.0597546


    JVERSION
Engine: j701/2011-01-10/11:25
Library: 8.02.10
Qt IDE: 1.1.3/5.3.0
Platform: Win 64
Installer: J802 install
InstallPath: c:/users/homer/j64-802





----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
