   timespacex 'c1 T'
0.0547916 6.81632e7
   timespacex 'c2 T'
0.0286093 2.62189e7

c2 also uses only roughly 1/3 of the space. What is amazing is that it seems like any change at all to your 4-liner (c2) makes its timings equal to the other approach. That includes changing the b and c lines to

   b =. y ,: 2 + 6 * y
   c =. a } b

or putting the shift command on the same line as c.

This version is even slower:

   c6=: 3 : 0
   a=. 2 | y
   c=. a } (,: 2 + 6&*) y
   (_1 & (33 b.)) c
   )

   timespacex 'c6 T'
0.0653142 4.29966e7

Even these are no help:

   c7=: (_1 & (33 b.))@:(3 : 0)
   a=. 2 | y
   b =. 2 + 6 * y
   a } y ,: b
   )

   c3 =: _1 & (33 b.)@:(3 : ' (2 | y) } y ,: (2 + 6 * y)')
   ia =: 4 : 'x } y'
   c4 =: _1 & (33 b.)@:(2&| ia ] ,: 2 +6&*)
   c5 =: (_1 & (33 b.)) @:(3 : '(2 | y) } y ,: 2 + 6 * y')

I can only see 2 explanations for this:

1. There is special code for boolist } y ,: alty
2. Automagical cache or register alignment that can happen with ultrasimple J expressions, where temp variables somehow get cached while the stack doesn't.

Explanation 2 seems much more likely, because special code would likely apply to (,: u) as well, and the huge drop in memory use suggests that temp variables are sometimes optimized if they are simple enough. It might be an unintentional gcc optimization.

Everything gets much more equal, though, with smaller arguments applied at rank 1:

   timespacex 'c4"1 ] 1000 1000 $ T'
0.0316204 1.68375e7
   timespacex 'c3"1 ] 1000 1000 $ T'
0.0315689 1.68585e7
   timespacex 'c5"1 ] 1000 1000 $ T'
0.0338553 1.68582e7
   timespacex 'c1"1 ] 1000 1000 $ T'
0.0342771 1.6868e7
   timespacex 'c7"1 ] 1000 1000 $ T'
0.0330486 1.68502e7
   timespacex 'c2"1 ] 1000 1000 $ T'
0.031431 1.68257e7

There are no major memory differences either. 1M items happens to be a magical number in terms of cache use. The fact that applying verbs 1000 times to 1/1000th of the argument is faster strongly suggests that CPU cache is the factor at play. This would seem to be the same effect as in my recent post on benchmarks.
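
For anyone who wants to repeat the chunk-size comparison in one go, here is a minimal sketch. It assumes the standard library's timespacex is loaded; the names shapes and bench are made up for this illustration and appear nowhere in the thread. It only collects time and space per row shape and says nothing by itself about which cache level is involved.

```j
NB. Sketch only: time c2 at rank 1 over the same 1e6 items,
NB. reshaped into rows of different lengths.
T =: 2 + i. 1e6

c2 =: 3 : 0
a =. 2 | y
b =. 2 + 6 * y
c =. a } y ,: b
(_1 & (33 b.)) c
)

NB. hypothetical helper names: shapes, bench
shapes =: 1000 1000 , 100 10000 ,: 10 100000

NB. build the sentence  c2"1 ] <shape> $ T  and average over 20 runs
bench =: 3 : '20 timespacex ''c2"1 ] '' , (": y) , '' $ T'''

NB. one row per shape: rows, columns, seconds, bytes
shapes ,. bench"1 shapes
```

Each result row is the shape followed by the mean time in seconds and the space in bytes, so the rows can be compared directly against the whole-array figure from timespacex 'c2 T'.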

There is still a noticeable advantage for larger chunks though, including memory, and even faster than the whole-array operation for c2.

   20 timespacex 'c2"1 ] 100 10000 $ T'
0.0243965 1.74545e7
   20 timespacex 'c1"1 ] 100 10000 $ T'
0.0271633 1.81111e7
   20 timespacex 'c7"1 ] 100 10000 $ T'
0.0270292 1.78476e7

It gets slower with larger chunks.

   20 timespacex 'c1"1 ] 10 100000 $ T'
0.0694971 2.74008e7
   20 timespacex 'c2"1 ] 10 100000 $ T'
0.0428581 2.21567e7

Perhaps the above 2 show differences between fitting in L2 vs L3 cache?

At smaller chunks the mostly tacit version comes out slightly ahead:

   20 timespacex 'c4"1 ] 1000 1000 $ T'
0.0319809 1.68375e7
   20 timespacex 'c2"1 ] 1000 1000 $ T'
0.0333608 1.68257e7
   20 timespacex 'c1"1 ] 1000 1000 $ T'
0.0346984 1.6868e7

----- Original Message -----
From: Roger Stokes <[email protected]>
To: [email protected]
Cc:
Sent: Thursday, August 7, 2014 6:02:52 AM
Subject: Re: [Jprogramming] Memoizing (Project Euler problem 14)

Dear All,

I've come across a puzzle and would be glad if someone could explain it.

In a few words, I have a variation of collatzv which is a one-liner (c1 below) and the very same verb spread over several lines (c2 below). The second is nearly twice as fast as the first. How come?

The timings I get are

   collatzv T             0.0592513
   c1 T NB. one-liner     0.0582203
   c2 T NB. spread out    0.0352435

where T is 2 + i. 1e6

What follows is a transcript of the session producing these results. Starting with Roger Hui's original collatzv:

   collatzv =: 3 : '<. (2|y)} 0 1 + 0.5 3 */y'

   NB. here is my one-liner, an experiment intended to stick to
   NB. integer arithmetic throughout.

   c1 =: 3 : '(_1 & (33 b.)) (2 | y) } y ,: (2 + 6 * y)'

   NB. here is a spread-out version

   c2 =: 3 : 0
   a =. 2 | y
   b =. 2 + 6 * y
   c =. a } y ,: b
   (_1 & (33 b.)) c
   )

   (collatzv -: c1) T =: 2 + i. 1e6
1
   (collatzv -: c2) T
1

   NB. comparing timings:

   compare =: (; (6!:2)) @: > "0

   compare 'collatzv T '; 'c1 T NB. one-liner '; 'c2 T NB. spread out'
   collatzv T             0.0587746
   c1 T NB. one-liner     0.0579726
   c2 T NB. spread out    0.0340378

   NB. Do we get the same effect by spreading collatzv? No.

   collatzvs =: 3 : 0
   a =. 0 1 + 0.5 3 */y
   b =. 2|y
   c =: b } a
   <. c
   )

   (collatzv -: collatzvs) T
1

   compare 'collatzv T '; 'c1 T '; 'c2 T '; 'collatzvs T '
   collatzv T     0.0591852
   c1 T           0.0594112
   c2 T           0.034516
   collatzvs T    0.0597546

   JVERSION
Engine: j701/2011-01-10/11:25
Library: 8.02.10
Qt IDE: 1.1.3/5.3.0
Platform: Win 64
Installer: J802 install
InstallPath: c:/users/homer/j64-802

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
