Re: PicoLisp and Arrays

dexen deVries Mon, 21 Feb 2011 12:03:07 -0800

Hi list,

On Monday 21 of February 2011 19:58:32 you wrote:
> in recent IRC discussions, the old requests (demands) for array support
> in PicoLisp rose their ugly heads again.
> 
> I'm a bit disappointed about that, as it (or, better, its absence) is a
> central issue, and it shows that I couldn't make clear at all what
> PicoLisp is about.
> 
> So I felt I should try to explain my view more clearly, and wrote a
> little article in the wiki. It is a bit too short, perhaps, and perhaps
> even increases the confusion :(
> 
> But nevertheless, I posted it. It might be improved in the future, and
> can be found under
> 
>    Articles & Essays  ->  Array Abstinence
> 
> or at the direct link "http://picolisp.com/5000/-2-1d.html";.



The article lays it out all clearly and neatly, with one caveat I'd like to 
mention.

<rant mode on>
Alex writes, ``Given a number, the associated data can be found [in array] in 
O(1) time.'' 

That holds true if, and only if, the relevant part of the array is already 
present in L1 cache. Otherwise, about 200...300 CPU cycles [1] will be wasted 
waiting for the cache to be filled before data is available to the CPU. This 
wasted time actually dominates the execution time, in pathological cases.

The data will be in cache most of the time for small arrays, but will NOT be 
in cache for larger arrays. So if you think ``for large ammouts of data arrays 
will be significantly faster'', don't -- there's significant overhead related 
to 
cache loads. To the contrary, traversing PicoLisp's linked list will be quite 
fast.

Now if you access the array in a semi-random order, so-called `cache trashing' 
will ensue. Pretty much like reading random data from the harddrive (for 
example from swap). The CPU, starved of data, will idle uselessly.

On the other hand, if you access the array it sequentially, it's not better 
than simply traversing the PicoLisp linked list. You can even reasonably 
expect the CPU to pre-fetch relevant data into the cache.

If you really, positively, want to index a huge number of data, use a binary 
search tree with complexity of O(log n) on average. Building BSTs from 
PicoLisp lists is easy :)
</rant mode off>


To wrap it up: surprisingly, due to modern memory/CPU speed disparity, random 
access to data in an array isn't all that fast. Theoretically O(1), but in 
practice it will incur high overhead of cache misses.
Traversing a list is fast enough unless you have huge datasets. In that case, 
use a binary search tree.


[1] http://lwn.net/Articles/252125/
(& some other publication by some Sun guy)

-- 
dexen deVries


``One can't proceed from the informal to the formal by formal means.''
-- 
UNSUBSCRIBE: mailto:[email protected]?subject=Unsubscribe

Re: PicoLisp and Arrays

Reply via email to