Re: PicoLisp and Arrays
Hi Kazimir, > one can use symbols with names like > L1,...,L9 > ... > For a comparison, access to list is much slower: > ... > (bench (do 100 (inc (nth List 5 > 149.812 sec > ... > Generation of the symbol names has its price, > however, for lot of data, it is still faster > than list access. Yes, this is a valid approach. Basically you use the symbol lookup mechanism of 'intern'. But I see several problems with that: 1. It clobbers the internal symbol name space. 2. There is the danger of conflict with existing symbols 3. The structure can only be cleaned up by calling 'zap' for each symbol 4. If you have many such structures, they will all use the same tree There is a direct way to achieve something similar: The 'idx' mechanism, using a binary tree. For example, creating a tree of 5 random "array" entries, each initialized to zero: (off Tree) (do 5 (idx 'Tree (def (format (rand 1 5)) 0) T ) ) Then, incrementing a million random entries: (bench (do 100 (inc (car (idx 'Tree (format (rand 1 5) ) ) 2.650 sec This doesn't clobber any global structures. Instead of (off Tree), you would say (let Tree NIL ... ) and the whole tree will become garbage as soon as the 'let' body terminates. Cheers, - Alex -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
Hi. O(n) vs O(1) is large difference, however, there is rather simple way out of the problem: instead of one vector L with indexes 1,...,9 L[1],...,L[9] one can use symbols with names like L1,...,L9 If Picolisp is fast enough with symbols, i.e. symbol access time is O(log n), then - (bench (for (i 1 (<= i 9)(inc i)) (set (intern (pack 'L i)) i))) 0.292 sec -> 9 (bench (do 100 (inc 'L5))) 0.035 sec -> 105 - For a comparison, access to list is much slower: - (bench (do 100 (inc (nth List 5 149.812 sec - Generation of the symbol names has its price, however, for lot of data, it is still faster than list access. -- (bench (do 100 (inc (intern (pack 'L 10001) 2.620 sec -> 1010001 (bench (do 100 (inc (intern (pack 'L 8) 3.016 sec -> 108 -- The code is somehow ugly, but I guess that with few helper functions, it can be prettier. Furthermore, I think this approach is good Lisp style, because I adapted symbols to array-like use; on similar way, symbols can be adapted to other data structures, without need for intervention of the implementer. Am I right? Is it what Henrik wrote? -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
On Tuesday 22 of February 2011 10:26:44 you wrote: > On Tue, Feb 22, 2011 at 08:31:43AM +0100, Alexander Burger wrote: > > So, again, thanks to you all for the support! I don't have to defend > > that design decision in the future again! :) >=20 > With new users, this will come up over and over again. Perhaps a visual analogy would come in hand too? ``When switching windows on your desktop, do you use Alt+Tab, or do you=20 indixate explicitly on the exact window you want to pop up?'' where Alt+Tab denotes list trarversal, and indicating windows denotes index= ed=20 array dereference. =2D-=20 dexen deVries [[[=E2=86=93][=E2=86=92]]] > how does a C compiler get to be that big? what is all that code doing? iterators, string objects, and a full set of C macros that ensure boundary conditions and improve interfaces. ron minnich, in response to Charles Forsyth http://9fans.net/archive/2011/02/90 -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
On Tue, Feb 22, 2011 at 11:06:57AM +0100, Jon Kleiser wrote: > >But now there is sort of a FAQ to point to. > > > >// Jakob > > > This is the FAQ: http://www.software-lab.de/doc/faq.html#arrays Oh, indeed! I forgot about that one. Good to know ;-) Cheers, - Alex -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
On 22-02-11 10:26 , Jakob Eriksson wrote: On Tue, Feb 22, 2011 at 08:31:43AM +0100, Alexander Burger wrote: So, again, thanks to you all for the support! I don't have to defend that design decision in the future again! :) With new users, this will come up over and over again. But now there is sort of a FAQ to point to. // Jakob This is the FAQ: http://www.software-lab.de/doc/faq.html#arrays ;-) /Jon -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
On Tue, Feb 22, 2011 at 08:31:43AM +0100, Alexander Burger wrote: > > So, again, thanks to you all for the support! I don't have to defend > that design decision in the future again! :) With new users, this will come up over and over again. But now there is sort of a FAQ to point to. // Jakob -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
Thank you all for your comments! I'm glad to learn that I was too pessimistic :) On Mon, Feb 21, 2011 at 09:25:45PM +0100, Tomas Hlavaty wrote: > well, abstinence often leads to frustration;-) .. or to enlightenment ;-) > You mention a few times "wasting tag bits". Isn't the other and maybe > more important point that arrays would not work with the way current > garbage collection works? Basically, the heap would become > non-homogeneous and fragmented, breaking the beauty, simplicity and > efficiency of current heap management. Well, the garbage collector could easily be extended to know about arrays. But, yes, fragmentation was the main reason for the single-cell design (especially in the initial versions, when the gc used a compacting algorithm). I avoided most argumentation _pro_ lists to keep the article short, and packed it into the fuzzy statement "... any Lisp programmer will know them". On Tue, Feb 22, 2011 at 10:43:55AM +0700, Henrik Sarvell wrote: > I don't understand what the problem is, as far as array functionality > goes one can use (get), and for hash functionality (assoc) or (idx), > did I miss something? Not at all. This was exactly the point. As long as we have 'get' and other list functions, we can simulate the behavior of arrays. So, again, thanks to you all for the support! I don't have to defend that design decision in the future again! :) Cheers, - Alex -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
I don't understand what the problem is, as far as array functionality goes one can use (get), and for hash functionality (assoc) or (idx), did I miss something? During development of http://vizreader.com which is quite a big application, I never wished I had either native arrays or hashes. However, some features would have been impossible to implement without the index tree functionality, the below method is a good example: (dm words> (L) (let Words NIL (for W L (and (setq W (lowc (pack W))) (not (common?> This W)) (if (idx 'Words W T) (inc (car @)) (set W 1 (idx 'Words))) It will count all words in an article by using the word as the "key" in hash terminology and the amount of times said word appears as the "value". The information is later stored as a +Blob and used for NLP as well as the full word search which I wrote about earlier here: http://picolisp.com/5000/-2-I.html I mean of course I could have used (assoc) in theory but not practically. On Tue, Feb 22, 2011 at 4:22 AM, dexen deVries wr= ote: > On Monday 21 of February 2011 21:54:17 you wrote: >> Hi Dexen & Jos=3DE9, >> >> >> Now if you access the array in a semi-random order, so-called `cache >> >> trashing' will ensue. Pretty much like reading random data from the >> >> harddrive (for example from swap). The CPU, starved of data, will >> >> idle uselessly. >> >> this is quite a low level detail and the effect on lists is the same. >> The PicoLisp lists live on the heap which is an array of cells in the C >> terms after all. =A0However, for lists this happens more often, dependin= g >> on how the list came into existence because the data will be quite >> likely spread over many different memory pages. > > I'm not much experienced in details, but I'd guess `yes and no' ;) > > A `naive' implementation of compiler would emit code that *just* operates= on > the data, and a`naive' CPU would *just* execute explicit instructions. > > However, modern CPUs have instructions for data pre-fetch & prefetching h= ints > (for explicit indication of prefetch) and a competent compiler can be exp= ected > to include such ones in proper places. Also, the CPU performs some analys= is of > all opcodes & is free to pre-fetch any data & code it considers likely to= be > used in future -- up to and including speculative execution of future > instructions. > > In the end, I'd guess walking a (linked) list can be more explicit indica= tor > for CPU to get the right data pre-fetched. > > But then again, that's just speculative fiction :) > > Greetz, > -- > dexen deVries > > ``One can't proceed from the informal to the formal by formal means.'' > -- > UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=3DUnsubscribe > -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
On Monday 21 of February 2011 21:54:17 you wrote: > Hi Dexen & Jos=E9, > > >> Now if you access the array in a semi-random order, so-called `cache > >> trashing' will ensue. Pretty much like reading random data from the > >> harddrive (for example from swap). The CPU, starved of data, will > >> idle uselessly. > > this is quite a low level detail and the effect on lists is the same. > The PicoLisp lists live on the heap which is an array of cells in the C > terms after all. However, for lists this happens more often, depending > on how the list came into existence because the data will be quite > likely spread over many different memory pages. I'm not much experienced in details, but I'd guess `yes and no' ;) A `naive' implementation of compiler would emit code that *just* operates on the data, and a`naive' CPU would *just* execute explicit instructions. However, modern CPUs have instructions for data pre-fetch & prefetching hints (for explicit indication of prefetch) and a competent compiler can be expected to include such ones in proper places. Also, the CPU performs some analysis of all opcodes & is free to pre-fetch any data & code it considers likely to be used in future -- up to and including speculative execution of future instructions. In the end, I'd guess walking a (linked) list can be more explicit indicator for CPU to get the right data pre-fetched. But then again, that's just speculative fiction :) Greetz, -- dexen deVries ``One can't proceed from the informal to the formal by formal means.'' -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
Hi Dexen & Jos=E9, >> Now if you access the array in a semi-random order, so-called `cache >> trashing' will ensue. Pretty much like reading random data from the >> harddrive (for example from swap). The CPU, starved of data, will >> idle uselessly. this is quite a low level detail and the effect on lists is the same. The PicoLisp lists live on the heap which is an array of cells in the C terms after all. However, for lists this happens more often, depending on how the list came into existence because the data will be quite likely spread over many different memory pages. > seem to favor arrays is not efficiency, but that some algorithms > deeply wired into our brains are more clearly represented using arrays > instead of single linked lists. Agree, completely. Cheers, Tomas -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
On Mon, Feb 21, 2011 at 16:49, dexen deVries wrot= e: > Hi list, > > On Monday 21 of February 2011 19:58:32 you wrote: >> =A0 =A0Articles & Essays =A0-> =A0Array Abstinence >> >> or at the direct link "http://picolisp.com/5000/-2-1d.html";. > > > The article lays it out all clearly and neatly, with one caveat I'd like = to > mention. > > > Alex writes, ``Given a number, the associated data can be found [in array= ] in > O(1) time.'' > > That holds true if, and only if, the relevant part of the array is alread= y > present in L1 cache. Otherwise, about 200...300 CPU cycles [1] will be wa= sted > waiting for the cache to be filled before data is available to the CPU. T= his > wasted time actually dominates the execution time, in pathological cases. > > The data will be in cache most of the time for small arrays, but will NOT= be > in cache for larger arrays. So if you think ``for large ammouts of data a= rrays > will be significantly faster'', don't -- there's significant overhead rel= ated to > cache loads. To the contrary, traversing PicoLisp's linked list will be q= uite > fast. > > Now if you access the array in a semi-random order, so-called `cache tras= hing' > will ensue. Pretty much like reading random data from the harddrive (for > example from swap). The CPU, starved of data, will idle uselessly. > > On the other hand, if you access the array it sequentially, it's not bett= er > than simply traversing the PicoLisp linked list. You can even reasonably > expect the CPU to pre-fetch relevant data into the cache. > > If you really, positively, want to index a huge number of data, use a bin= ary > search tree with complexity of O(log n) on average. Building BSTs from > PicoLisp lists is easy :) > > > > To wrap it up: surprisingly, due to modern memory/CPU speed disparity, ra= ndom > access to data in an array isn't all that fast. Theoretically O(1), but i= n > practice it will incur high overhead of cache misses. > Traversing a list is fast enough unless you have huge datasets. In that c= ase, > use a binary search tree. > > > [1] http://lwn.net/Articles/252125/ > (& some other publication by some Sun guy) > :D that should be in the essay! anyway, one of the reasons people (me included) seem to favor arrays is not efficiency, but that some algorithms deeply wired into our brains are more clearly represented using arrays instead of single linked lists. But that's just matter of being used to a way of thinking, i guess we just need to learn how to write clear code using lists instead of arrays, perhaps favoring other algorithms, Alex's work on RosettaCode helps a lot with that :], I surprised myself when i could code that BF interpreter using only lists with fairly concise and clear code. > -- > dexen deVries Cheers, Jos=E9 -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
Hi Alex, >Articles & Essays -> Array Abstinence well, abstinence often leads to frustration;-) > So I felt I should try to explain my view more clearly, and wrote a > little article in the wiki. It is a bit too short, perhaps, and > perhaps even increases the confusion :( Nice article. > Regarding all that, it should become clear that wasting tag bits and > efforts for such a data type and its associated functions is not a > good idea. You mention a few times "wasting tag bits". Isn't the other and maybe more important point that arrays would not work with the way current garbage collection works? Basically, the heap would become non-homogeneous and fragmented, breaking the beauty, simplicity and efficiency of current heap management. > If there is a choice between arrays or lists, lists will clearly win. That's rather subjective and vague assertion in the middle of the article, maybe it should be at the end as a conclusion;-) Cheers, Tomas -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
Hi list, On Monday 21 of February 2011 19:58:32 you wrote: > in recent IRC discussions, the old requests (demands) for array support > in PicoLisp rose their ugly heads again. > > I'm a bit disappointed about that, as it (or, better, its absence) is a > central issue, and it shows that I couldn't make clear at all what > PicoLisp is about. > > So I felt I should try to explain my view more clearly, and wrote a > little article in the wiki. It is a bit too short, perhaps, and perhaps > even increases the confusion :( > > But nevertheless, I posted it. It might be improved in the future, and > can be found under > >Articles & Essays -> Array Abstinence > > or at the direct link "http://picolisp.com/5000/-2-1d.html";. The article lays it out all clearly and neatly, with one caveat I'd like to mention. Alex writes, ``Given a number, the associated data can be found [in array] in O(1) time.'' That holds true if, and only if, the relevant part of the array is already present in L1 cache. Otherwise, about 200...300 CPU cycles [1] will be wasted waiting for the cache to be filled before data is available to the CPU. This wasted time actually dominates the execution time, in pathological cases. The data will be in cache most of the time for small arrays, but will NOT be in cache for larger arrays. So if you think ``for large ammouts of data arrays will be significantly faster'', don't -- there's significant overhead related to cache loads. To the contrary, traversing PicoLisp's linked list will be quite fast. Now if you access the array in a semi-random order, so-called `cache trashing' will ensue. Pretty much like reading random data from the harddrive (for example from swap). The CPU, starved of data, will idle uselessly. On the other hand, if you access the array it sequentially, it's not better than simply traversing the PicoLisp linked list. You can even reasonably expect the CPU to pre-fetch relevant data into the cache. If you really, positively, want to index a huge number of data, use a binary search tree with complexity of O(log n) on average. Building BSTs from PicoLisp lists is easy :) To wrap it up: surprisingly, due to modern memory/CPU speed disparity, random access to data in an array isn't all that fast. Theoretically O(1), but in practice it will incur high overhead of cache misses. Traversing a list is fast enough unless you have huge datasets. In that case, use a binary search tree. [1] http://lwn.net/Articles/252125/ (& some other publication by some Sun guy) -- dexen deVries ``One can't proceed from the informal to the formal by formal means.'' -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: PicoLisp and Arrays
On Tue, Feb 22, 2011 at 2:58 AM, Alexander Burger wro= te: > Hi all, > > in recent IRC discussions, the old requests (demands) for array support > in PicoLisp rose their ugly heads again. > > I'm a bit disappointed about that, as it (or, better, its absence) is a > central issue, and it shows that I couldn't make clear at all what > PicoLisp is about. > > So I felt I should try to explain my view more clearly, and wrote a > little article in the wiki. It is a bit too short, perhaps, and perhaps > even increases the confusion :( > > But nevertheless, I posted it. It might be improved in the future, and > can be found under > > =C2=A0 Articles & Essays =C2=A0-> =C2=A0Array Abstinence > > or at the direct link "http://picolisp.com/5000/-2-1d.html";. > when i wear my C hat, i think arrays. when i wear my lisp hat (not a very big hat at the moment), i think lists. making it easier for myself, lists *are* arrays. in lists however, order is inherent. usage is the same, level of abstraction is different. must be the math training. :) my 2 cents. Ana --=20 http://nybl.info -- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe