Re: PicoLisp and Arrays

2011-02-22 Thread Alexander Burger
Hi Kazimir,

> one can use symbols with names like
> L1,...,L9
> ...
> For a comparison, access to list is much slower:
> ...
> (bench (do 100 (inc (nth List 5
> 149.812 sec
> ...
> Generation of the symbol names has its price,
> however, for lot of data, it is still faster
> than list access.

Yes, this is a valid approach. Basically you use the symbol lookup
mechanism of 'intern'.

But I see several problems with that:

1. It clobbers the internal symbol name space.
2. There is the danger of conflict with existing symbols
3. The structure can only be cleaned up by calling 'zap' for each symbol
4. If you have many such structures, they will all use the same tree


There is a direct way to achieve something similar: The 'idx' mechanism,
using a binary tree. For example, creating a tree of 5 random
"array" entries, each initialized to zero:

   (off Tree)
   (do 5
  (idx 'Tree
 (def (format (rand 1 5)) 0)
 T ) )

Then, incrementing a million random entries:

   (bench
  (do 100
 (inc (car (idx 'Tree (format (rand 1 5) ) )
   2.650 sec


This doesn't clobber any global structures. Instead of (off Tree), you
would say

   (let Tree NIL
  ... )

and the whole tree will become garbage as soon as the 'let' body
terminates.

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-22 Thread Kazimir Majorinc

Hi.

O(n) vs O(1) is large difference, however, there is rather simple way
out of the problem: instead of one vector L with indexes 1,...,9

L[1],...,L[9]

one can use symbols with names like

L1,...,L9

If Picolisp is fast enough with symbols, i.e. symbol access
time is O(log n), then

-
(bench (for (i 1 (<= i 9)(inc i)) (set (intern (pack 'L i)) i)))
0.292 sec -> 9

(bench (do 100 (inc 'L5)))
0.035 sec -> 105
-

For a comparison, access to list is much slower:

-
(bench (do 100 (inc (nth List 5
149.812 sec
-

Generation of the symbol names has its price,
however, for lot of data, it is still faster
than list access.

--
(bench (do 100 (inc (intern (pack 'L 10001)
2.620 sec -> 1010001

(bench (do 100 (inc (intern (pack 'L 8)
3.016 sec -> 108
--

The code is somehow ugly, but I guess that with few
helper functions, it can be prettier.

Furthermore, I think this approach is good Lisp style,
because I adapted symbols to array-like use; on similar
way, symbols can be adapted to other data structures,
without need for intervention of the implementer.

Am I right? Is it what Henrik wrote?





--
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-22 Thread dexen deVries
On Tuesday 22 of February 2011 10:26:44 you wrote:
> On Tue, Feb 22, 2011 at 08:31:43AM +0100, Alexander Burger wrote:
> > So, again, thanks to you all for the support! I don't have to defend
> > that design decision in the future again! :)
>=20
> With new users, this will come up over and over again.


Perhaps a visual analogy would come in hand too?

``When switching windows on your desktop, do you use Alt+Tab, or do you=20
indixate explicitly on the exact window you want to pop up?''

where Alt+Tab denotes list trarversal, and indicating windows denotes index=
ed=20
array dereference.

=2D-=20
dexen deVries

[[[=E2=86=93][=E2=86=92]]]

> how does a C compiler get to be that big? what is all that code doing?

iterators, string objects, and a full set of C macros that ensure
boundary conditions and improve interfaces.

ron minnich, in response to Charles Forsyth

http://9fans.net/archive/2011/02/90
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-22 Thread Alexander Burger
On Tue, Feb 22, 2011 at 11:06:57AM +0100, Jon Kleiser wrote:
> >But now there is sort of a FAQ to point to.
> >
> >// Jakob
> >
> This is the FAQ: http://www.software-lab.de/doc/faq.html#arrays

Oh, indeed! I forgot about that one. Good to know ;-)

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-22 Thread Jon Kleiser

On 22-02-11 10:26 , Jakob Eriksson wrote:

On Tue, Feb 22, 2011 at 08:31:43AM +0100, Alexander Burger wrote:

So, again, thanks to you all for the support! I don't have to defend
that design decision in the future again! :)

With new users, this will come up over and over again.
But now there is sort of a FAQ to point to.

// Jakob


This is the FAQ: http://www.software-lab.de/doc/faq.html#arrays

;-)

/Jon
--
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-22 Thread Jakob Eriksson
On Tue, Feb 22, 2011 at 08:31:43AM +0100, Alexander Burger wrote:
> 
> So, again, thanks to you all for the support! I don't have to defend
> that design decision in the future again! :)

With new users, this will come up over and over again.
But now there is sort of a FAQ to point to.

// Jakob

-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-21 Thread Alexander Burger
Thank you all for your comments!

I'm glad to learn that I was too pessimistic :)


On Mon, Feb 21, 2011 at 09:25:45PM +0100, Tomas Hlavaty wrote:
> well, abstinence often leads to frustration;-)

.. or to enlightenment ;-)


> You mention a few times "wasting tag bits".  Isn't the other and maybe
> more important point that arrays would not work with the way current
> garbage collection works?  Basically, the heap would become
> non-homogeneous and fragmented, breaking the beauty, simplicity and
> efficiency of current heap management.

Well, the garbage collector could easily be extended to know about
arrays. But, yes, fragmentation was the main reason for the single-cell
design (especially in the initial versions, when the gc used a
compacting algorithm).

I avoided most argumentation _pro_ lists to keep the article short, and
packed it into the fuzzy statement "... any Lisp programmer will know
them".


On Tue, Feb 22, 2011 at 10:43:55AM +0700, Henrik Sarvell wrote:
> I don't understand what the problem is, as far as array functionality
> goes one can use (get), and for hash functionality (assoc) or (idx),
> did I miss something?

Not at all. This was exactly the point. As long as we have 'get' and
other list functions, we can simulate the behavior of arrays.


So, again, thanks to you all for the support! I don't have to defend
that design decision in the future again! :)

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-21 Thread Henrik Sarvell
I don't understand what the problem is, as far as array functionality
goes one can use (get), and for hash functionality (assoc) or (idx),
did I miss something?

During development of http://vizreader.com which is quite a big
application, I never wished I had either native arrays or hashes.
However, some features would have been impossible to implement without
the index tree functionality, the below method is a good example:

(dm words> (L)
   (let Words NIL
  (for W L
 (and
(setq W (lowc (pack W)))
(not (common?> This W))
(if (idx 'Words W T)
   (inc (car @))
   (set W 1
  (idx 'Words)))

It will count all words in an article by using the word as the "key"
in hash terminology and the amount of times said word appears as the
"value". The information is later stored as a +Blob and used for NLP
as well as the full word search which I wrote about earlier here:
http://picolisp.com/5000/-2-I.html

I mean of course I could have used (assoc) in theory but not practically.



On Tue, Feb 22, 2011 at 4:22 AM, dexen deVries  wr=
ote:
> On Monday 21 of February 2011 21:54:17 you wrote:
>> Hi Dexen & Jos=3DE9,
>>
>> >> Now if you access the array in a semi-random order, so-called `cache
>> >> trashing' will ensue. Pretty much like reading random data from the
>> >> harddrive (for example from swap). The CPU, starved of data, will
>> >> idle uselessly.
>>
>> this is quite a low level detail and the effect on lists is the same.
>> The PicoLisp lists live on the heap which is an array of cells in the C
>> terms after all. =A0However, for lists this happens more often, dependin=
g
>> on how the list came into existence because the data will be quite
>> likely spread over many different memory pages.
>
> I'm not much experienced in details, but I'd guess `yes and no' ;)
>
> A `naive' implementation of compiler would emit code that *just* operates=
 on
> the data, and a`naive' CPU would *just* execute explicit instructions.
>
> However, modern CPUs have instructions for data pre-fetch & prefetching h=
ints
> (for explicit indication of prefetch) and a competent compiler can be exp=
ected
> to include such ones in proper places. Also, the CPU performs some analys=
is of
> all opcodes & is free to pre-fetch any data & code it considers likely to=
 be
> used in future -- up to and including speculative execution of future
> instructions.
>
> In the end, I'd guess walking a (linked) list can be more explicit indica=
tor
> for CPU to get the right data pre-fetched.
>
> But then again, that's just speculative fiction :)
>
> Greetz,
> --
> dexen deVries
>
> ``One can't proceed from the informal to the formal by formal means.''
> --
> UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=3DUnsubscribe
>
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-21 Thread dexen deVries
On Monday 21 of February 2011 21:54:17 you wrote:
> Hi Dexen & Jos=E9,
> 
> >> Now if you access the array in a semi-random order, so-called `cache
> >> trashing' will ensue. Pretty much like reading random data from the
> >> harddrive (for example from swap). The CPU, starved of data, will
> >> idle uselessly.
> 
> this is quite a low level detail and the effect on lists is the same.
> The PicoLisp lists live on the heap which is an array of cells in the C
> terms after all.  However, for lists this happens more often, depending
> on how the list came into existence because the data will be quite
> likely spread over many different memory pages.

I'm not much experienced in details, but I'd guess `yes and no' ;)

A `naive' implementation of compiler would emit code that *just* operates on 
the data, and a`naive' CPU would *just* execute explicit instructions.

However, modern CPUs have instructions for data pre-fetch & prefetching hints 
(for explicit indication of prefetch) and a competent compiler can be expected 
to include such ones in proper places. Also, the CPU performs some analysis of 
all opcodes & is free to pre-fetch any data & code it considers likely to be 
used in future -- up to and including speculative execution of future 
instructions.

In the end, I'd guess walking a (linked) list can be more explicit indicator 
for CPU to get the right data pre-fetched.

But then again, that's just speculative fiction :)

Greetz,
-- 
dexen deVries

``One can't proceed from the informal to the formal by formal means.''
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-21 Thread Tomas Hlavaty
Hi Dexen & Jos=E9,

>> Now if you access the array in a semi-random order, so-called `cache
>> trashing' will ensue. Pretty much like reading random data from the
>> harddrive (for example from swap). The CPU, starved of data, will
>> idle uselessly.

this is quite a low level detail and the effect on lists is the same.
The PicoLisp lists live on the heap which is an array of cells in the C
terms after all.  However, for lists this happens more often, depending
on how the list came into existence because the data will be quite
likely spread over many different memory pages.

> seem to favor arrays is not efficiency, but that some algorithms
> deeply wired into our brains are more clearly represented using arrays
> instead of single linked lists.

Agree, completely.

Cheers,

Tomas
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-21 Thread José Ignacio
On Mon, Feb 21, 2011 at 16:49, dexen deVries  wrot=
e:
> Hi list,
>
> On Monday 21 of February 2011 19:58:32 you wrote:
>> =A0 =A0Articles & Essays =A0-> =A0Array Abstinence
>>
>> or at the direct link "http://picolisp.com/5000/-2-1d.html";.
>
>
> The article lays it out all clearly and neatly, with one caveat I'd like =
to
> mention.
>
> 
> Alex writes, ``Given a number, the associated data can be found [in array=
] in
> O(1) time.''
>
> That holds true if, and only if, the relevant part of the array is alread=
y
> present in L1 cache. Otherwise, about 200...300 CPU cycles [1] will be wa=
sted
> waiting for the cache to be filled before data is available to the CPU. T=
his
> wasted time actually dominates the execution time, in pathological cases.
>
> The data will be in cache most of the time for small arrays, but will NOT=
 be
> in cache for larger arrays. So if you think ``for large ammouts of data a=
rrays
> will be significantly faster'', don't -- there's significant overhead rel=
ated to
> cache loads. To the contrary, traversing PicoLisp's linked list will be q=
uite
> fast.
>
> Now if you access the array in a semi-random order, so-called `cache tras=
hing'
> will ensue. Pretty much like reading random data from the harddrive (for
> example from swap). The CPU, starved of data, will idle uselessly.
>
> On the other hand, if you access the array it sequentially, it's not bett=
er
> than simply traversing the PicoLisp linked list. You can even reasonably
> expect the CPU to pre-fetch relevant data into the cache.
>
> If you really, positively, want to index a huge number of data, use a bin=
ary
> search tree with complexity of O(log n) on average. Building BSTs from
> PicoLisp lists is easy :)
> 
>
>
> To wrap it up: surprisingly, due to modern memory/CPU speed disparity, ra=
ndom
> access to data in an array isn't all that fast. Theoretically O(1), but i=
n
> practice it will incur high overhead of cache misses.
> Traversing a list is fast enough unless you have huge datasets. In that c=
ase,
> use a binary search tree.
>
>
> [1] http://lwn.net/Articles/252125/
> (& some other publication by some Sun guy)
>
:D that should be in the essay! anyway, one of the reasons people (me
included) seem to favor arrays is not efficiency, but that some
algorithms deeply wired into our brains are more clearly represented
using arrays instead of single linked lists. But that's just matter of
being used to a way of thinking, i guess we just need to learn how to
write clear code using lists instead of arrays, perhaps favoring other
algorithms, Alex's work on RosettaCode helps a lot with that :], I
surprised myself when i could code that BF interpreter using only
lists with fairly concise and clear code.

> --
> dexen deVries

Cheers,
Jos=E9
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-21 Thread Tomas Hlavaty
Hi Alex,

>Articles & Essays  ->  Array Abstinence

well, abstinence often leads to frustration;-)

> So I felt I should try to explain my view more clearly, and wrote a
> little article in the wiki. It is a bit too short, perhaps, and
> perhaps even increases the confusion :(

Nice article.

> Regarding all that, it should become clear that wasting tag bits and
> efforts for such a data type and its associated functions is not a
> good idea.

You mention a few times "wasting tag bits".  Isn't the other and maybe
more important point that arrays would not work with the way current
garbage collection works?  Basically, the heap would become
non-homogeneous and fragmented, breaking the beauty, simplicity and
efficiency of current heap management.

> If there is a choice between arrays or lists, lists will clearly win.

That's rather subjective and vague assertion in the middle of the
article, maybe it should be at the end as a conclusion;-)

Cheers,

Tomas
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-21 Thread dexen deVries
Hi list,

On Monday 21 of February 2011 19:58:32 you wrote:
> in recent IRC discussions, the old requests (demands) for array support
> in PicoLisp rose their ugly heads again.
> 
> I'm a bit disappointed about that, as it (or, better, its absence) is a
> central issue, and it shows that I couldn't make clear at all what
> PicoLisp is about.
> 
> So I felt I should try to explain my view more clearly, and wrote a
> little article in the wiki. It is a bit too short, perhaps, and perhaps
> even increases the confusion :(
> 
> But nevertheless, I posted it. It might be improved in the future, and
> can be found under
> 
>Articles & Essays  ->  Array Abstinence
> 
> or at the direct link "http://picolisp.com/5000/-2-1d.html";.


The article lays it out all clearly and neatly, with one caveat I'd like to 
mention.


Alex writes, ``Given a number, the associated data can be found [in array] in 
O(1) time.'' 

That holds true if, and only if, the relevant part of the array is already 
present in L1 cache. Otherwise, about 200...300 CPU cycles [1] will be wasted 
waiting for the cache to be filled before data is available to the CPU. This 
wasted time actually dominates the execution time, in pathological cases.

The data will be in cache most of the time for small arrays, but will NOT be 
in cache for larger arrays. So if you think ``for large ammouts of data arrays 
will be significantly faster'', don't -- there's significant overhead related 
to 
cache loads. To the contrary, traversing PicoLisp's linked list will be quite 
fast.

Now if you access the array in a semi-random order, so-called `cache trashing' 
will ensue. Pretty much like reading random data from the harddrive (for 
example from swap). The CPU, starved of data, will idle uselessly.

On the other hand, if you access the array it sequentially, it's not better 
than simply traversing the PicoLisp linked list. You can even reasonably 
expect the CPU to pre-fetch relevant data into the cache.

If you really, positively, want to index a huge number of data, use a binary 
search tree with complexity of O(log n) on average. Building BSTs from 
PicoLisp lists is easy :)



To wrap it up: surprisingly, due to modern memory/CPU speed disparity, random 
access to data in an array isn't all that fast. Theoretically O(1), but in 
practice it will incur high overhead of cache misses.
Traversing a list is fast enough unless you have huge datasets. In that case, 
use a binary search tree.


[1] http://lwn.net/Articles/252125/
(& some other publication by some Sun guy)

-- 
dexen deVries


``One can't proceed from the informal to the formal by formal means.''
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: PicoLisp and Arrays

2011-02-21 Thread Ana Zgombic
On Tue, Feb 22, 2011 at 2:58 AM, Alexander Burger  wro=
te:
> Hi all,
>
> in recent IRC discussions, the old requests (demands) for array support
> in PicoLisp rose their ugly heads again.
>
> I'm a bit disappointed about that, as it (or, better, its absence) is a
> central issue, and it shows that I couldn't make clear at all what
> PicoLisp is about.
>
> So I felt I should try to explain my view more clearly, and wrote a
> little article in the wiki. It is a bit too short, perhaps, and perhaps
> even increases the confusion :(
>
> But nevertheless, I posted it. It might be improved in the future, and
> can be found under
>
> =C2=A0 Articles & Essays =C2=A0-> =C2=A0Array Abstinence
>
> or at the direct link "http://picolisp.com/5000/-2-1d.html";.
>

when i wear my C hat, i think arrays. when i wear my lisp hat (not a
very big hat at the moment), i think lists.

making it easier for myself, lists *are* arrays. in lists however,
order is inherent. usage is the same, level of abstraction is
different. must be the math training. :)

my 2 cents.

Ana
--=20
http://nybl.info
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe