subject:"Question about data structures"

Re: Question about data structures

2020-11-27 Thread Zelphir Kaltstahl

Hi all!

Thank you for your plentiful replies, I did not forget you. I simply did
not get around to to answering everyone. I got something out of it. I
still need to apply it in my program and will probably discover more
things to it, like always.

Best wishes,
Zelphir

-- 
repositories: https://notabug.org/ZelphirKaltstahl

Re: [EXT] Re: Question about data structures

2020-11-23 Thread Thompson, David

On Sun, Nov 22, 2020 at 10:43 PM Taylan Kammer  wrote:
>
> Python lists, JDK's ArrayList, and .NET ArrayList, among probably many
> other "list" or "array" data structures in popular languages nowadays
> use a relatively straightforward data structure that is backed by an
> actual array which can have empty slots (e.g. your Python list with 3
> elements might be backed by an array of size 10), and is reallocated
> whenever there's no space left.  This means that appending an element at
> the end is usually dirt cheap, until there's no space left, at which
> point the append operation is much heavier for one call, then the
> following calls are dirt cheap again, until it's full again...
>
> Inserting an element at the beginning or middle is also relatively
> expensive with those implementations, since all elements need to be
> shifted forward to make space for the new element.  (Although this might
> be done with an operation like C's memcpy which is still actually very
> fast.)
>
> It's called a "dynamic array" by Wikipedia:
>
>  https://en.wikipedia.org/wiki/Dynamic_array
>
> If you want to go on an adventure, you could implement a Scheme data
> structure called DVector that implements this strategy, using plain
> Scheme vectors for the backing array.

If anyone is interested in such a data structure, I have an
implementation available here:

https://git.dthompson.us/chickadee.git/tree/chickadee/array-list.scm

I typically use them as object pools to reduce allocation and thus GC
pressure in the context of video games.  The array-list-push! and
array-list-pop! procedures make it easy to use as a dynamically
expanding stack.

- Dave

Re: Question about data structures

2020-11-23 Thread Dr. Arne Babenhauserheide


Zelphir Kaltstahl  writes:

> How do you approach this problem? Is it a problem at all?

I would use a deque: a double ended queue. Cheap push and pop at end and
beginning.

https://srfi.schemers.org/srfi-134/srfi-134.html

I created a guile-implementation for that:
https://github.com/scheme-requests-for-implementation/srfi-134/tree/master/contrib/arne-babenhauserheide

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein
ohne es zu merken


signature.asc
Description: PGP signature

Re: Question about data structures

2020-11-23 Thread Neil Jerram

On Sun, 22 Nov 2020 at 18:49, Zelphir Kaltstahl 
wrote:

> Hello Guile Users!
>
> I have a question about data structures.
>
> Recently I read a file and the lines in the file would become a list in
> my Guile program. The file was not super big or anything. However, I
> usually try to avoid having to use `append` or `reverse`, whenever
> possible, considering, that they are O(n) operations and in principle I
> do not want to write code, that uses lists in inefficient ways, when
> there is a more efficient way. That said, I hit a little snag:
>
> When I am reading a file and do not know how many lines there are in the
> file, I can use a normal list and construct it using recursive calls
> like `(cons line (iter ...))` where `iter` is the recursive call and.
> Could also be called `process-next-line` or simply `next`. Since I am
> building a recursive data structure, it is OK to have a non-tail
> position recursive call. Then I would return that list and work with it.
> However, what if I ever need to add a list entry and the order of list
> entry matters? I would either have to use `append`, to add it to the
> end, which would be O(n), or I would have to initially construct the
> list of lines in reversed order from initially, so that I can add by
> simply using `(cons new-entry lines)`. However, when I use the list in
> reverse and ever need to output the lines in the list in their original
> order, I would first need to `reverse` the list again.
>
> OK the whole reversing would not be a problem, if I used a vector to
> store the lines. However, then I would need to know the number of lines
> in the file ahead of time or look at the file once for counting the
> lines, then create the vector of that length and then store the lines in
> it. This seems inelegant again, because I look at the lines of the file
> twice. I could read it in as a list and then use `list->vector`, but
> that will also be an additional O(n) for converting every entry to
> vector element.
>
> If I now think about Python, then I have Python lists (actually arrays?)
> and those can be easily appended to in O(1) and random access is also
> O(1) according to https://wiki.python.org/moin/TimeComplexity. This
> means I do not need to think much about how I will use the lines of a
> file, when reading in the file. A Python list will be appropriate.
>
> So in Guile I sometimes feel some mental overhead of thinking about the
> choice of data structure, which I do not feel in Python. Generally I
> like Guile a lot more, so I would like to know how others deal with
> this. Here are some ideas for it:
>
> 1. I could create some abstraction layer for the "sequence of read
> lines", which I use inside the procedure, which reads in the lines and
> all other code, that works with the lines, so that I can rather easily
> later exchange the data structure. A data abstraction. However, that
> might only hide the complexities of some operations behind the
> abstraction and not reduce them.
>
> 2. Perhaps I want Guile's arrays? (Can they be expanded in O(1)? I do
> seem to remember reading, that Guile vectors are only a special case of
> Guile arrays, so that would mean they are not expandable in O(1).)
>
> 3. Just convert list to vector after reading in the file until I hit a
> problem, where that additional O(n) really becomes a problem. (In my
> current project it is not realistically a problem.) But this does not
> satisfy me. I should learn how to solve the problem in general and use
> the best way to do things.
>
> 4. Perhaps I need to write another data structure, that creates a new
> vector, when the previous one is full, managing the expansion myself.
>
> How do you approach this problem? Is it a problem at all?


My standard pattern:

 (let loop ((input input) (output '()))
  (if (null? input)
  (reverse output)
  (loop (cdr input) (cons (process (car input)) output

Re: Question about data structures

2020-11-22 Thread John Cowan

On Sun, Nov 22, 2020 at 10:43 PM Taylan Kammer 
wrote:

Since your main concern seems to be appending, you could simply use a
> linked list where you keep a reference to the last cons pair (tail) of
> the list, so appending is simply a matter of a 'set-cdr!' operation on
> the tail.
>

SRFI 117, List Queues, does exactly that.

> Python lists, JDK's ArrayList, and .NET ArrayList, among probably many
> other "list" or "array" data structures in popular languages nowadays
> use a relatively straightforward data structure that is backed by an
> actual array which can have empty slots (e.g. your Python list with 3
> elements might be backed by an array of size 10), and is reallocated
> whenever there's no space left.  This means that appending an element at
> the end is usually dirt cheap, until there's no space left, at which
> point the append operation is much heavier for one call, then the
> following calls are dirt cheap again, until it's full again...
>

And the recent SRFI 214, Flexvectors, provides exactly this.

Packaging these two SRFIs for Guile would be a Good Thing.

John Cowan  http://vrici.lojban.org/~cowanco...@ccil.org
The present impossibility of giving a scientific explanation is no proof
that there is no scientific explanation. The unexplained is not to be
identified with the unexplainable, and the strange and extraordinary
nature of a fact is not a justification for attributing it to powers
above nature.  --The Catholic Encyclopedia, s.v. "telepathy" (1913)

Re: Question about data structures

2020-11-22 Thread Taylan Kammer


On 22.11.2020 19:48, Zelphir Kaltstahl wrote:

Hello Guile Users!

I have a question about data structures.

[...]

How do you approach this problem? Is it a problem at all?


First of all, be cautious about premature optimization.  In many cases 
it's best to just write the code the most straightforward way possible 
with the tools at hand, and not bother with optimization unless it 
actually proves to be an issue.  Are you going to be processing files 
with millions of lines?  Thousands of lines but on a very weak CPU? 
Does it matter if your program takes 0.1 seconds or 2 seconds to run?



Now the actual answer, in case you need to optimize, or just want to 
learn more:



All data structures that offer a sequential list of elements have to 
make some trade-offs between the performance of various operations, as 
well as the implementation complexity.  Linked lists (i.e. "lists" in 
Scheme) are very simple, and a few operations are cheap as well, but 
they have the shortcomings you've described plus some more.



Since your main concern seems to be appending, you could simply use a 
linked list where you keep a reference to the last cons pair (tail) of 
the list, so appending is simply a matter of a 'set-cdr!' operation on 
the tail.



Python lists, JDK's ArrayList, and .NET ArrayList, among probably many 
other "list" or "array" data structures in popular languages nowadays 
use a relatively straightforward data structure that is backed by an 
actual array which can have empty slots (e.g. your Python list with 3 
elements might be backed by an array of size 10), and is reallocated 
whenever there's no space left.  This means that appending an element at 
the end is usually dirt cheap, until there's no space left, at which 
point the append operation is much heavier for one call, then the 
following calls are dirt cheap again, until it's full again...


Inserting an element at the beginning or middle is also relatively 
expensive with those implementations, since all elements need to be 
shifted forward to make space for the new element.  (Although this might 
be done with an operation like C's memcpy which is still actually very 
fast.)


It's called a "dynamic array" by Wikipedia:

https://en.wikipedia.org/wiki/Dynamic_array

If you want to go on an adventure, you could implement a Scheme data 
structure called DVector that implements this strategy, using plain 
Scheme vectors for the backing array.



The VList has also been mentioned in this thread, but from what I can 
tell it doesn't seem to offer a very efficient append operation.



- Taylan

Re: Question about data structures

2020-11-22 Thread Tim Van den Langenbergh

On Sunday, 22 November 2020 19:48:24 CET Zelphir Kaltstahl wrote:
> Hello Guile Users!
>
> I have a question about data structures.
>
> Recently I read a file and the lines in the file would become a list in
> my Guile program. The file was not super big or anything. However, I
> usually try to avoid having to use `append` or `reverse`, whenever
> possible, considering, that they are O(n) operations and in principle I
> do not want to write code, that uses lists in inefficient ways, when
> there is a more efficient way. That said, I hit a little snag:
>
> When I am reading a file and do not know how many lines there are in the
> file, I can use a normal list and construct it using recursive calls
> like `(cons line (iter ...))` where `iter` is the recursive call and.
> Could also be called `process-next-line` or simply `next`. Since I am
> building a recursive data structure, it is OK to have a non-tail
> position recursive call. Then I would return that list and work with it.
> However, what if I ever need to add a list entry and the order of list
> entry matters? I would either have to use `append`, to add it to the
> end, which would be O(n), or I would have to initially construct the
> list of lines in reversed order from initially, so that I can add by
> simply using `(cons new-entry lines)`. However, when I use the list in
> reverse and ever need to output the lines in the list in their original
> order, I would first need to `reverse` the list again.
>
> OK the whole reversing would not be a problem, if I used a vector to
> store the lines. However, then I would need to know the number of lines
> in the file ahead of time or look at the file once for counting the
> lines, then create the vector of that length and then store the lines in
> it. This seems inelegant again, because I look at the lines of the file
> twice. I could read it in as a list and then use `list->vector`, but
> that will also be an additional O(n) for converting every entry to
> vector element.
>
> If I now think about Python, then I have Python lists (actually arrays?)
> and those can be easily appended to in O(1) and random access is also
> O(1) according to https://wiki.python.org/moin/TimeComplexity. This
> means I do not need to think much about how I will use the lines of a
> file, when reading in the file. A Python list will be appropriate.
>
> So in Guile I sometimes feel some mental overhead of thinking about the
> choice of data structure, which I do not feel in Python. Generally I
> like Guile a lot more, so I would like to know how others deal with
> this. Here are some ideas for it:
>
> 1. I could create some abstraction layer for the "sequence of read
> lines", which I use inside the procedure, which reads in the lines and
> all other code, that works with the lines, so that I can rather easily
> later exchange the data structure. A data abstraction. However, that
> might only hide the complexities of some operations behind the
> abstraction and not reduce them.
>
> 2. Perhaps I want Guile's arrays? (Can they be expanded in O(1)? I do
> seem to remember reading, that Guile vectors are only a special case of
> Guile arrays, so that would mean they are not expandable in O(1).)
>
> 3. Just convert list to vector after reading in the file until I hit a
> problem, where that additional O(n) really becomes a problem. (In my
> current project it is not realistically a problem.) But this does not
> satisfy me. I should learn how to solve the problem in general and use
> the best way to do things.
>
> 4. Perhaps I need to write another data structure, that creates a new
> vector, when the previous one is full, managing the expansion myself.
>
> How do you approach this problem? Is it a problem at all?
>
> Best regards,
> Zelphir
>
>

Hey Zelphir,

If you want to use a FIFO data structure, you may want to check out queues.

They're already in ice-9, under (ice-9 q).

Mandatory manual reference:
https://www.gnu.org/software/guile/manual/html_node/Queues.html#Queues

Alternatively, if you want constant-time random access you could try using
Vlists, although they aren't thread-safe.

https://www.gnu.org/software/guile/manual/html_node/VLists.html#VLists

Finally you could implement either a doubly-linked list or array list type
depending on your needs.

If I understand your requirements correctly I would recommend queues. They are
easy to work with.

Sincerely yours,

- Tim


signature.asc
Description: This is a digitally signed message part.

Re: Question about data structures

2020-11-22 Thread divoplade

Le dimanche 22 novembre 2020 à 21:24 +0100, Zelphir Kaltstahl a écrit :
> If I had a vector, I could simply go by index backwards or
> forwards without adding any runtime complexity.

So, you would like to sometimes go forward, sometimes go backward? If
it is sequential, the list is what you want. With the 2-variable
function used earlier, you can go in one direction or the other,
depending on which argument you decompose.

> a more specific
> but less generally useful question is: "What do I use in Guile, if I
> were using a dynamic array in Python?"

It depends. If you were using a dynamic array in Python because there
was no good implementation of what guile calls lists or vectors, use
lists or vectors ;-)

If you were using a dynamic array because you wanted to grow lists
while keeping random indexing, use VLists (
https://www.gnu.org/software/guile/manual/guile.html#VLists). Or hash
tables (https://www.gnu.org/software/guile/manual/guile.html#VHashes, 
https://www.gnu.org/software/guile/manual/guile.html#Hash-Tables).

Re: Question about data structures

2020-11-22 Thread kwright

divoplade  writes:

> Hello Zelphir!
>
> Le dimanche 22 novembre 2020 à 19:48 +0100, Zelphir Kaltstahl a écrit :
>> However, when I use the list in  reverse and ever need
>> to output the lines in the list in their original
>> order, I would first need to `reverse` the list again.
> There is a "reverse" function; you could implement it yourself as a
> tail-recursive function if you wanted (it's currently implemented in C,
> so my guess is it's even more efficient). You don't need vectors for
> that.
>
> (define (my-reverse-aux accumulation list)
>   (if (null? list)
>   accumulation
>   (my-reverse-aux (cons (car list) accumulation) (cdr list
>
> (define (my-reverse list)
>   (my-reverse-aux '() list))
>
> (my-reverse '(a b c d e f g h))

There is also a reverse! procedure.
  (reverse! '(a b c)) 
  $1 = (c b a)

I have not actually looked at the code, but I assume from the "!"
in the name and a decent respect for competence of the programmers,
that it uses the well-known algorithm to reverse a list by destructive
update.

This algorithm is still O(n), but who cares?  It is at least O(n)
to read n lines, no matter how how you read them.  The cost is
amortised and is only O(1) per line.  The thing to avoid is "cons".
The my-reverse procedure above still does O(n) times cons, which uses
storage, and may call garbage collection.

The destructive update algorithm does O(n) times update one storage
location and a couple of registers, which is trivial compared to
reading a line.

Just remember not to save pointers into the original non-reversed list,
because it gets smashed.  (You can still save pointers to the _members_
of the list.)

 -- Keith,

Programmer in Chief, Free Computer Shop,
http://www.free-comp-shop.com/
Food, Shelter, Source code

Re: Question about data structures

2020-11-22 Thread Zelphir Kaltstahl

Hi divoplade!

I know there is reverse and I think I did implement it before, when
working through SICP exercises. Thanks for that implementation and input
though!

I think the point I wanted to make is rather to avoid `reverse`
completely. If I had a vector, I could simply go by index backwards or
forwards without adding any runtime complexity. But a vector is of
defined length, not expandable like a list via cons, which means it
might not be the best idea under some circumstances. Other circumstances
make lists less ideal. If those circumstances appear together, then I
probably should be using another data structure. Or pay the price in
time complexity for some operations.

I just read, that Python "lists" are actually implemented as
https://en.wikipedia.org/wiki/Dynamic_array. So I guess a more specific
but less generally useful question is: "What do I use in Guile, if I
were using a dynamic array in Python?"

I almost never find myself reversing a Python "list". Probably because
it can be indexed in reverse order indices.

Best regards,
Zelphir

On 11/22/20 8:45 PM, divoplade wrote:
> Hello Zelphir!
>
> Le dimanche 22 novembre 2020 à 19:48 +0100, Zelphir Kaltstahl a écrit :
>> However, when I use the list in
>> reverse and ever need to output the lines in the list in their
>> original
>> order, I would first need to `reverse` the list again.
> There is a "reverse" function; you could implement it yourself as a
> tail-recursive function if you wanted (it's currently implemented in C,
> so my guess is it's even more efficient). You don't need vectors for
> that.
>
> (define (my-reverse-aux accumulation list)
>   (if (null? list)
>   accumulation
>   (my-reverse-aux (cons (car list) accumulation) (cdr list
>
> (define (my-reverse list)
>   (my-reverse-aux '() list))
>
> (my-reverse '(a b c d e f g h))
>
-- 
repositories: https://notabug.org/ZelphirKaltstahl

Re: Question about data structures

2020-11-22 Thread divoplade

Hello Zelphir!

Le dimanche 22 novembre 2020 à 19:48 +0100, Zelphir Kaltstahl a écrit :
> However, when I use the list in
> reverse and ever need to output the lines in the list in their
> original
> order, I would first need to `reverse` the list again.
There is a "reverse" function; you could implement it yourself as a
tail-recursive function if you wanted (it's currently implemented in C,
so my guess is it's even more efficient). You don't need vectors for
that.

(define (my-reverse-aux accumulation list)
  (if (null? list)
  accumulation
  (my-reverse-aux (cons (car list) accumulation) (cdr list

(define (my-reverse list)
  (my-reverse-aux '() list))

(my-reverse '(a b c d e f g h))

Question about data structures

2020-11-22 Thread Zelphir Kaltstahl

Hello Guile Users!

I have a question about data structures.

Recently I read a file and the lines in the file would become a list in
my Guile program. The file was not super big or anything. However, I
usually try to avoid having to use `append` or `reverse`, whenever
possible, considering, that they are O(n) operations and in principle I
do not want to write code, that uses lists in inefficient ways, when
there is a more efficient way. That said, I hit a little snag:

When I am reading a file and do not know how many lines there are in the
file, I can use a normal list and construct it using recursive calls
like `(cons line (iter ...))` where `iter` is the recursive call and.
Could also be called `process-next-line` or simply `next`. Since I am
building a recursive data structure, it is OK to have a non-tail
position recursive call. Then I would return that list and work with it.
However, what if I ever need to add a list entry and the order of list
entry matters? I would either have to use `append`, to add it to the
end, which would be O(n), or I would have to initially construct the
list of lines in reversed order from initially, so that I can add by
simply using `(cons new-entry lines)`. However, when I use the list in
reverse and ever need to output the lines in the list in their original
order, I would first need to `reverse` the list again.

OK the whole reversing would not be a problem, if I used a vector to
store the lines. However, then I would need to know the number of lines
in the file ahead of time or look at the file once for counting the
lines, then create the vector of that length and then store the lines in
it. This seems inelegant again, because I look at the lines of the file
twice. I could read it in as a list and then use `list->vector`, but
that will also be an additional O(n) for converting every entry to
vector element.

If I now think about Python, then I have Python lists (actually arrays?)
and those can be easily appended to in O(1) and random access is also
O(1) according to https://wiki.python.org/moin/TimeComplexity. This
means I do not need to think much about how I will use the lines of a
file, when reading in the file. A Python list will be appropriate.

So in Guile I sometimes feel some mental overhead of thinking about the
choice of data structure, which I do not feel in Python. Generally I
like Guile a lot more, so I would like to know how others deal with
this. Here are some ideas for it:

1. I could create some abstraction layer for the "sequence of read
lines", which I use inside the procedure, which reads in the lines and
all other code, that works with the lines, so that I can rather easily
later exchange the data structure. A data abstraction. However, that
might only hide the complexities of some operations behind the
abstraction and not reduce them.

2. Perhaps I want Guile's arrays? (Can they be expanded in O(1)? I do
seem to remember reading, that Guile vectors are only a special case of
Guile arrays, so that would mean they are not expandable in O(1).)

3. Just convert list to vector after reading in the file until I hit a
problem, where that additional O(n) really becomes a problem. (In my
current project it is not realistically a problem.) But this does not
satisfy me. I should learn how to solve the problem in general and use
the best way to do things.

4. Perhaps I need to write another data structure, that creates a new
vector, when the previous one is full, managing the expansion myself.

How do you approach this problem? Is it a problem at all?

Best regards,
Zelphir

-- 
repositories: https://notabug.org/ZelphirKaltstahl

Re: Question about data structures

Re: [EXT] Re: Question about data structures

Re: Question about data structures

Re: Question about data structures

Re: Question about data structures

Re: Question about data structures

Re: Question about data structures

Re: Question about data structures

Re: Question about data structures

Re: Question about data structures

Re: Question about data structures

Question about data structures

12 matches

Site Navigation

Mail list logo

Footer information