Re: avltree / genalloc

2018-04-13 Thread Olivier Brunel

>   Then genalloc is indeed the right data structure from you, but
> I don't have helpers to arbitrarily remove an object in the middle.
> Note that if you don't need index stability, it's trivial, and O(1):
> you can just overwrite the ith object with the (n-1)th, and decrement
> the length. No need to copy len-i-1 objects, just copy one.

Ah, yes. But when I said I don't need index stability, I meant current
4 can become 2, but I actually want to preserve the order of elements
within the array... hence moving the whole data block.
(I guess for a generic genalloc_delete_size an extra flag indicating
whether to move all elements or just move the last one might be good.)

Cheers,


Re: avltree / genalloc

2018-04-12 Thread Laurent Bercot



Ok, so let me ask you this: I want to delete my avltree. That is, I'd
like to remove all its nodes and be done with it, i.e. I need to free
all memory associated with it. I guess avltree_free() is enough, no
need to delete each node individually?


 Yes, avltree_free() is enough, because an avltree doesn't hold any
pointers to your data.



Obviously I want to do the same with my data, stored in a gensetdyn,
but there gensetdyn_free isn't enough, because I have extra memory
allocated (on a per-item basis) that I need to free as well, hence I
need to iterate over the items to perform my own freeing.


 Yes. Note that for that you can use gensetdyn_iter, the "f"
argument being a function that frees a cell. It will be called
on every allocated cell. The cells will still be marked as
allocated (it's not gensetdyn_delete()) but you don't care since
you're going to call gensetdyn_free() right afterwards.

 I have such an iterator in genalloc: genalloc_deepfree(), which
calls the user-provided function to free every object. I don't
know why I haven't added the same for genset/dyn, I should
probably do it.



So yeah, that should in fact be done over the gensetdyn not the
avltree, that was my mistake, and then I simply not delete anything
either, just free my memory and call gensetdyn_free is enough, correct?


 Yes. Free your data, then free the container.



 (Just to know, would the same be true with gensetdyn, and I shouldn't
add/remove during a gensetdyn_iter?)


 Yes, and it's even trickier with a gensetdyn than with an avltree,
because the way a gensetdyn keeps track of what cells are allocated
is via a "freelist" stack, i.e. a genalloc of uint32_t containing the
indices of free cells. This makes gensetdyn_new() and gensetdyn_delete()
O(1) since they're just popping and pushing the stack. But for
gensetdyn_iter, the freelist is first converted to a bitarray, and
the bitarray is scanned, so the iterator is only called on allocated
cells. If you change the freelist while gensetdyn_iter() is running,
you will desync the freelist from the bitarray, and the iterator may
miss an allocated cell, or worse, call your function on an unallocated
one. So, don't do that. ^^'



Just to be clear, avltree_delete() actually takes the key (void *) for
that data, not the uint32_t itself, right?


 Er, yes, sorry, it takes the key. So if you have the data, you need to
call your dtok function first. ("data to key")



hmm... I've been thinking about this, but I feel like what I need is
just a (dynamic) array, that I can add to and remove from.


 Then genalloc is indeed the right data structure from you, but
I don't have helpers to arbitrarily remove an object in the middle.
Note that if you don't need index stability, it's trivial, and O(1):
you can just overwrite the ith object with the (n-1)th, and decrement
the length. No need to copy len-i-1 objects, just copy one.

--
 Laurent



Re: avltree / genalloc

2018-04-12 Thread Olivier Brunel


Many thanks for the explaination/clarifications!


>   avltree_iter isn't meant to perform operations that modify the 
> structure
> of the tree. It *may* work, but there's no guarantee it will - and
> typically, if you modify the tree, later values for node height will
> certainly be incorrect. I wrote avltree_iter as a shortcut to perform
> simple operations such as printing the tree or calling a function on
> the data contained in every node; it should not be used with a
> tree-modifying function.

Ok, so let me ask you this: I want to delete my avltree. That is, I'd
like to remove all its nodes and be done with it, i.e. I need to free
all memory associated with it. I guess avltree_free() is enough, no
need to delete each node individually?

Obviously I want to do the same with my data, stored in a gensetdyn,
but there gensetdyn_free isn't enough, because I have extra memory
allocated (on a per-item basis) that I need to free as well, hence I
need to iterate over the items to perform my own freeing.

So yeah, that should in fact be done over the gensetdyn not the
avltree, that was my mistake, and then I simply not delete anything
either, just free my memory and call gensetdyn_free is enough, correct?

(Just to know, would the same be true with gensetdyn, and I shouldn't
add/remove during a gensetdyn_iter?)


> future major release - and use avltree_delete(), which is the right
> API. And the argument to avltree_delete() is the data in the node,
> i.e. your uint32_t (that is probably the index of your object in your
> gensetdyn).

Just to be clear, avltree_delete() actually takes the key (void *) for
that data, not the uint32_t itself, right?


>   Well, genalloc is really "stralloc for larger objects", and I try
> not to implement for genalloc what I would not implement for stralloc.
> If you often need to remove items from a genalloc that are not the
> last one, then genalloc may not be the appropriate data structure,
> and you may want a gensetdyn instead.

hmm... I've been thinking about this, but I feel like what I need is
just a (dynamic) array, that I can add to and remove from. Maybe I'm
not seeing a genalloc exactly same as you; I feel gensetdyn might be
good for that, but I don't actually need indices to be stable (I'm
storing pointers), which is why a genalloc seems right to me: it's an
array, that can grow (or shrink) as needed. No need for more complexity
or anything. genalloc is pretty simple & straightforward, it's good,
but I do need to remove items from time to time, and that means I need
the memory moving bit.


Cheers,


Re: avltree / genalloc

2018-04-12 Thread Laurent Bercot

- When using avltree_iter() we need to pass a avliterfunc_t_ref, which
 takes 3 arguments. If I understand correctly, first is the uint32_t
 index, then the tree's "internal" (unsigned int) index, and my
 pointer.


 "The uint32_t index" is confusing terminology, so to clarify, I call
this number "the data contained in the node".
 An avlnode only stores an uint32_t of data, it doesn't know anything
else; so that uint32_t is "the data". If you want to use the tree
to index anything else than a set of uint32_t, you need to store your
objects somewhere, for instance in a gensetdyn, and use the tree to
store the indices of your objects into the gensetdyn. That is typically
what the "external" field of the avltree is for: to store a pointer
to your storage structure, so you can access the objects to extract
keys from them (in dtok) and compare keys (in kcmp).

 But as far as avlnode/avltree is concerned, the uint32_t is "the data".
 So, the first argument is the data. The second argument is not the
internal index, which user functions should basically never see because
it's an implementation detail and totally irrelevant to them; the second
argument is the height of the current node (root is 0, direct child of
the root is 1, etc.) The third argument is the external pointer.



 So if I wanted, from there, to remove an item from the avltree, I
 could either use avltree_deletenode(tree, ARG2) or
 avltree_delete(tree, KEY) where KEY is the one from dtok(ARG1,p) --
 would all that be correct, or did I miss/misunderstood something?


 avltree_iter isn't meant to perform operations that modify the 
structure

of the tree. It *may* work, but there's no guarantee it will - and
typically, if you modify the tree, later values for node height will
certainly be incorrect. I wrote avltree_iter as a shortcut to perform
simple operations such as printing the tree or calling a function on
the data contained in every node; it should not be used with a
tree-modifying function.

 If you need to iterate over a tree-modifying function, I think you
should write the loop yourself, because the complexity of writing the
loop is much lesser than the complexity required to handle tree
structure changes in the middle of a map.



- So it means avltree_deletenode() needs the avltree's own/internal
 index, whereas gensetdyn_delete() needs "our" uint32_t index,
 correct?


 gensetdyn is totally irrelevant to the tree. (I mean, avltree uses
a gensetdyn internally, to store the avlnodes, but it is an
implementation detail you should not care about.)
 If you are using a gensetdyn to store your objects, then you will
want to use the gensetdyn's indices as data in the tree.

 avltree_deletenode() is a bad API: it uses the internal index to
the node, but the user should basically have no way to interact with
that internal index. I wrote that "just in case", initially thinking
it should, logically, be faster than avltree_delete() since it has
internal information, but it's 1. a YAGNI violation, and 2. wrong 
anyway, because to delete a node you need to know the path from the root 
to that

node, and just knowing the internal index doesn't give you that - you
need to go on foot and make your key comparisons from the root down,
which is exactly what avltree_delete() does.

 So: forget about avltree_deletenode() - thanks for the fix, I will 
apply

the patch, but I will probably remove that macro altogether in a future
major release - and use avltree_delete(), which is the right API. And
the argument to avltree_delete() is the data in the node, i.e. your
uint32_t (that is probably the index of your object in your gensetdyn).

 My bad for creating confusion with a function that should not exist.



 I.e. if I have that uint32_t and want to remove things from both the
 avltree & gensetdyn (I'm using the later to store the actual data
 indexed via the former, btw) then for avltree I need to get the key
 (e.g. via t->dtok(u,p))


 First, use avltree_delete(), with that uint32_t, to remove the node.
Then, use gensetdyn_delete() to remove your object from your gensetdyn.



 (I ended up using avltree_delete directly, guessing you usually do
 that as well maybe?)


 Yes, that is the correct entry point for deletion.



- And on a completely unrelated topic, is there really no way
 (macro/function) in skalibs to remove an item from a genalloc? I get
 it for stralloc, it's not really thought of as an array but a string,
 you rarely want to remove one char from somehere in the middle, 
whatever.


 But with a genalloc, I'd expect one to (often) be wanting to remove an
 item from the array, and not always/only the last one, yet I can't see
 how that's supposed to be done? (Again, as in via a macro or
 something, of course one can - as I do - handle the memmove oneself)


 Well, genalloc is really "stralloc for larger objects", and I try
not to implement for genalloc what I would not implement for stralloc.
If you often need to remove items from a genalloc that