On Thu, 09 Jun 2011 19:01:24 +0200, Ian Hickson <[email protected]> wrote:
On Thu, 9 Jun 2011, Philip Jägenstedt wrote:
As for the spec, I don't think it can or needs to define the algorithm
on a
form suitable form implementation. Something along these lines would be
much
clearer for reference:
1. create a (possibly disconnected) graph of all the items in the
document (or
subtree)
2. find the strongly connected components
3. create a list of "loopy" items: those that are in the same component
as any
other item
The traversal would remain mostly as before, but whenever an item is
encountered, one checks if it is in the list of "loopy" items and if so
ignores it. Since "loopy" is a global property, you'll see the same
properties
regardless of the path taken to reach it, which may or may not be the
case
with the current spec. (In any case, it's a nice feature.)
The main reason I didn't do something like this with the current spec was
that I was trying to minimise the work needed when implementing the API
in
a dynamic situation. The above would imply that any time anything in the
document changed in a way that could affect microdata, you'd have to redo
the whole document before the next time the API was invoked. That seems
expensive. (Consider the WHATWG spec, which has microdata in it and is
about 5MB. Do you really want to crawl the whole document looking for
microdata each time the API is invoked?)
What I had tried to do when implementing the spec is start at whatever
point in the DOM the API call was related to, and then search for loops
from that point, and drop anything loopy. That's still expensive, but it
at least minimises the total amount of work.
Does that make sense?
If the expense isn't a big deal then I don't mind doing it the other way
too, but this seems like an API that we're going to have enough trouble
getting implemented in the first place without giving implementors a
reason to avoid doing it at all.
I don't think the spec needs to be giving suggestions for efficient
implementation for live collections, because we inevitable won't implement
exactly that algorithm anyway. It is a problem if the algorithm doesn't
clearly map to some simpler higher-level behavior, as we certainly don't
want to emulate some edge-cases just to follow the letter of the spec.
But, let's disregard the exact algorithm for a moment and only consider
the actual requirement we're suggesting: "any item which can reach itself
by following itemrefs should be removed"
It seems to me that it's quite possible to check this criteria while
traversing using an algorithm of similar structure to what is already in
the spec. One issue is that one must first find all the loopy items and
then remove them in one step, rather than interleaving the
checking/removing. However, it seems that this could be solved by simply
flagging them instead of actually removing them, so that they will still
take part in later loop-checks.
Am I missing something significant about the spec'd algorithm that would
make it more efficient than the above?
If we just go ahead and show an efficient (enough) implementation of loop
removal using the suggested criteria, I assume that would be convincing
enough? Or do you really think that itemref is useless and complicated
enough that it would be better to throw it overboard?
--
Philip Jägenstedt
Core Developer
Opera Software