PHP objects are dynamically extensible: you can attach variables to the ParserOutput simply by setting $parser->mOutput->varname.
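For example, something like this (untested sketch; "smwData" is just a made-up property name):

    // In a parser hook or parser function callback: PHP creates the
    // property on the fly when you assign to it.
    function wfMyHookCallback( &$parser ) {
        if ( !isset( $parser->mOutput->smwData ) ) {
            $parser->mOutput->smwData = array();
        }
        $parser->mOutput->smwData['SomeProperty'] = 'some value';
        return true;
    }

    // Later, wherever you get hold of the same ParserOutput object:
    // if ( isset( $parserOutput->smwData ) ) { ... use it ... }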

Also note that ParserOutput has addHeadItem(), which should keep head items from ending up in the wrong place.
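From memory, usage looks about like this (check the actual signature; the second argument is a key that keeps the same item from being added twice):

    $parser->mOutput->addHeadItem(
        '<style type="text/css">.smw-thing { color: red; }</style>',
        'smw-css'
    );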

~Daniel Friesen(Dantman, Nadir-Seen-Fire) of:
-The Nadir-Point Group (http://nadir-point.com)
--Its Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
--Games-G.P.S. (http://ggps.org)
-And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)

Markus Krötzsch wrote:
On Freitag, 15. August 2008, Daniel Friesen wrote:
Sub-parsers? In what kind of case does that happen for you?

Normally, sub-parses happen with (a clone of) the current parser, e.g. when using a <gallery>. But I am not aware of any guideline that states that extensions are not allowed to create or clone new parser objects and use them with any title they like. So anything could happen.

When one thing is being parsed, there is one parser doing that task. I
don't know of many cases where multiple parsers exist (unless an
extension is doing something screwy).

We have observed the use of multiple parsers, or of one parser with multiple title objects (the distinction is not really relevant for us), in between SMW calls on various wikis. We use hooks during parsing to set the title of the page that is currently processed, so we notice when titles change and we have to reset the data (in a long PHP run many titles may be processed, and there is no guarantee that some save-hook is called before the next page starts processing). Initially in 1.2 we reset the data and title just once during parsing, and not all hooks set the title again. This has led to nasty bugs where data was stored for the wrong title (in one case we even had annotated special pages!). Since the title for storing was only set within hooks of the parser (using getTitle() of the supplied parser), the only explanation is that some other parser fired those hooks with a different title object being parsed, and that this happened before we saved the current data to the DB.
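Schematically, each of our parser hooks now begins like this (simplified illustration; the global and helper names are invented for the example):

    function smwfOnInternalParseBeforeLinks( &$parser, &$text ) {
        global $smwgCurrentTitle;
        $title = $parser->getTitle();
        // If another parser (or another title) fired this hook,
        // the buffered data belongs to someone else: reset it.
        if ( $smwgCurrentTitle === null ||
             $title->getPrefixedText() !== $smwgCurrentTitle->getPrefixedText() ) {
            smwfResetSemanticData( $title ); // sketched below
        }
        // ... actual annotation processing ...
        return true;
    }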

Now we make sure that each and every hook call first sets the proper current title and only then saves data of any kind. This at least ensures that no data ever ends up under the wrong title, but data can still be lost. Again it happened that titles changed between parsing and storing (leading to loss of data, since the change of title also led to clearing the internal data buffer). So we now use a second buffer to store the data already parsed for the *previous* title, just in case the next saving method actually wants to save this data! But this is just a hack: we are blindly moving from hook to hook, parsing data here and there, not knowing for which cases there will be a save-hook later on. It is all very frustrating, and race conditions are still possible.
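The buffering roughly amounts to this (heavily simplified; all names are invented for the illustration):

    function smwfResetSemanticData( $newTitle ) {
        global $smwgData, $smwgPreviousData, $smwgCurrentTitle;
        // Keep the previous title's data around, in case a save-hook
        // for it still arrives after the current title has changed.
        $smwgPreviousData = array( 'title' => $smwgCurrentTitle,
                                   'data'  => $smwgData );
        $smwgCurrentTitle = $newTitle;
        $smwgData = array();
    }

    function smwfSaveDataFor( $title ) {
        global $smwgData, $smwgPreviousData, $smwgCurrentTitle;
        if ( $smwgCurrentTitle !== null &&
             $title->getPrefixedText() === $smwgCurrentTitle->getPrefixedText() ) {
            smwfStoreToDB( $title, $smwgData ); // invented DB helper
        } elseif ( $smwgPreviousData !== null &&
                   $smwgPreviousData['title'] !== null &&
                   $title->getPrefixedText() === $smwgPreviousData['title']->getPrefixedText() ) {
            smwfStoreToDB( $title, $smwgPreviousData['data'] );
        }
        // else: the data for this title is already gone, which is
        // exactly the race described above.
    }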

Even now we still experience cases where apparently random data is lost when we create update jobs for all pages: some pages just lose their properties, but they are different pages each time we try. And of course this affects at most 10 pages per run on a densely annotated wiki with 7000 articles (semanticweb.org).

With your report this morning, I also removed setting the title in ParserBeforeStrip. Maybe this reduces the number of wrongly set titles.

Have you tried making use of the ParserOutput? That seems like a to-the-point
thing; there should only be one of those per parse.

I have not really found a way to use it properly yet. Can it hold additional data somewhere?

Not only the semantic data but also other "globals" are affected by similar problems. We use globals to add CSS and JavaScript to pages based on whether they are needed on a page. It turned out that jobs are executed when viewing a special page, in between the time when the page is parsed and the time when the output HTML is created. Hence any job would actually have to capture the current globals and reset them after doing any parsing, or else the job's parsers will "use up" the script data needed by the special page. Again, one could add further protection to make sure scripts are only "consumed" by the page that created them, but these are all just workarounds for the basic problem: if you need to preserve data between hooks, how can you make sure that the data is not stored forever, yet remains available long enough until you need it?
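Schematically, every job that parses anything would need to do something like this (sketch only; $wgMyExtScripts stands in for our real globals):

    function smwfRunJobSafely( $job ) {
        global $wgMyExtScripts;
        $saved = $wgMyExtScripts;    // capture what the outer page collected
        $wgMyExtScripts = array();   // give the job a clean slate
        $job->run();                 // may parse and add its own scripts
        $wgMyExtScripts = $saved;    // restore the outer page's scripts
    }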

-- Markus


Markus Krötzsch wrote:
Hi Daniel,

it's always refreshing to get some thorough code critique from you in the
morning -- thanks for caring! I have added you to our contributors' list,
and I would much appreciate your ideas on some further hacks that I am
well aware of, see below.

Anyone want to explain to me why the ParserBeforeStrip hook is being
used to register parser functions?
In defence of my code: it works. Up to the introduction of
ParserFirstCallInit, it was also one of the few hooks that (at least in
my experience) reliably got called before any parser function would be
needed.

That is a poor place for it, as well as unreliable, which I can see from
how the function being called is a major hack relying on the first call
returning the callback name when it is already set.
Well, I have seen worse hacks (only part of which were in my code, but
see the remarks below on a major problem I still see there). But point
taken for this issue too.

Since I took the liberty of fixing up Semantic Forms, please see it as a
reference on how to correctly add Parser Functions to the parser:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticForms/includes/SF_ParserFunctions.php?view=markup
Great, I added similar code to SMW now.
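For the record, the pattern boils down to something like this (sketched from memory; the function name "myfunc" is invented, but ParserFirstCallInit and setFunctionHook are the real MediaWiki hook and method):

    $wgHooks['ParserFirstCallInit'][] = 'smwfRegisterParserFunctions';

    function smwfRegisterParserFunctions( &$parser ) {
        // Register {{#myfunc:...}}; a LanguageGetMagic entry for
        // 'myfunc' is also needed, omitted here for brevity.
        $parser->setFunctionHook( 'myfunc', 'smwfRenderMyFunc' );
        return true;
    }

    function smwfRenderMyFunc( &$parser, $arg = '' ) {
        // The callback gets the parser plus the function's arguments.
        return htmlspecialchars( $arg );
    }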


To stay with this topic, I feel that the whole parser hooking business is
bound to be one large hack. As a parser extension that stores data, you
need to hook to several places in MW, hoping that they are somehow called
in the expected order and that nobody overwrites your data in between
hooks. We have to store the parsed data somewhere, and this place needs
to be globally accessible since the parser offers no local storage to us
(none that would not be cloned with unrelated subparsers anyway). But
parsing is not global and happens in many parsers, or in many, possibly
nested, runs of one parser. The current code has evolved to prevent many
problems that this creates, but it lacks a unified approach towards
handling this situation.

Many things can still go wrong. There is no way of finding out whether we
run in the main parsing method of a wiki page text, or if we are just
called on some page footer or sub-parsing action triggered by some
extension. Jobs and extensions cross-fire with their own parsing calls,
often using different Title objects.

Do you have any insights on how to improve the runtime data management in
SMW so that we can collect data belonging to one article in multiple
hooks, not have it overwritten by other sub-hooks, and still do not get
memory leaks on very long runs? We cannot keep all data indefinitely just
because we are unsure whether we are still in a sub-parser and need the
data later on. But if we only store the *current* data, we need to find
out which title is actually being parsed at the moment, so that we can
store or update its data in the DB.


Best regards,

Markus


