PHP objects are dynamically extensible. You can add variables to the
ParserOutput simply by assigning $parser->mOutput->varname.
Note also that ParserOutput has an addHeadItem method, which should help
avoid adding head items to the wrong place.
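A minimal sketch of both ideas (the member name exampleData and the
function name are made up for illustration):

function wfExampleCollect( &$parser, $value ) {
    $out = $parser->mOutput; // the ParserOutput of the running parse

    // PHP creates the member on first assignment:
    if ( !isset( $out->exampleData ) ) {
        $out->exampleData = array();
    }
    $out->exampleData[] = $value;

    // Head items attached here travel with this parse's output, so they
    // cannot be consumed by an unrelated page:
    $out->addHeadItem(
        '<link rel="stylesheet" href="/example.css" type="text/css"/>',
        'example-css' );

    return '';
}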
~Daniel Friesen(Dantman, Nadir-Seen-Fire) of:
-The Nadir-Point Group (http://nadir-point.com)
--Its Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
--Games-G.P.S. (http://ggps.org)
-And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)
Markus Krötzsch wrote:
On Friday, 15 August 2008, Daniel Friesen wrote:
Sub-parsers. In what kind of case does this happen for you?
Normally, sub-parses happen with (a clone of) the current parser, e.g. when
using a <gallery>. But I am not aware of any guideline that states that
extensions are not allowed to create or clone new parser objects and use them
with any title they like. So anything could happen.
When one thing is being parsed, there is one parser doing that task. I
don't know of many cases where multiple parsers exist (unless an
extension is doing something screwy).
We have observed the use of multiple parsers or of one parser with multiple
title objects (this distinction is not really relevant for us) in between SMW
calls on various wikis. We use hooks during parsing to set the title of the
page that is currently processed, so we notice when titles change and we have
to reset the data (in a long PHP run there might be many titles that are
processed, and there is no guarantee that some save-hook is called before the
next page starts processing).
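Concretely, the pattern is roughly the following (a sketch with made-up
names, not our actual code):

class ExampleDataBuffer {
    protected static $title = null;   // prefixed text of the current title
    protected static $data = array(); // annotations parsed so far

    // Called at the start of every parser hook; resets the buffer
    // whenever the supplied parser reports a different title.
    public static function setCurrentTitle( Parser $parser ) {
        $name = $parser->getTitle()->getPrefixedText();
        if ( $name !== self::$title ) {
            self::$title = $name;  // a new page started parsing
            self::$data = array(); // data of the old title is dropped here
        }
    }
}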
Initially in 1.2, we just reset the data and title once during parsing,
and not all hooks set the title again. This led to nasty bugs, where
data was stored for the wrong title (we even had annotated special pages in
one case!). Since the title for storing was only set within hooks of the
parser (using getTitle() of the supplied parser), the only explanation is
that some other parser fired those hooks with a different title object being
parsed, and that this happened before we saved the current data to the DB.
Now we make sure that each and every hook call first sets the proper current
title and only afterwards saves data of any kind. This at least ensures
that no data ever ends up under the wrong title, but data can still be
lost. Again it happened that titles changed between parsing and storing
(leading to data loss, since the change of title also led to clearing
the internal data buffer). So we now use a second buffer to store the data
already parsed for the *previous* title, just in case it turns out that the
next saving method actually wants to save this data! But this is just a hack:
we are blindly moving from hook to hook, parsing data here and there and not
knowing for which cases there will be a save-hook later on. It is all very
frustrating, and race conditions are still possible.
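In code, the two-buffer trick amounts to extending the sketch above
along these lines:

    // kept alongside $title and $data in the class above:
    protected static $prevTitle = null;
    protected static $prevData = array();

    public static function setCurrentTitle( Parser $parser ) {
        $name = $parser->getTitle()->getPrefixedText();
        if ( self::$title !== null && $name !== self::$title ) {
            // keep the finished page's data around, in case a later
            // save-hook still wants to store it:
            self::$prevTitle = self::$title;
            self::$prevData  = self::$data;
            self::$data = array();
        }
        self::$title = $name;
    }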
Even now we still experience cases where apparently random data is lost when
we create update jobs for all pages: some pages just lose their properties,
but these are different pages each time we try. And of course this affects at
most 10 pages each time on a densely annotated wiki with 7000 articles
(semanticweb.org).
With your report this morning, I also removed setting the title in
ParserBeforeStrip. Maybe this reduces the number of wrongly set titles.
Have you tried making use of the ParserOutput? That seems like the right
place; there should only be one of those per parse.
I have not yet found a way to use it properly. Can it hold additional
data somewhere?
Not only the semantic data, but also other "globals" are affected by similar
problems. We use globals to add CSS and JavaScript to pages on which they
are needed. It turned out that jobs are executed when viewing
a special page in between the time when the page is parsed and when the
output HTML is created. Hence any job would actually have to capture the
current globals and restore them after doing any parsing; otherwise the
job's parsers will "use up" the script data needed by the special page.
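Such a capture-and-restore wrapper would look roughly like this (a
sketch; the global name is made up):

function wfExampleRunJobSafely( $job ) {
    global $wgExampleHeadItems; // made-up global holding pending scripts

    $saved = $wgExampleHeadItems;  // capture what the current page queued
    $wgExampleHeadItems = array(); // give the job a clean buffer
    $job->run();                   // the job may parse pages internally
    $wgExampleHeadItems = $saved;  // restore the page's pending scripts
}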
Again, one could add further protection to make sure scripts are
only "consumed" by the page that created them, but these are all just
workarounds for the basic problem: if you need to preserve data between
hooks, how can you make sure that the data is not stored forever and still
remains available long enough until you need it?
-- Markus
~Daniel Friesen(Dantman, Nadir-Seen-Fire) of:
-The Nadir-Point Group (http://nadir-point.com)
--Its Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
--Games-G.P.S. (http://ggps.org)
-And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)
Markus Krötzsch wrote:
Hi Daniel,
it's always refreshing to get some thorough code critique from you in the
morning -- thanks for caring! I have added you to our contributors' list,
and I would much appreciate your ideas on some further hacks that I am
well aware of, see below.
Anyone want to explain to me why the ParserBeforeStrip hook is being
used to register parser functions?
In defence of my code: it works. Up to the introduction of
ParserFirstCallInit, it was also one of the few hooks that got reliably
(at least in my experience) called before any parser function would be
needed.
That is a poor place for it, as well as unreliable. I can see that from
how the function being called is a major hack, relying on the first call
returning the callback name when it is already set.
Well, I have seen worse hacks (only part of which were in my code, but
see the remarks below on a major problem I still see there). But point
taken for this issue too.
Since I took the liberty of fixing up Semantic Forms, please see it as a
reference on how to correctly add Parser Functions to the parser:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticForms/includes/SF_ParserFunctions.php?view=markup
Great, I added similar code to SMW now.
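For the record, the pattern boils down to roughly this (names are
placeholders, not the actual SF or SMW code):

$wgHooks['ParserFirstCallInit'][] = 'wfExampleParserFunctionSetup';
$wgHooks['LanguageGetMagic'][]    = 'wfExampleParserFunctionMagic';

function wfExampleParserFunctionSetup( &$parser ) {
    // Runs once, when the first parser object is initialised.
    $parser->setFunctionHook( 'example', 'wfExampleRender' );
    return true;
}

function wfExampleParserFunctionMagic( &$magicWords, $langCode ) {
    // Maps {{#example:...}} to the hook above (0 = case-insensitive).
    $magicWords['example'] = array( 0, 'example' );
    return true;
}

function wfExampleRender( $parser, $arg = '' ) {
    // The returned string is treated as wikitext.
    return 'Example: ' . $arg;
}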
To stay with this topic, I feel that the whole parser hooking business is
bound to be one large hack. As a parser extension that stores data, you
need to hook to several places in MW, hoping that they are somehow called
in the expected order and that nobody overwrites your data in between
hooks. We have to store the parsed data somewhere, and this place needs
to be globally accessible since the parser offers no local storage to us
(none that would not be cloned with unrelated subparsers anyway). But
parsing is not global and happens in many parsers, or in many, possibly
nested, runs of one parser. The current code has evolved to prevent many
problems that this creates, but it lacks a unified approach towards
handling this situation.
Many things can still go wrong. There is no way of finding out whether we
run in the main parsing method of a wiki page text, or if we are just
called on some page footer or sub-parsing action triggered by some
extension. Jobs and extensions cross-fire with their own parsing calls,
often using different Title objects.
Do you have any insights on how to improve the runtime data management in
SMW so that we can collect data belonging to one article in multiple
hooks, not have it overwritten by other sub-hooks, and still not get
memory leaks on very long runs? We cannot keep all data indefinitely just
because we are unsure whether we are still in a sub-parser and need the
data later on. But if we only store the *current* data, we need to find
out which title is actually being parsed, with the goal of storing or
updating its data in the DB.
Best regards,
Markus