PHP objects are dynamically extensible: you can attach variables to the ParserOutput simply by setting $parser->mOutput->varname.
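For example, something like this (untested sketch; "smwData" is just a made-up property name):

    // In a parser hook or parser function callback: PHP creates the
    // property on the fly when you assign to it.
    function wfMyHookCallback( &$parser ) {
        if ( !isset( $parser->mOutput->smwData ) ) {
            $parser->mOutput->smwData = array();
        }
        $parser->mOutput->smwData['SomeProperty'] = 'some value';
        return true;
    }

    // Later, wherever you get hold of the same ParserOutput object:
    // if ( isset( $parserOutput->smwData ) ) { ... use it ... }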

Also note that ParserOutput has addHeadItem(), which should keep head items from ending up in the wrong place.
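From memory, usage looks about like this (check the actual signature; the second argument is a key that keeps the same item from being added twice):

    $parser->mOutput->addHeadItem(
        '<style type="text/css">.smw-thing { color: red; }</style>',
        'smw-css'
    );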

~Daniel Friesen(Dantman, Nadir-Seen-Fire) of:
-The Nadir-Point Group (http://nadir-point.com)
--Its Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
--Games-G.P.S. (http://ggps.org)
-And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)

Markus Krötzsch wrote:
On Freitag, 15. August 2008, Daniel Friesen wrote:
Sub-parsers? In what kind of case does that happen for you?

Normally, sub-parses happen with (a clone of) the current parser, e.g. when using a <gallery>. But I am not aware of any guideline that states that extensions are not allowed to create or clone new parser objects and use them with any title they like. So anything could happen.

When one thing is being parsed, there is one parser doing that task. I
don't know of many cases where multiple parsers exist (unless an
extension is doing something screwy).

We have observed the use of multiple parsers, or of one parser with multiple title objects (the distinction is not really relevant for us), in between SMW calls on various wikis. We use hooks during parsing to set the title of the page that is currently processed, so we notice when titles change and we have to reset the data (in a long PHP run many titles may be processed, and there is no guarantee that some save-hook is called before the next page starts processing). Initially in 1.2 we reset the data and title just once during parsing, and not all hooks set the title again. This has led to nasty bugs where data was stored for the wrong title (in one case we even had annotated special pages!). Since the title for storing was only set within hooks of the parser (using getTitle() of the supplied parser), the only explanation is that some other parser fired those hooks with a different title object being parsed, and that this happened before we saved the current data to the DB.
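Schematically, each of our parser hooks now begins like this (simplified illustration; the global and helper names are invented for the example):

    function smwfOnInternalParseBeforeLinks( &$parser, &$text ) {
        global $smwgCurrentTitle;
        $title = $parser->getTitle();
        // If another parser (or another title) fired this hook,
        // the buffered data belongs to someone else: reset it.
        if ( $smwgCurrentTitle === null ||
             $title->getPrefixedText() !== $smwgCurrentTitle->getPrefixedText() ) {
            smwfResetSemanticData( $title ); // sketched below
        }
        // ... actual annotation processing ...
        return true;
    }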

Now we make sure that each and every hook call first sets the proper current title and only then saves data of any kind. This at least ensures that no data ever ends up under the wrong title, but data can still be lost. Again it happened that titles changed between parsing and storing (leading to loss of data, since the change of title also led to clearing the internal data buffer). So we now use a second buffer to store the data already parsed for the *previous* title, just in case the next saving method actually wants to save this data! But this is just a hack: we are blindly moving from hook to hook, parsing data here and there, not knowing for which cases there will be a save-hook later on. It is all very frustrating, and race conditions are still possible.
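The buffering roughly amounts to this (heavily simplified; all names are invented for the illustration):

    function smwfResetSemanticData( $newTitle ) {
        global $smwgData, $smwgPreviousData, $smwgCurrentTitle;
        // Keep the previous title's data around, in case a save-hook
        // for it still arrives after the current title has changed.
        $smwgPreviousData = array( 'title' => $smwgCurrentTitle,
                                   'data'  => $smwgData );
        $smwgCurrentTitle = $newTitle;
        $smwgData = array();
    }

    function smwfSaveDataFor( $title ) {
        global $smwgData, $smwgPreviousData, $smwgCurrentTitle;
        if ( $smwgCurrentTitle !== null &&
             $title->getPrefixedText() === $smwgCurrentTitle->getPrefixedText() ) {
            smwfStoreToDB( $title, $smwgData ); // invented DB helper
        } elseif ( $smwgPreviousData !== null &&
                   $smwgPreviousData['title'] !== null &&
                   $title->getPrefixedText() === $smwgPreviousData['title']->getPrefixedText() ) {
            smwfStoreToDB( $title, $smwgPreviousData['data'] );
        }
        // else: the data for this title is already gone, which is
        // exactly the race described above.
    }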

Even now we still experience cases where apparently random data is lost when we create update jobs for all pages: some pages just lose their properties, but they are different pages each time we try. And of course this affects at most 10 pages per run on a densely annotated wiki with 7000 articles (semanticweb.org).

With your report this morning, I also removed setting the title in ParserBeforeStrip. Maybe this reduces the number of wrongly set titles.

Have you tried making use of the ParserOutput? That seems like a to-the-point
thing; there should only be one of those per parse.

I have not really found a way to use it properly yet. Can it hold additional data somewhere?

Not only the semantic data but also other "globals" are affected by similar problems. We use globals to add CSS and JavaScript to pages based on whether they are needed on a page. It turned out that jobs are executed when viewing a special page, in between the time when the page is parsed and the time when the output HTML is created. Hence any job would actually have to capture the current globals and reset them after doing any parsing, or else the job's parsers will "use up" the script data needed by the special page. Again, one could add further protection to make sure scripts are only "consumed" by the page that created them, but these are all just workarounds for the basic problem: if you need to preserve data between hooks, how can you make sure that the data is not stored forever, yet remains available long enough until you need it?
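Schematically, every job that parses anything would need to do something like this (sketch only; $wgMyExtScripts stands in for our real globals):

    function smwfRunJobSafely( $job ) {
        global $wgMyExtScripts;
        $saved = $wgMyExtScripts;    // capture what the outer page collected
        $wgMyExtScripts = array();   // give the job a clean slate
        $job->run();                 // may parse and add its own scripts
        $wgMyExtScripts = $saved;    // restore the outer page's scripts
    }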

-- Markus


Markus Krötzsch wrote:
Hi Daniel,

it's always refreshing to get some thorough code critique from you in the
morning -- thanks for caring! I have added you to our contributors' list,
and I would much appreciate your ideas on some further hacks that I am
well aware of, see below.

Anyone want to explain to me why the ParserBeforeStrip hook is being
used to register parser functions?
In defence of my code: it works. Up to the introduction of
ParserFirstCallInit, it was also one of the few hooks that (at least in
my experience) reliably got called before any parser function would be
needed.

That is a poor place for it, as well as unreliable, which I can see from
how the function being called is a major hack relying on the first call
returning the callback name when it is already set.
Well, I have seen worse hacks (only part of which were in my code, but
see the remarks below on a major problem I still see there). But point
taken for this issue too.

Since I took the liberty of fixing up Semantic Forms, please see it as a
reference on how to correctly add Parser Functions to the parser:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticForms/includes/SF_ParserFunctions.php?view=markup
Great, I added similar code to SMW now.
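For the record, the pattern boils down to something like this (sketched from memory; the function name "myfunc" is invented, but ParserFirstCallInit and setFunctionHook are the real MediaWiki hook and method):

    $wgHooks['ParserFirstCallInit'][] = 'smwfRegisterParserFunctions';

    function smwfRegisterParserFunctions( &$parser ) {
        // Register {{#myfunc:...}}; a LanguageGetMagic entry for
        // 'myfunc' is also needed, omitted here for brevity.
        $parser->setFunctionHook( 'myfunc', 'smwfRenderMyFunc' );
        return true;
    }

    function smwfRenderMyFunc( &$parser, $arg = '' ) {
        // The callback gets the parser plus the function's arguments.
        return htmlspecialchars( $arg );
    }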


To stay with this topic, I feel that the whole parser hooking business is
bound to be one large hack. As a parser extension that stores data, you
need to hook to several places in MW, hoping that they are somehow called
in the expected order and that nobody overwrites your data in between
hooks. We have to store the parsed data somewhere, and this place needs
to be globally accessible since the parser offers no local storage to us
(none that would not be cloned with unrelated subparsers anyway). But
parsing is not global and happens in many parsers, or in many, possibly
nested, runs of one parser. The current code has evolved to prevent many
problems that this creates, but it lacks a unified approach towards
handling this situation.

Many things can still go wrong. There is no way of finding out whether we
run in the main parsing method of a wiki page text, or if we are just
called on some page footer or sub-parsing action triggered by some
extension. Jobs and extensions cross-fire with their own parsing calls,
often using different Title objects.

Do you have any insights on how to improve the runtime data management in
SMW so that we can collect data belonging to one article in multiple
hooks, not have it overwritten by other sub-hooks, and still do not get
memory leaks on very long runs? We cannot keep all data indefinitely just
because we are unsure whether we are still in a sub-parser and need the
data later on. But if we only store the *current* data, we need to find
out which title is actually being parsed at the moment, so that we can
store or update its data in the DB.


Best regards,

Markus


