Hi,

This is not an analysis of the issue mentioned, but an explanation of how updates to a wiki page end up in the SMW store.
## Hooks

MediaWiki provides several hooks that are used to identify, parse, and filter information relevant to SMW. The hooks required for the storage process are InternalParseBeforeLinks, LinksUpdateConstructed, NewRevisionFromEditComplete, and ParserAfterTidy.

## Page edit/save

After someone or something (a bot) edits and saves a page, the InternalParseBeforeLinks hook [1] is called, which in the case of SMW is responsible for parsing the "raw" text of the wiki page using the InTextAnnotationParser. The InTextAnnotationParser converts links like [[Foo::bar]] into an internal representation and removes the SMW-specific syntax ([[ :: ]]) from the text so that the wiki page simply displays "bar".

At this point, data are not stored by SMW itself; instead the collected data [2] are attached to the ParserOutput object to enable post-processing after the edit/save process. Each time MediaWiki executes Parser::parse(), the InternalParseBeforeLinks hook is fired and a status ('smw-semanticdata-status' as page property [3]) is set to distinguish SMW-relevant from non-relevant edits.

## Predefined properties

NewRevisionFromEditComplete is called when a revision has been created and adds predefined properties such as "Modification date". Again, at this point data are not stored; only the ParserOutput object is updated.

## Sortkey / Category

The ParserAfterTidy hook is used to update the ParserOutput object with sortkey and category information, as these are not available earlier in the process.

## Store update

LinksUpdateConstructed is one of three places in MediaWiki where the collected data (in the form of [2]) are retrieved from the ParserOutput object to initiate a StoreUpdate (a rough sketch of this flow follows further down, just before the references).

## Page purge

In some circumstances (depending on the configuration) it is desirable to purge the content of a page together with its semantic data (in earlier versions of SMW this was not possible), so the ParserAfterTidy hook is used for this case as well, since LinksUpdateConstructed cannot be used (it is only triggered on a page save). This makes ParserAfterTidy the second place that can initiate a StoreUpdate, but only in the case of "&action=purge".

## Data rebuild

When data are scheduled for a rebuild, each selected page triggers an UpdateJob [4]. At the time of its execution, the UpdateJob finds the most recent revision using the ContentParser and parses its "raw" text (thereby running through the InternalParseBeforeLinks, NewRevisionFromEditComplete, and ParserAfterTidy hooks) to create a ParserOutput object. The resulting ParserOutput is used to retrieve the SemanticData container, followed by a StoreUpdate. The UpdateJob is the third place that can set off a StoreUpdate (of course one could trigger a LinksUpdate as refreshLinksJob does, but that is another discussion).

PS: A general note, WikiApiary currently runs MediaWiki 1.23.0-rc.1 which, because of [0], should not be used in production.
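To make the above a bit more concrete, here is a minimal sketch of the pattern described in the "Page edit/save" and "Store update" sections: collect data while the page is parsed, park it in the ParserOutput, and pick it up again once LinksUpdateConstructed fires. This is not SMW's actual implementation; the class names, the extension-data key, and the MyExtensionStore facade are made-up stand-ins for the InTextAnnotationParser, the SemanticData transfer [2], and the store update.

```php
<?php
// Extension setup (pre-extension.json registration style, as used around MW 1.23).
$wgHooks['InternalParseBeforeLinks'][] = 'MyExtensionHooks::onInternalParseBeforeLinks';
$wgHooks['LinksUpdateConstructed'][] = 'MyExtensionHooks::onLinksUpdateConstructed';

class MyExtensionHooks {

	/**
	 * Runs on every Parser::parse() call: strip the in-text annotations from
	 * the wikitext and stash the collected values in the ParserOutput so that
	 * a later hook can persist them. Nothing is written to the store here.
	 */
	public static function onInternalParseBeforeLinks( Parser &$parser, &$text ) {
		$collected = array();

		// Very crude stand-in for SMW's InTextAnnotationParser:
		// turn [[Foo::bar]] into plain "bar" and remember the pair.
		$text = preg_replace_callback(
			'/\[\[([^:\]]+)::([^\]]+)\]\]/',
			function ( $m ) use ( &$collected ) {
				$collected[] = array( 'property' => $m[1], 'value' => $m[2] );
				return $m[2];
			},
			$text
		);

		$parserOutput = $parser->getOutput();

		// Data are only attached to the ParserOutput at this point ...
		$parserOutput->setExtensionData( 'myext-data', $collected );

		// ... together with a page prop marking the edit as relevant,
		// analogous to 'smw-semanticdata-status' [3].
		$parserOutput->setProperty( 'myext-data-status', $collected === array() ? 0 : 1 );

		return true;
	}

	/**
	 * Fired after a page save once the LinksUpdate object has been constructed.
	 * This is where the parked data finally reach the store.
	 */
	public static function onLinksUpdateConstructed( LinksUpdate $linksUpdate ) {
		$data = $linksUpdate->getParserOutput()->getExtensionData( 'myext-data' );

		if ( $data !== null && $data !== array() ) {
			// Hypothetical store facade standing in for SMW's store update.
			MyExtensionStore::doUpdate( $linksUpdate->getTitle(), $data );
		}

		return true;
	}
}
```

The purge path (ParserAfterTidy with "&action=purge") and the UpdateJob perform essentially the same retrieval step from their respective entry points; only the trigger differs.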
[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/212
[1] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/includes/src/MediaWiki/Hooks/InternalParseBeforeLinks.php
[2] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/includes/SemanticData.php
[3] https://www.mediawiki.org/wiki/Manual:Page_props_table
[4] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/includes/src/MediaWiki/Jobs/UpdateJob.php

Cheers

On 6/3/14, Jamie Thingelstad <ja...@thingelstad.com> wrote:
> This email isn't really a bug report, but I’m seeing a pattern in Semantic MediaWiki that sure is worrisome. I tend to think that WikiApiary is pushing some boundaries for Semantic MediaWiki, so I fully expect I might be seeing some behavior that hasn't been seen before, or perhaps hasn't been monitored closely before.
>
> As background, all 19,000+ websites on WikiApiary are assigned to a segment. The segment is simply the Page ID mod 15 (relatively even distribution between 0-15).
>
> [[Has bot segment::{{ #expr: {{PAGEID}} mod {{WikiApiary:Bot segments}} }}]]
>
> I care about how these segments are balanced because the bots use them to do work, so I have munin graph the count of websites in each segment every 5 minutes. This has been happening for a while, and you can see the graphs here:
>
> http://db.thingelstad.com/munin/thingelstad.com/db.thingelstad.com/wikiapiary_segments.html
>
> Now, take a moment to look at the monthly one.
>
> The craziness that happened in Week 22 seems to have been the result of some issue in the master branch. I’m sorry to say I didn’t do a good job of tracking which commit I went between, but something started dropping SMW data like crazy and an update to the newest master fixed it (does composer keep a log that would tell me?)
>
> However, I’m more concerned when I look at the weekly one. Note the behavior in weeks 19, 20 and 21: the graph jumps up and then gradually decays the entire week. There is NO behavior in WikiApiary that would justify that pattern. It is worth noting that I have a cron job that runs SMW_refresh every weekend. That is when you see the graph correct back up.
>
> This looks like there is some gradual decay in semantic data that is naturally occurring, and then getting corrected by the refresh. (This might also explain why sometimes websites just stop collecting data in WikiApiary for no known reason, a bug I've tried fruitlessly to track down in my code.)
>
> I know everyone has concerns about the data store. It lacks unit tests and all. This behavior, combined with the never-diagnosed duplicates problem, makes me worry there are numerous issues at the heart of SMW that need to be ferreted out.
>
> Note, if you are curious about how this data is collected you can see these wiki pages:
>
> https://wikiapiary.com/wiki/WikiApiary:Munin
>
> The only valid reason for the counts in a segment going down is an operator marking them as inactive, and that cannot explain the decay in these graphs week over week.
>
> --
> Jamie Thingelstad
> ja...@thingelstad.com