This email isn't really a bug report, but I’m seeing a pattern in
Semantic MediaWiki that sure is worrisome. I tend to think that
WikiApiary is pushing some boundaries for Semantic MediaWiki, so I fully
expect I might be seeing some behavior that hasn't been seen before, or
perhaps hasn't been monitored closely before.

As background, all 19,000+ websites on WikiApiary are assigned to a
segment. The segment is simply the Page ID mod 15 (relatively even
distribution across segments 0-14).

[[Has bot segment::{{ #expr: {{PAGEID}} mod {{WikiApiary:Bot segments}}
}}]]
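
As a rough illustration of the segment assignment above, here is a
minimal Python sketch. It assumes {{WikiApiary:Bot segments}} evaluates
to 15 and that page IDs are plain positive integers; the names
BOT_SEGMENTS, bot_segment and segment_counts are made up for this
example and are not part of WikiApiary's bots or of SMW.

```python
from collections import Counter

# Assumption: the value stored in {{WikiApiary:Bot segments}} is 15.
BOT_SEGMENTS = 15


def bot_segment(page_id: int, segments: int = BOT_SEGMENTS) -> int:
    """Mirror {{PAGEID}} mod segments; the result is in 0..segments-1."""
    return page_id % segments


def segment_counts(page_ids, segments: int = BOT_SEGMENTS) -> Counter:
    """Count how many pages land in each segment, including empty ones."""
    counts = Counter({s: 0 for s in range(segments)})
    counts.update(bot_segment(p, segments) for p in page_ids)
    return counts


if __name__ == "__main__":
    # Hypothetical input: 19,000 consecutive page IDs, not real wiki data.
    counts = segment_counts(range(1, 19001))
    for seg in sorted(counts):
        print(seg, counts[seg])
```

With consecutive page IDs the segment sizes differ by at most one, so
any large, sustained imbalance or drop in the graphed counts has to
come from somewhere other than the mod calculation itself.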

I care about how these segments are balanced because the bots use them
to do work, so I have munin graph the count of websites in each segment
every 5 minutes. This has been happening for a while, and you can see
the graphs here:

http://db.thingelstad.com/munin/thingelstad.com/db.thingelstad.com/wikiapiary_segments.html

Now, take a moment to look at the monthly one.

The craziness that happened in Week 22 seems to have been the result of
some issue in the master branch. I’m sorry to say I didn’t do a good job
of tracking which commits I moved between, but something started
dropping SMW data like crazy and an update to the newest master fixed
it. (Does composer keep a log that would tell me?)

However, I’m more concerned when I look at the weekly one. Note the
behavior in weeks 19, 20 and 21: the graph jumps up and then gradually
decays over the entire week. There is NO behavior in WikiApiary that
would justify that pattern. It is worth noting that I have a cron job
that runs SMW_refresh every weekend. That is when you see the graph
correct back up.

This looks like there is some gradual decay in semantic data that is
naturally occurring, and then getting corrected by the refresh. (This
might also explain why sometimes websites just stop collecting data in
WikiApiary for no known reason, a bug I've tried fruitlessly to track
down in my code.)

I know everyone has concerns about the data store. It lacks unit tests
and all. This behavior, combined with the never-diagnosed duplicates
problem, makes me worry there are numerous issues at the heart of SMW
that need to be ferreted out. 

Note, if you are curious about how this data is collected you can see
this wiki page:

https://wikiapiary.com/wiki/WikiApiary:Munin

The only valid reason for the counts in a segment going down is an
operator marking them as inactive, and that cannot explain the decay in
these graphs week over week.

-- 
  Jamie Thingelstad
  ja...@thingelstad.com

_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
