Hi,

I'm taking this opportunity to bring up 3 bugs related to mwlib that deserve a larger discussion and should perhaps be handled differently in the new version.
1. https://bugzilla.wikimedia.org/show_bug.cgi?id=56560 - PDF creation tool considers IPv6 addresses as users, not anonymous

I've pushed a patch for this and it was merged; however, the detection is based on a regex and, as a quick Google search will tell you, writing a regex that covers all IPv6 cases is far from trivial. Perhaps the anonymous/logged-in status of each user could be sent from MW instead.

2. https://bugzilla.wikimedia.org/show_bug.cgi?id=56219 - PDF creation tool excludes contributors with a "bot" substring in their username

I've also pushed a pull request for this one, but it was rejected based on the en.wp policy that prevents bot-like usernames for humans. The problem is more complex, though:
a. Should bots be credited for their edits? While most of them do simple tasks, we have recently seen an increase in bot-created content. On ro.wp we even have a few lists edited only by robots.
b. If the robots should _not_ be credited, how do we detect them? Ideally, there should be an automatic way to do so, but according to http://www.mediawiki.org/wiki/Bots, it only works for recent changes. Less ideally, only users with "bot" at the end of their username should be removed, in order to keep users like https://ro.wikipedia.org/wiki/Utilizator:Vitalie_Ciubotaru (who is not a robot, but has "bot" in the name) in the contributor list.

3. https://bugzilla.wikimedia.org/show_bug.cgi?id=2994 - Automatically generated count and list of contributors to an article (authorship tracking)

This is an old enhancement request, which I revived last month in a wikimedia-l thread: http://lists.wikimedia.org/pipermail/wikimedia-l/2013-October/128575.html . The idea is to decide if and how to credit:
a. vandals
b. reverters
c. contributors whose valid contributions were rephrased or removed from the article
d. contributors with valid contributions but invalid names

I hope the people working on this feature will take the time to consider these issues and come up with solutions for them.
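To make points 1 and 2b a bit more concrete, here is a minimal sketch, not what mwlib currently does. It assumes Python 3.3+ (where the ipaddress module is in the standard library) and that the contributor names arrive as plain strings. The function names is_anonymous and looks_like_bot are made up for this example.

```python
import ipaddress


def is_anonymous(username: str) -> bool:
    """Return True if the username parses as an IPv4 or IPv6 address.

    Hypothetical helper: lets the standard library do the parsing
    instead of a hand-written IPv6 regex.
    """
    try:
        ipaddress.ip_address(username.strip())
    except ValueError:
        return False
    return True


def looks_like_bot(username: str) -> bool:
    """Heuristic from point 2b: only a trailing "bot" counts.

    Hypothetical helper: a substring match anywhere in the name
    would also catch human users such as "Vitalie Ciubotaru".
    """
    return username.strip().lower().endswith("bot")
```

The suffix check is still only a heuristic: a human whose username happens to end in "bot" would be dropped, which is why getting the information from MW itself would be preferable.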
Thanks,
Strainu

2013/11/13 Erik Moeller <[email protected]>:
> Hi folks,
>
> for a long time we've relied on the mwlib libraries by PediaPress to
> generate PDFs on Wikimedia sites. These have served us well (we
> generate >200K PDFs/day), but they architecturally pre-date a lot of
> important developments in MediaWiki, and actually re-implement the
> MediaWiki parser (!) in Python. The occasion of moving the entire PDF
> service to a new data-center has given us reason to re-think the
> architecture and come up with a minimally viable alternative that we
> can support long term.
>
> Most likely, we'll end up using Parsoid's HTML5 output, transform it
> to add required bits like licensing info and prettify it, and then
> render it to PDF via phantomjs, but we're still looking at various
> rendering options.
>
> Thanks to Matt Walker, C. Scott Ananian, Max Semenik, Brad Jorsch and
> Jeff Green for joining the effort, and thanks to the PediaPress folks
> for giving background as needed. Ideally we'd like to continue to
> support printed book generation via PediaPress' web service, while
> completely replacing the rendering tech stack on the WMF side of
> things (still using the Collection extension to manage books). We may
> need to deprecate some output formats - more on that as we go.
>
> We've got the collection-alt-renderer project set up on Labs (thanks
> Andrew) and can hopefully get a plan to our ops team soon as to how
> the new setup could work.
>
> If you want to peek - work channel is #mediawiki-pdfhack on FreeNode.
>
> Live notes here:
> http://etherpad.wikimedia.org/p/pdfhack
>
> Stuff will be consolidated here:
> https://www.mediawiki.org/wiki/PDF_rendering
>
> Some early experiments with different rendering strategies here:
> https://github.com/cscott/pdf-research
>
> Some improvements to Collection extension underway:
> https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions/Collection,n,z
>
> More soon,
> Erik
>
> --
> Erik Möller
> VP of Engineering and Product Development, Wikimedia Foundation
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
