Hi,

I'm taking this opportunity to bring up three mwlib-related bugs that
deserve a larger discussion, and whose underlying features should
perhaps be implemented differently in the new version.

1. https://bugzilla.wikimedia.org/show_bug.cgi?id=56560 - PDF creation
tool considers IPv6 addresses as users, not anonymous.

I've pushed a patch for this and it was merged; however, the
detection is based on a regex and, as a quick Google search will tell
you, writing a regex that covers all IPv6 forms is far from trivial.
Perhaps MW could send the anonymous/logged-in status of each
contributor instead.
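
As an illustration of a non-regex approach, here is a minimal sketch
that lets an address parser decide instead. It assumes Python 3.3+
(for the standard ipaddress module), and the function name
is_anonymous is made up for this example; it is not what the merged
patch does.

```python
import ipaddress


def is_anonymous(username: str) -> bool:
    """Return True if the contributor name parses as an IPv4 or IPv6 address.

    Hypothetical helper: treats any name that is a valid IP address
    as an anonymous (not logged-in) contributor.
    """
    try:
        # ip_address() accepts both IPv4 and IPv6 textual forms,
        # including compressed ("::1") and upper-case hex notation.
        ipaddress.ip_address(username.strip())
    except ValueError:
        # Not a valid address, so assume it is a registered user name.
        return False
    return True
```

This still guesses from the name alone, so having MW send the status
explicitly would remain the more robust option.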

2. https://bugzilla.wikimedia.org/show_bug.cgi?id=56219 - PDF creation
tool excludes contributors with a "bot" substring in their username

I've also pushed a pull request for this one, but it was rejected
based on the en.wp policy that prevents bot-like usernames for humans.
The problem is more complex though:

a. Should bots be credited for their edits? While most of them do
simple tasks, we have recently seen an increase in bot-created
content. On ro.wp we even have a few lists only edited by robots.
b. If the robots should _not_ be credited, how do we detect them?
Ideally, there should be an automatic way to do so, but according to
http://www.mediawiki.org/wiki/Bots, that only works for recent changes.
Less ideally, only users with "bot" at the end of their name should be
removed, in order to keep users like
https://ro.wikipedia.org/wiki/Utilizator:Vitalie_Ciubotaru (who is
not a robot, but has "bot" in the name) in the contributor list.
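
To make the "less ideal" option concrete, here is a rough sketch of a
suffix-only check. The function names and the sample account
"ExampleBot" are invented for illustration, and this is not the
current mwlib code.

```python
def looks_like_bot(username: str) -> bool:
    """Heuristic: a name counts as a bot only if it ends in "bot"."""
    # Case-insensitive suffix match, so "ExampleBot" and "examplebot"
    # are both caught, while "bot" in the middle of a name is ignored.
    return username.strip().lower().endswith("bot")


def filter_contributors(usernames):
    """Return the contributor list with suspected bots removed."""
    return [name for name in usernames if not looks_like_bot(name)]
```

With this, filter_contributors(["Vitalie Ciubotaru", "ExampleBot"])
keeps "Vitalie Ciubotaru". It remains a heuristic: a human whose name
happens to end in "bot" would still be dropped, and a bot with a
different naming scheme would still be credited.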


3. https://bugzilla.wikimedia.org/show_bug.cgi?id=2994 - Automatically
generated count and list of contributors to an article (authorship
tracking)

This is an old enhancement request, revived by me last month in a
wikimedia-l thread:
http://lists.wikimedia.org/pipermail/wikimedia-l/2013-October/128575.html
. The idea is to decide if and how to credit:
a. vandals
b. reverters
c. contributors whose valid contributions were later rephrased or
removed from the article.
d. contributors with valid contributions but invalid names

I hope the people working on this feature will take the time to
consider these issues and come up with solutions for them.

Thanks,
   Strainu


2013/11/13 Erik Moeller <[email protected]>:
> Hi folks,
>
> for a long time we've relied on the mwlib libraries by PediaPress to
> generate PDFs on Wikimedia sites. These have served us well (we
> generate >200K PDFs/day), but they architecturally pre-date a lot of
> important developments in MediaWiki, and actually re-implement the
> MediaWiki parser (!) in Python. The occasion of moving the entire PDF
> service to a new data-center has given us reason to re-think the
> architecture and come up with a minimally viable alternative that we
> can support long term.
>
> Most likely, we'll end up using Parsoid's HTML5 output, transform it
> to add required bits like licensing info and prettify it, and then
> render it to PDF via phantomjs, but we're still looking at various
> rendering options.
>
> Thanks to Matt Walker, C. Scott Ananian, Max Semenik, Brad Jorsch and
> Jeff Green for joining the effort, and thanks to the PediaPress folks
> for giving background as needed. Ideally we'd like to continue to
> support printed book generation via PediaPress' web service, while
> completely replacing the rendering tech stack on the WMF side of
> things (still using the Collection extension to manage books). We may
> need to deprecate some output formats - more on that as we go.
>
> We've got the collection-alt-renderer project set up on Labs (thanks
> Andrew) and can hopefully get a plan to our ops team soon as to how
> the new setup could work.
>
> If you want to peek - work channel is #mediawiki-pdfhack on FreeNode.
>
> Live notes here:
> http://etherpad.wikimedia.org/p/pdfhack
>
> Stuff will be consolidated here:
> https://www.mediawiki.org/wiki/PDF_rendering
>
> Some early experiments with different rendering strategies here:
> https://github.com/cscott/pdf-research
>
> Some improvements to Collection extension underway:
> https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions/Collection,n,z
>
> More soon,
> Erik
>
> --
> Erik Möller
> VP of Engineering and Product Development, Wikimedia Foundation
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

