Re: [Wikitech-l] Operations buy in on Architecture of mwlib Replacement

2013-11-13 Thread Gabriel Wicke
On 11/13/2013 08:18 PM, MZMcBride wrote:
> Matthew replied on-wiki, but I'll add that there's a dream within the
> MediaWiki tech community to be able to simply do "apt-get mediawiki" or
> similar on a spun-up virtual machine and everything will quickly and
> easily be set up for you.
> 
> There's a contrasting view that MediaWiki should only serve as the
> platform for Wikimedia wikis (large, high-volume sites) and that it's
> overkill for any small wiki setup. This view also usually advocates not
> focusing on third-party support, naturally, which removes Jimmy the casual
> MediaWiki user from the equation.

Ha! Having good packaging so that you can just do "apt-get mediawiki"
would actually eliminate some of this dichotomy.

Gabriel


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Operations buy in on Architecture of mwlib Replacement

2013-11-13 Thread MZMcBride
Marcin Cieslak wrote:
>>> Matthew Walker  wrote:
>> [1] https://www.mediawiki.org/wiki/PDF_rendering/Architecture
>
>I think requirement number one is that Jimmy the casual MediaWiki
>user would be able to install his own renderer without replicating
>WMF infrastructure:

Matthew replied on-wiki, but I'll add that there's a dream within the
MediaWiki tech community to be able to simply do "apt-get mediawiki" or
similar on a spun-up virtual machine and everything will quickly and
easily be set up for you.

There's a contrasting view that MediaWiki should only serve as the
platform for Wikimedia wikis (large, high-volume sites) and that it's
overkill for any small wiki setup. This view also usually advocates not
focusing on third-party support, naturally, which removes Jimmy the casual
MediaWiki user from the equation.

Whether either of these ideas (and many more) should guide architectural
design decisions is, of course, a matter of debate. :-)

MZMcBride



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] New Bugzilla users have restricted accounts

2013-11-13 Thread MZMcBride
Marcin Cieslak wrote:
>>> Andre Klapper  wrote:
>> I don't know your specific usecase - maybe the shared saved search named
>> ""My" CC'd Bugs" might work (or not) which you could enable on
>> https://bugzilla.wikimedia.org/userprefs.cgi?tab=saved-searches (see
>> http://blogs.gnome.org/aklapper/2013/07/12/bugzillatips-saved-searches/
>> for general info on saved searches and sharing them with other users).
>
>I've been using an "i-am-on-cc" filter (now shared) similar to this one
>with great success to find the stuff I am working on or interested in.

While we're sharing Bugzilla experiences...

A few weeks ago I made a filter (saved search) called "BUGZ" that tracks
bugs where I've commented, I'm on the CC list, I'm the reporter, or it's
assigned to me, limiting to unresolved tickets (but including the new
PATCH_TO_REVIEW status), sorted in reverse chronological order by date
changed.

It seems to work pretty well, basically duplicating the bugspam feed I
get. I'm not sure there's a way to generalize it in our version of
Bugzilla (the search currently hardcodes my e-mail address, I think).
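
For reference, a search like the one described above roughly corresponds to
a buglist.cgi query along these lines (a sketch using stock Bugzilla search
parameters plus our PATCH_TO_REVIEW status; the e-mail address is a
placeholder and exact parameter names may vary by Bugzilla version):

    https://bugzilla.wikimedia.org/buglist.cgi?email1=user%40example.org
        &emailtype1=exact&emailassigned_to1=1&emailreporter1=1
        &emailcc1=1&emaillongdesc1=1
        &bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED
        &bug_status=REOPENED&bug_status=PATCH_TO_REVIEW
        &order=changeddate%20DESC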

I have two small quibbles with it: if a bug has only its CC field changed,
its last-changed date still gets bumped, and it takes a look at the bug
history to figure out what happened. And if a bug has recently been marked
resolved, it drops off the list. Other than that, it was worth the few
minutes it took to finally set up a custom filter for myself. I removed the
other filters from the sidebar to reduce interface noise.

MZMcBride



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] New Bugzilla users have restricted accounts

2013-11-13 Thread Marcin Cieslak
>> Andre Klapper  wrote:
> I don't know your specific usecase - maybe the shared saved search named
> ""My" CC'd Bugs" might work (or not) which you could enable on
> https://bugzilla.wikimedia.org/userprefs.cgi?tab=saved-searches (see
> http://blogs.gnome.org/aklapper/2013/07/12/bugzillatips-saved-searches/
> for general info on saved searches and sharing them with other users).

I've been using an "i-am-on-cc" filter (now shared) similar to this one
with great success to find the stuff I am working on or interested in.

//Saper


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Operations buy in on Architecture of mwlib Replacement

2013-11-13 Thread Marcin Cieslak
>> Matthew Walker  wrote:
> [1] https://www.mediawiki.org/wiki/PDF_rendering/Architecture

I think requirement number one is that Jimmy the casual MediaWiki
user would be able to install his own renderer without replicating
WMF infrastructure:

https://www.mediawiki.org/wiki/Talk:PDF_rendering/Architecture#Simple_set_up_for_casual_MediaWiki_users_35545

//Saper



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Architectural leadership in Wikimedia's technical community

2013-11-13 Thread Gabriel Wicke
On 11/10/2013 10:51 PM, Tim Starling wrote:
> On 08/11/13 03:40, C. Scott Ananian wrote:
>> Certain people 'own' larger collections of modules -- like there are
>> subsystem owners in the linux kernel dev world.
> 
> My concern with this kind of maintainer model is that RFC review would
> tend to be narrower -- a consensus of members of a single WMF team
> rather than a consensus of all relevant experts.

I am skeptical about such a narrow maintainer model too.

Architecture should have a broader perspective than one module at a
time. An important part of the role of architects is driving a consensus
process both in the foundation and also in the larger MediaWiki
community about how modules should interact and maybe also which modules
we need, especially in the back end. They should also make sure that
longer-term global issues are considered before they become pressing.

Like others, I see WMF job titles as fairly separate from roles in the
wider MediaWiki community. The goals of the Foundation are also not
always the same as those of each member of the community. Wikia, for
example, might have priorities that differ from those of somebody running
MediaWiki on an intranet. Because of this, I think it would help to
separate the issue of MediaWiki governance from that of Wikimedia
Foundation roles and architectural leadership within the Wikimedia
Foundation.

Within the Foundation I can see advantages to holding more people
responsible for looking out for architectural issues, just to make sure
it happens and scales. I don't think that it matters much *internally*
whether those are called 'principal engineer' or 'architect'. Let's use
the title whose common definition fits the actual role most accurately.

Gabriel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Operations buy in on Architecture of mwlib Replacement

2013-11-13 Thread C. Scott Ananian
Yeah we've been running 0.10 in development for Parsoid for a while. So no
problems expected... other than unpredictable load gremlins or some such.
It sounds like gwicke's plan is to ramp up the load gradually to try to
head that off.
  --scott
On Nov 13, 2013 6:02 PM, "Matthew Walker"  wrote:

> Hey,
>
> For the new renderer backend for the Collections Extension we've come up
> with a tentative architecture that we would like operations buy-in on. The
> living document is here [1]. It's worth saying explicitly that whatever
> setup we use must be able to handle the more than 150k requests per day we
> serve with the old setup.
>
> Basically we're looking at having:
> * 'render servers' running Node.js
> * job management in Redis
> * content rendering using PhantomJS and/or LaTeX
> * rendered files stored locally on the render servers (and streamed through
> MediaWiki -- this is how it's done now as well)
> * a garbage collector running routinely on the render servers to clean up
> old, stale content
>
> Post comments to the talk page please :)
>
> [1] https://www.mediawiki.org/wiki/PDF_rendering/Architecture
>
> ~Matt Walker
> Wikimedia Foundation
> Fundraising Technology Team
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Ops] Operations buy in on Architecture of mwlib Replacement

2013-11-13 Thread Matthew Walker
Faidon,

Fantastic! I didn't know we had an internal backport already. :)

Gabriel basically just told me to use v0.10 because that's what he was
moving to for parsoid. So... v0.10!

~Matt Walker
Wikimedia Foundation
Fundraising Technology Team


On Wed, Nov 13, 2013 at 4:04 PM, Faidon Liambotis wrote:

> On Wed, Nov 13, 2013 at 03:41:33PM -0800, Matthew Walker wrote:
>
>> * Node.JS itself should be installable via an apt package (we'll have to
>> build a custom package so that we get Node v0.10)
>>
>
> I haven't looked at your document yet, but a quick note on that: I have
> nodejs 0.10 backported packages ready for about 10 days now.
>
> We typically avoid running multiple versions of the same package across
> the infrastructure (and our apt repo isn't split like that, thankfully), so
> I'd like to upgrade the existing users to 0.10. These are Parsoid, statsd,
> perhaps the not-yet-in-production limn, and etherpad-lite; of these, Parsoid
> is the one with the most impact.
>
> As such, we've agreed with Gabriel (who needed Node 0.10 for Rashomon
> anyway) to test the new version under the Parsoid RTT suite and subsequently
> on the Parsoid Labs instance, before we go and upgrade production. (The
> packages have been in parsoid.wmflabs.org's /root/ since then.) I haven't
> heard back since, but as they don't /need/ a new Node version right now, I
> guess this is low priority for them (and we, ops, don't care much either).
>
> I think this would happen anyway before the PDF service would ever reach
> production, but I think we can prioritize it a bit more and make sure it
> will. Gabriel, what do you think?
>
> Regards,
> Faidon
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Facebook Open Academy

2013-11-13 Thread Tyler Romeo
MediaWiki participates in a number of student competitions and programs as
an open source mentor (such as GSoC, Code-In, etc.). Today I ran into
another one: Facebook's Open Academy Program.

https://www.facebook.com/OpenAcademyProgram

I'm not sure how we would get involved in this program, but I'm sure people
would agree it might be a good thing to become a mentor organization and
have students contribute to MediaWiki as part of a college credit program.

Any thoughts?
--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Ops] Operations buy in on Architecture of mwlib Replacement

2013-11-13 Thread Faidon Liambotis

On Wed, Nov 13, 2013 at 03:41:33PM -0800, Matthew Walker wrote:
* Node.JS itself should be installable via an apt package (we'll have to
build a custom package so that we get Node v0.10)


I haven't looked at your document yet, but a quick note on that: I have 
nodejs 0.10 backported packages ready for about 10 days now.


We typically avoid running multiple versions of the same package across
the infrastructure (and our apt repo isn't split like that, thankfully),
so I'd like to upgrade the existing users to 0.10. These are Parsoid,
statsd, perhaps the not-yet-in-production limn, and etherpad-lite; of
these, Parsoid is the one with the most impact.


As such, we've agreed with Gabriel (who needed Node 0.10 for Rashomon
anyway) to test the new version under the Parsoid RTT suite and
subsequently on the Parsoid Labs instance, before we go and upgrade
production. (The packages have been in parsoid.wmflabs.org's /root/ since
then.) I haven't heard back since, but as they don't /need/ a new Node
version right now, I guess this is low priority for them (and we, ops,
don't care much either).


I think this would happen anyway before the PDF service would ever reach 
production, but I think we can prioritize it a bit more and make sure it 
will. Gabriel, what do you think?


Regards,
Faidon

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Operations buy in on Architecture of mwlib Replacement

2013-11-13 Thread Matthew Walker
As a followup, it's worth talking about puppetization and how we're going
to accomplish that.

* Node.JS itself should be installable via an apt package (we'll have to
build a custom package so that we get Node v0.10)
* Node dependencies will all be 'npm install'ed into a node_modules
submodule of the application's main repo, which we can deploy with the rest
of the application code.
** It's worth noting that although this means we'll initially still be
pulling our dependencies from a separate source, whatever is currently in
production will be in our git repos. We can also lock versions in our
configuration.
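
To illustrate the version-locking point, a minimal package.json sketch with
exact (pinned) versions; the package name and dependency list here are
placeholders, not the actual ones:

    {
      "name": "collection-render-service",
      "version": "0.1.0",
      "dependencies": {
        "redis": "0.8.6",
        "express": "3.4.4"
      }
    }

Exact versions (rather than loose ranges), together with checking the
resulting node_modules into the deployment repo, keep what runs in
production reproducible.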

~Matt Walker
Wikimedia Foundation
Fundraising Technology Team


On Wed, Nov 13, 2013 at 3:02 PM, Matthew Walker wrote:

> Hey,
>
> For the new renderer backend for the Collections Extension we've come up
> with a tentative architecture that we would like operations buy-in on. The
> living document is here [1]. It's worth saying explicitly that whatever
> setup we use must be able to handle the more than 150k requests per day we
> serve with the old setup.
>
> Basically we're looking at having:
> * 'render servers' running Node.js
> * job management in Redis
> * content rendering using PhantomJS and/or LaTeX
> * rendered files stored locally on the render servers (and streamed through
> MediaWiki -- this is how it's done now as well)
> * a garbage collector running routinely on the render servers to clean up
> old, stale content
>
> Post comments to the talk page please :)
>
> [1] https://www.mediawiki.org/wiki/PDF_rendering/Architecture
>
> ~Matt Walker
> Wikimedia Foundation
> Fundraising Technology Team
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Operations buy in on Architecture of mwlib Replacement

2013-11-13 Thread Matthew Walker
Hey,

For the new renderer backend for the Collections Extension we've come up
with a tentative architecture that we would like operations buy-in on. The
living document is here [1]. It's worth saying explicitly that whatever
setup we use must be able to handle the more than 150k requests per day we
serve with the old setup.

Basically we're looking at having:
* 'render servers' running Node.js
* job management in Redis
* content rendering using PhantomJS and/or LaTeX
* rendered files stored locally on the render servers (and streamed through
MediaWiki -- this is how it's done now as well)
* a garbage collector running routinely on the render servers to clean up
old, stale content
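
To make the Redis job-management bullet a bit more concrete, here is a
minimal sketch of a producer and a worker built on the node redis client;
the queue name, job fields and render step are hypothetical placeholders
rather than the actual design:

    var redis = require('redis');

    // Producer: push a render request onto a Redis list acting as the queue.
    var producer = redis.createClient();
    function enqueueRenderJob(collectionId, format) {
        var job = JSON.stringify({ collection: collectionId, format: format });
        producer.rpush('pdf-render-queue', job, function (err) {
            if (err) { console.error('enqueue failed:', err); }
        });
    }

    // Worker: block until a job is available, render it, then loop.
    var worker = redis.createClient();
    function processNextJob() {
        worker.blpop('pdf-render-queue', 0, function (err, reply) {
            if (err) { console.error('dequeue failed:', err); return; }
            var job = JSON.parse(reply[1]);   // reply = [queueName, payload]
            renderToLocalFile(job, function () {
                processNextJob();
            });
        });
    }

    // Placeholder for the actual PhantomJS/LaTeX rendering step.
    function renderToLocalFile(job, done) {
        console.log('rendering', job.collection, 'as', job.format);
        done();
    }

    enqueueRenderJob(12345, 'pdf');
    processNextJob();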

Post comments to the talk page please :)

[1] https://www.mediawiki.org/wiki/PDF_rendering/Architecture

~Matt Walker
Wikimedia Foundation
Fundraising Technology Team
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Pre-Release Announcement for MediaWiki 1.19.9, 1.20.8, and 1.21.3

2013-11-13 Thread Chris Steipp
This is a notice that on Thursday, November 14th between 21:00-22:00 UTC
(1-2pm PST) Wikimedia Foundation will release security updates for current
and supported branches of the MediaWiki software, as well as several
extensions. Downloads and patches will be available at that time.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Technical Writer - Contract - 3 Months (+)

2013-11-13 Thread Quim Gil
The Engineering Community team at the Wikimedia Foundation has opened a
position for a

Technical Writer - Contract - 3 Months (+)

The description is copied below. Technical writers with a contribution
history at mediawiki.org or other MediaWiki-based sites will be
especially considered. If you are interested, please apply using this
web form:

http://hire.jobvite.com/j/?aj=oMK5Xfwi&s=Community

Wikimedia's Engineering Community team is responsible for developing
clear documentation for MediaWiki. All of our documentation is written
collaboratively in wiki pages, involving all kinds of profiles, from WMF
professional developers to anonymous users. Some areas of our
documentation lack content or are outdated, while others have grown
organically and need pruning and polishing. It is difficult to
recruit volunteers for this type of work.

Scope of Work
The technical writer will report to Quim Gil, Technical Contributor
Coordinator.  The main area of focus will be system architecture
documentation, though we may identify other related areas during the
course of the contract. Our main technical documentation exists on
https://www.mediawiki.org and https://wikitech.wikimedia.org.
Ideally, this person will start immediately, and should start no later
than December 1, 2013. The technical writer will attend the MediaWiki
Architecture Summit on January 23-24 in San Francisco, where s/he will
be in charge of consolidating the meeting notes taken and polishing the
related documentation.
This is the only on-site task planned. The technical writer can be in
any location during the rest of the contract period as long as it has
good Internet connectivity. We can offer an on-site location in our
offices in San Francisco, although this contract does not include any
relocation or visa support.

Outcome and Performance Standards
This work requires familiarity with wiki syntax and collaborative
workflows. The technical writer will be improving actual pages edit by
edit, and may find other contributors publishing edits of their own, as
well as related discussions where s/he is expected to engage and respond.
Quim Gil and other Engineering Community team members will regularly
monitor the work and will assist with the community dialog if needed.
The technical writer will have a backlog of areas to work on, agreed
with the EC team. There will be weekly reviews or a similar procedure to
review and sign off on the completed tasks.

Qualifications:
3 years of professional experience writing technical documentation
Experience volunteering in one or more free software projects as a
documentation writer is highly valuable. (Please provide links to your
user profile and main works.)
Knowledge of PHP / JavaScript  (even better when supported with pet
projects, open source contributions or certified training)

-- 
Quim Gil
Technical Contributor Coordinator @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Re-implementing PDF support

2013-11-13 Thread Gabriel Wicke
On 11/13/2013 08:10 AM, Tyler Romeo wrote:
> On Wed, Nov 13, 2013 at 12:45 AM, Erik Moeller  wrote:
> 
>> Most likely, we'll end up using Parsoid's HTML5 output, transform it
>> to add required bits like licensing info and prettify it, and then
>> render it to PDF via phantomjs, but we're still looking at various
>> rendering options.
>>
> 
> I don't have anything against this, but what's the reasoning? You now have
> to parse the wikitext into HTML5 and then parse the HTML5 into PDF.

We are already parsing all edited pages to HTML5 and will also start
storing (rather than just caching) this HTML very soon, so there will
not be any extra parsing involved in the longer term. Getting the HTML
will basically be a request for a static HTML page.
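
For illustration, fetching that HTML then amounts to a plain HTTP GET
against the service; a rough Node sketch with a placeholder host and path
(not the actual service URL):

    var http = require('http');

    // Hypothetical Parsoid-style endpoint: /{wiki prefix}/{page title}
    http.get('http://parsoid.example.org/enwiki/Zurich', function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
            // body now holds the HTML5 for the page, ready to be
            // post-processed and handed to the PDF renderer.
            console.log('got %d bytes of HTML', body.length);
        });
    }).on('error', function (err) {
        console.error('fetch failed:', err.message);
    });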

Gabriel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Re-implementing PDF support

2013-11-13 Thread Tyler Romeo
On Wed, Nov 13, 2013 at 11:16 AM, Brad Jorsch (Anomie) <
bjor...@wikimedia.org> wrote:

> Yes, phantomjs, as mentioned in the original message.
>
> To be more specific, phantomjs is basically WebKit without a GUI, so
> the output would be roughly equivalent to opening the page in Chrome
> or Safari and printing to a PDF. Future plans include using bookjs or
> the like to improve the rendering.
>

Aha awesome. Thanks for explaining.

--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] The interwiki table as a whitelist of non-spammy sites

2013-11-13 Thread Mark A. Hershberger
On 11/13/2013 05:44 AM, Nathan Larson wrote:
> TL;DR: How can we collaboratively put together a list of non-spammy sites
> that wikis may want to add to their interwiki tables for whitelisting
> purposes; and how can we arrange for the list to be efficiently distributed
> and imported?

I like the idea.  Unless I'm mistaken, it seems like most of this idea
could be implemented and improved on as an extension.

While the use of Meta has the advantage of a large number of possible
reviewers, I wonder if it might get better review elsewhere.

Alternatively, we could work with WikiApiary to tag spammy wikis that
its bot finds.

Also, since we're talking about spam and MediaWiki, another good site to
check out would be http://spamwiki.org/mediawiki/.

Mark.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bugzilla Weekly Report

2013-11-13 Thread Željko Filipin
On Fri, Oct 25, 2013 at 8:07 PM, Quim Gil  wrote:

> See "Top closers" and "Top openers" of all-time, last year and last month
> at http://korma.wmflabs.org/browser/top.html
> PS: yes, this information should appear at
> http://korma.wmflabs.org/browser/its.html - I will file a report.
>

Did you create the bug? I could not find it in Bugzilla.

Željko
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Re-implementing PDF support

2013-11-13 Thread Brad Jorsch (Anomie)
On Wed, Nov 13, 2013 at 11:10 AM, Tyler Romeo  wrote:
> On Wed, Nov 13, 2013 at 12:45 AM, Erik Moeller  wrote:
> I'm
> guessing you've found some library that automatically "prints" HTML5, which
> would make sense since browsers do that already, but I'm just curious.

Yes, phantomjs, as mentioned in the original message.

To be more specific, phantomjs is basically WebKit without a GUI, so
the output would be roughly equivalent to opening the page in Chrome
or Safari and printing to a PDF. Future plans include using bookjs or
the like to improve the rendering.
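
For illustration, a minimal PhantomJS script along those lines (the target
URL and output filename are placeholders, and the bookjs step is not
included):

    // Run with: phantomjs render-pdf.js
    var page = require('webpage').create();

    // Print-style page setup; PhantomJS maps this onto the PDF output.
    page.paperSize = { format: 'A4', orientation: 'portrait', margin: '1cm' };

    page.open('https://en.wikipedia.org/wiki/Zurich', function (status) {
        if (status !== 'success') {
            console.error('failed to load page');
            phantom.exit(1);
        }
        page.render('article.pdf');   // render() picks the format from the extension
        phantom.exit(0);
    });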


-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Re-implementing PDF support

2013-11-13 Thread Emmanuel Engelhart
On 13/11/2013 17:10, Tyler Romeo wrote:
> On Wed, Nov 13, 2013 at 12:45 AM, Erik Moeller  wrote:
> 
>> Most likely, we'll end up using Parsoid's HTML5 output, transform it
>> to add required bits like licensing info and prettify it, and then
>> render it to PDF via phantomjs, but we're still looking at various
>> rendering options.
>>
> 
> I don't have anything against this, but what's the reasoning? You now have
> to parse the wikitext into HTML5 and then parse the HTML5 into PDF. I'm
> guessing you've found some library that automatically "prints" HTML5, which
> would make sense since browsers do that already, but I'm just curious.

Here is an example about how this works:
https://github.com/ariya/phantomjs/blob/master/examples/rasterize.js

Emmanuel
-- 
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Re-implementing PDF support

2013-11-13 Thread Brad Jorsch (Anomie)
Note these are my own thoughts and not anything representative of the team.

On Wed, Nov 13, 2013 at 6:55 AM, Strainu  wrote:
> b. If the robots should _not_ be credited, how do we detect them?
> Ideally, there should be an automatical way to do so, but according to
> http://www.mediawiki.org/wiki/Bots, it only works for recent changes.
> Less ideally, only users with "bot" at the end should be removed, in
> order to keep users like
> https://ro.wikipedia.org/wiki/Utilizator:Vitalie_Ciubotaru (which is
> not a robot, but has "bot" in the name) in the contributor list.

Another way to exclude (most) bots would be to skip any user with the
"bot" user right. Note though that this would still include edits by
unflagged bots, or by bots that have since been decommissioned and the
bot flag removed.
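
For what it's worth, whether an account is in the bot group can be checked
with a single API query; a rough sketch in Node (the username is a
placeholder):

    var https = require('https');

    var user = 'ExampleBot';   // placeholder username
    var url = 'https://en.wikipedia.org/w/api.php?action=query&list=users' +
              '&usprop=groups&format=json&ususers=' + encodeURIComponent(user);

    https.get(url, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
            var info = JSON.parse(body).query.users[0];
            var isBot = (info.groups || []).indexOf('bot') !== -1;
            console.log(user, isBot ? 'is flagged as a bot' : 'is not flagged as a bot');
        });
    }).on('error', function (err) {
        console.error('API request failed:', err.message);
    });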

Personally, though, I do agree that excluding any user with "bot" in
the name (or even with a name ending in "bot") is a bad idea even if
just applied to enwiki, and worse when applied to other wikis that may
have different naming conventions.

> . The idea is to decide if and how to credit:
> a. vandals
> b. reverters
> c. contributors whose valid contributions were rephrased or
> replaced in the article.
> d. contributors with valid contributions but invalid names

The hard part there is detecting these, particularly case (c). And
even then, the article may still be based on the original work in a
copyright sense even if no single word of the original edit remains.

Then there's also the situation where A makes an edit that is
partially useful and partially bad, B reverts, and then C comes along and
re-incorporates parts of A's edit.


-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Re-implementing PDF support

2013-11-13 Thread Tyler Romeo
On Wed, Nov 13, 2013 at 12:45 AM, Erik Moeller  wrote:

> Most likely, we'll end up using Parsoid's HTML5 output, transform it
> to add required bits like licensing info and prettify it, and then
> render it to PDF via phantomjs, but we're still looking at various
> rendering options.
>

I don't have anything against this, but what's the reasoning? You now have
to parse the wikitext into HTML5 and then parse the HTML5 into PDF. I'm
guessing you've found some library that automatically "prints" HTML5, which
would make sense since browsers do that already, but I'm just curious.

--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Fate of hierarchical list on Special:Allpages

2013-11-13 Thread Dan Garry
It's the latter; the hierarchical list will not be displayed if there are
over 50,000 pages.

See the relevant bug 
and the relevant patch  for more
information.

Dan


On 13 November 2013 12:24, Strainu  wrote:

> Hold on, are we talking about removing Special:Allpages altogether or
> just disabling the hierarchical view? If it's the former, It's not OK.
> Even for large wikis, it allows you to see near-by pages, such as
> pages with different diacritic signs, but which denote the same
> subject.
>
>
> Strainu
>
> 2013/11/13 Dan Garry :
> > Tim made the point that for a wiki with a very small set of pages (say,
> > 100 pages), Special:Allpages can be useful as it lets you get an overview
> > of the content of the wiki. That's obviously not too applicable to wikis
> > like the English Wikipedia, especially if it's creating a lot of
> > performance issues.
> >
> > Dan
> >
> >
> > On 13 November 2013 11:02, Bartosz Dziewoński 
> wrote:
> >
> >> On Wed, 13 Nov 2013 01:36:02 +0100, Ori Livneh 
> wrote:
> >>
> >>  From my perspective, the ideal outcome of this discussion would be
> >>> that we agree that the hierarchical list is a poor fit for the
> >>> MediaWiki of today, and we resolve to remove it from core.
> >>>
> >>
> >> +1. I've never understood what purpose the Special:Allpages layout was
> >> supposed to serve.
> >>
> >> --
> >> Matma Rex
> >>
> >>
> >> ___
> >> Wikitech-l mailing list
> >> Wikitech-l@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >>
> >
> >
> >
> > --
> > Dan Garry
> > Associate Product Manager for Platform
> > Wikimedia Foundation
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Dan Garry
Associate Product Manager for Platform
Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Fate of hierarchical list on Special:Allpages

2013-11-13 Thread Strainu
Hold on, are we talking about removing Special:Allpages altogether or
just disabling the hierarchical view? If it's the former, It's not OK.
Even for large wikis, it allows you to see near-by pages, such as
pages with different diacritic signs, but which denote the same
subject.


Strainu

2013/11/13 Dan Garry :
> Tim made the point that for a wiki with a very small set of pages (say, 100
> pages), Special:Allpages can be useful as it lets you get an overview
> of the content of the wiki. That's obviously not too applicable to wikis
> like the English Wikipedia, especially if it's creating a lot of
> performance issues.
>
> Dan
>
>
> On 13 November 2013 11:02, Bartosz Dziewoński  wrote:
>
>> On Wed, 13 Nov 2013 01:36:02 +0100, Ori Livneh  wrote:
>>
>>  From my perspective, the ideal outcome of this discussion would be
>>> that we agree that the hierarchical list is a poor fit for the
>>> MediaWiki of today, and we resolve to remove it from core.
>>>
>>
>> +1. I've never understood what purpose the Special:Allpages layout was
>> supposed to serve.
>>
>> --
>> Matma Rex
>>
>>
>> ___
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>
>
>
> --
> Dan Garry
> Associate Product Manager for Platform
> Wikimedia Foundation
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Fate of hierarchical list on Special:Allpages

2013-11-13 Thread Federico Leva (Nemo)
What's different in the "MediaWiki of today" as regards allpages? I 
suspect it was designed to resemble encyclopedias and dictionaries' 
browsing by "volumes". For those who don't remember it, here is a copy: 

In principle, it's a useful feature to see at a glance what a wiki 
contains, skipping – say – the few tens of thousands of similarly named 
asteroids or proteins, especially if the wiki is lazy and no proper 
category tree is maintained. In general, if you want to get it removed 
you should probably discuss it on mediawiki-l, given that the
performance issue affecting Wikimedia wikis seems resolved (thanks!).


As regards pageviews, they're not particularly useful here, because you
won't see visits to the individual namespace filters and so on (which I
personally often enter directly in the location bar without using the
form).


Nemo

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Re-implementing PDF support

2013-11-13 Thread Strainu
Hi,

I'm grabbing this opportunity to bring up 3 bugs related to mwlib that
deserve a larger discussion and should perhaps be implemented
differently in the new version.

1. https://bugzilla.wikimedia.org/show_bug.cgi?id=56560 - PDF creation
tool considers IPv6 addresses as users, not anonymous.

I've pushed a patch for this and it was merged; however, the
detection is based on a regex and, as a quick Google search will tell
you, it's not easy to write a regex that covers all IPv6 cases.
Perhaps the anonymous/logged-in distinction could be sent from
MediaWiki instead.
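
If the check ends up in the new Node-based renderer, the standard library
could be used instead of a hand-rolled regex; a minimal sketch (the sample
inputs are just illustrations):

    var net = require('net');

    // net.isIP() returns 0 for non-addresses, 4 for IPv4 and 6 for IPv6,
    // so both kinds of anonymous contributor can be detected the same way.
    function isAnonymousContributor(username) {
        return net.isIP(username) !== 0;
    }

    console.log(isAnonymousContributor('192.0.2.1'));          // true (IPv4)
    console.log(isAnonymousContributor('2001:db8::1'));        // true (IPv6)
    console.log(isAnonymousContributor('Vitalie_Ciubotaru'));  // false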

2. https://bugzilla.wikimedia.org/show_bug.cgi?id=56219 - PDF creation
tool excludes contributors with a "bot" substring in their username

I've also pushed a pull request for this one, but it was rejected
based on the en.wp policy that prevents bot-like usernames for humans.
The problem is more complex though:

a. Should bots be credited for their edits? While most of them do
simple tasks, we have recently seen an increase in bot-created
content. On ro.wp we even have a few lists only edited by robots.
b. If the robots should _not_ be credited, how do we detect them?
Ideally, there should be an automatic way to do so, but according to
http://www.mediawiki.org/wiki/Bots, it only works for recent changes.
Less ideally, only users with names ending in "bot" should be removed, in
order to keep users like
https://ro.wikipedia.org/wiki/Utilizator:Vitalie_Ciubotaru (who is
not a robot, but has "bot" in the name) in the contributor list.


3. https://bugzilla.wikimedia.org/show_bug.cgi?id=2994 - Automatically
generated count and list of contributors to an article (authorship
tracking)

This is an old enhancement request, revived by me last month in a
wikimedia-l thread:
http://lists.wikimedia.org/pipermail/wikimedia-l/2013-October/128575.html
. The idea is to decide if and how to credit:
a. vandals
b. reverters
c. contributors whose valid contributions were rephrased or
replaced in the article.
d. contributors with valid contributions but invalid names

I hope the people working on this feature will take the time to
consider these issues and come up with solutions for them.

Thanks,
   Strainu


2013/11/13 Erik Moeller :
> Hi folks,
>
> for a long time we've relied on the mwlib libraries by PediaPress to
> generate PDFs on Wikimedia sites. These have served us well (we
> generate >200K PDFs/day), but they architecturally pre-date a lot of
> important developments in MediaWiki, and actually re-implement the
> MediaWiki parser (!) in Python. The occasion of moving the entire PDF
> service to a new data-center has given us reason to re-think the
> architecture and come up with a minimally viable alternative that we
> can support long term.
>
> Most likely, we'll end up using Parsoid's HTML5 output, transform it
> to add required bits like licensing info and prettify it, and then
> render it to PDF via phantomjs, but we're still looking at various
> rendering options.
>
> Thanks to Matt Walker, C. Scott Ananian, Max Semenik, Brad Jorsch and
> Jeff Green for joining the effort, and thanks to the PediaPress folks
> for giving background as needed. Ideally we'd like to continue to
> support printed book generation via PediaPress' web service, while
> completely replacing the rendering tech stack on the WMF side of
> things (still using the Collection extension to manage books). We may
> need to deprecate some output formats - more on that as we go.
>
> We've got the collection-alt-renderer project set up on Labs (thanks
> Andrew) and can hopefully get a plan to our ops team soon as to how
> the new setup could work.
>
> If you want to peek - work channel is #mediawiki-pdfhack on FreeNode.
>
> Live notes here:
> http://etherpad.wikimedia.org/p/pdfhack
>
> Stuff will be consolidated here:
> https://www.mediawiki.org/wiki/PDF_rendering
>
> Some early experiments with different rendering strategies here:
> https://github.com/cscott/pdf-research
>
> Some improvements to Collection extension underway:
> https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions/Collection,n,z
>
> More soon,
> Erik
>
> --
> Erik Möller
> VP of Engineering and Product Development, Wikimedia Foundation
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Fate of hierarchical list on Special:Allpages

2013-11-13 Thread Dan Garry
Tim made the point that for a wiki with a very small set of pages (say, 100
pages), Special:Allpages can be useful as it lets you get an overview
of the content of the wiki. That's obviously not too applicable to wikis
like the English Wikipedia, especially if it's creating a lot of
performance issues.

Dan


On 13 November 2013 11:02, Bartosz Dziewoński  wrote:

> On Wed, 13 Nov 2013 01:36:02 +0100, Ori Livneh  wrote:
>
>  From my perspective, the ideal outcome of this discussion would be
>> that we agree that the hierarchical list is a poor fit for the
>> MediaWiki of today, and we resolve to remove it from core.
>>
>
> +1. I've never understood what purpose the Special:Allpages layout was
> supposed to serve.
>
> --
> Matma Rex
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Dan Garry
Associate Product Manager for Platform
Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Fate of hierarchical list on Special:Allpages

2013-11-13 Thread Bartosz Dziewoński

On Wed, 13 Nov 2013 01:36:02 +0100, Ori Livneh  wrote:


From my perspective, the ideal outcome of this discussion would be
that we agree that the hierarchical list is a poor fit for the
MediaWiki of today, and we resolve to remove it from core.


+1. I've never understood what purpose the Special:Allpages layout was supposed 
to serve.

--
Matma Rex

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] The interwiki table as a whitelist of non-spammy sites

2013-11-13 Thread Nathan Larson
TL;DR: How can we collaboratively put together a list of non-spammy sites
that wikis may want to add to their interwiki tables for whitelisting
purposes; and how can we arrange for the list to be efficiently distributed
and imported?

Nemo bis points out that Interwiki is "the
easiest way to manage whitelisting" since nofollow isn't applied to
interwiki links. Should we encourage, then, wikis to make more use of
interwiki links? Usually, MediaWiki installations are configured so that
only sysops can add, remove, or modify interwiki prefixes and URLs. If a
user wants to link to another wiki, but it's not on the list, often he will
just use an external link rather than asking a sysop to add the prefix
(since it's a hassle for both parties and often people don't want to bother
the sysops too much in case they might need their help with something else
later). This defeats much of the point of having the interwiki table
available as a potential whitelist, unless the sysops are pretty on top of
their game when it comes to figuring out what new prefixes should be added.
In most cases, they probably aren't; the experience of Nupedia shows that
elitist, top-down systems tend not to work as well as egalitarian,
bottom-up systems.

Currently, interwiki.sql has
100 wikis, and there doesn't seem to be much rhyme or reason to which
ones are included (e.g. Seattlewiki?). I wrote InterwikiMap,
which dumps the contents of the wiki's interwiki table into a backup page
and substitutes in its place the interwiki table of some other wiki (e.g. I
usually use Wikimedia's), with such modifications as the sysops see fit to
make. The extension lets sysops add, remove and modify interwiki prefixes
and URLs in bulk rather than one by one through Special:Interwiki, which is
a pretty tedious endeavor. Unfortunately, as written it is not a very
scalable solution, in that it can't accommodate very many thousand wiki
prefixes before the backup wikitables it generates exceed the capacity of
wiki pages, or it breaks for other reasons.

I was thinking of developing a tool that WikiIndex (or some other wiki
about wikis) could use to manage its own interwiki table via edits to
pages. Users would add interwiki prefixes to the table by adding a
parameter to a template that would in turn use a parser function that, upon
the saving of the page, would add the interwiki prefix to the table.
InterwikiMap could be modified to do incremental updates, polling the API
to find out what changes have recently been made to the interwiki table,
rather than getting the whole table each time. It would then be possible
for WikiIndex (or whatever other site were to be used) to be the
wikisphere's central repository of canonical interwiki prefixes. See
http://wikiindex.org/index.php?title=User_talk%3AMarkDilley&diff=172654&oldid=172575#Canonical_interwiki_prefixes.2C_II
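
As a sketch of the polling side, the interwiki map of any MediaWiki wiki
can already be fetched through the API with meta=siteinfo&siprop=interwikimap;
a rough Node example against a placeholder central wiki:

    var https = require('https');

    // Placeholder central wiki; the idea is to poll it periodically and
    // diff the result against the local interwiki table.
    var url = 'https://meta.wikimedia.org/w/api.php' +
              '?action=query&meta=siteinfo&siprop=interwikimap&format=json';

    https.get(url, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
            var map = JSON.parse(body).query.interwikimap;
            map.forEach(function (entry) {
                // Each entry has at least a prefix and a URL template.
                console.log(entry.prefix, '->', entry.url);
            });
        });
    }).on('error', function (err) {
        console.error('request failed:', err.message);
    });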

But there's been some question as to whether there would be much demand for
a 200,000-prefix interwiki table, or whether it would be desirable. It
could also provide an incentive for spammers to try to add their sites to
WikiIndex. See
https://www.mediawiki.org/wiki/Talk:Canonical_interwiki_prefixes

It's hard to get stuff added to meta-wiki's interwiki
map because one of the
criteria is that the prefix has to be one that would be
used a lot on Wikimedia sites. How can we put together a list of non-spammy
sites that wikis would be likely to want to have as prefixes for nofollow
whitelisting purposes, and distribute that list efficiently? I notice that
people are more likely to put together lists of spammy than non-spammy
sites; see e.g. Freakipedia's
list.
(Hmm, I think I'll pimp my websites to that wiki when I get a chance; the
fact that the spam isn't just removed but put on permanent record in a
public denunciation means it's a potential opportunity to gain exposure for
my content. They say there's no such thing as bad publicity. ;) )

-- 
Nathan Larson 
Distribution of my contributions to this email is hereby authorized
pursuant to the CC0 license
.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] [Reminder] Language Engineering IRC Office Hour on November 13, 2013 at 1500 UTC

2013-11-13 Thread Runa Bhattacharjee
Hello,

A quick reminder that the Wikimedia Language Engineering team will be
hosting an IRC office hour from 1500 to 1600UTC later today on
#wikimedia-office (FreeNode). Please see below for the event details.

Thanks
Runa

-- Forwarded message --
From: Runa Bhattacharjee 
Date: Thu, Nov 7, 2013 at 11:40 AM
Subject: Language Engineering IRC Office Hour on November 13, 2013 at 1500
UTC
To: MediaWiki internationalisation ,
Wikimedia Mailing List , Wikimedia
developers ,
wikitech-ambassad...@lists.wikimedia.org


[x-posted]

Hello,

The Wikimedia Language Engineering team will be hosting an IRC office
hour on Wednesday, November 13, 2013 between 15:00 - 16:00 UTC on
#wikimedia-office. (See below for timezone conversion and other details.)
We will be talking about some of our recent and upcoming projects and then
taking questions for the remaining time.

We also look forward to hearing about anything that needs our attention.
Questions and other concerns can also be sent to me directly before the
event. See you there!

Thanks
Runa

=== Event Details ===

What: WMF Language Engineering Office hour
When: November 13, 2013 (Wednesday). 1500-1600 UTC
http://www.timeanddate.com/worldclock/fixedtime.html?iso=20131113T1500
Where: IRC Channel #wikimedia-office on FreeNode





-- 
Language Engineering - Outreach and QA Coordinator
Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l