[Wikitech-l] Proposed Program Architecture Summit 2014

2014-01-16 Thread Diederik van Liere
Heya,

The Program Committee for the Architecture Summit has published a proposed
program: https://www.mediawiki.org/wiki/Architecture_Summit_2014

Highlights of the Program:

1) We have tried to incorporate flexibility into the program by allowing 4
unconference break-out slots, 1 open plenary session and a daily ‘agenda
bashing’ session to make adjustments to the program if the need arises.

2) There are 3 plenary sessions: HTML Templating (
https://www.mediawiki.org/wiki/Architecture_Summit_2014/HTML_templating),
Service Oriented Architecture (
https://www.mediawiki.org/wiki/Architecture_Summit_2014/Service-oriented_architecture)
and one open slot. Possible candidates for the open session include
Performance and UI styling, but this will be decided during the Summit. The
short list will include the highest vote-getters in the straw poll, so if
there is a cluster you strongly feel should be part of the
program, now is the time to make that case.

3) There are 6 breakout sessions:

* 2 planned: UI styling (
https://www.mediawiki.org/wiki/Architecture_Summit_2014/UI_styling) and
Storage Services (
https://www.mediawiki.org/wiki/Architecture_Summit_2014/Storage_services)

* 4 unconference slots

There will be a round of Lightning Talks at the beginning of the plenary
session that follows the breakout sessions, summarizing what happened by
answering the following questions:

a) What did you try to achieve?

b) What did you decide?

c) What are the next steps?

4) Architecture Panel and Value discussion. This is a plenary session for
the architects to share what they value in good architecture, as well as
talk about how they see the architecture of MediaWiki evolving, and what
role people other than our historical core group of three have to play in
the process. During this session, we hope to answer at least some of the
questions outlined in the recent discussion of the RFC process [1].

5) RFC roulette: a one-hour closing session for RFC's that have not been in
the spotlight during the Summit and where the next step can be decided.
This is intended to be fast-paced and slightly chaotic. If you would like
to hear what's next for your RFC, please participate in the roulette by
adding your name to
https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_roulette

What does the Program Committee expect from summit participants?

a) If you are a participant, please familiarize yourself with the latest
version of the RFC’s that you care about.

b) If you are an author of an RFC that is scheduled in a plenary session,
please start preparing, in collaboration with the other authors in the
same session, a short slide deck that summarizes all the different RFC's.
One slide that could be really useful is a matrix that highlights
key differences between alternative / competing proposals. Diederik will
contact the folks who are invited to the plenary session to help
coordinate and organize.

c) If you are an author of an RFC that is scheduled in a breakout session,
please create at most 3 slides that summarize your RFC and think about what
you want to get out of your session.  The slides are optional, but the
requisite level of preparation is not.

d) If you want to run an unconference slot then start thinking about a
theme and possible co-organizers. It’s okay to use an existing RFC cluster.

Two quick final notes:

The Program has been created using the input from the straw poll (
https://www.mediawiki.org/wiki/Architecture_Summit_2014/Straw_poll), input
from the Program Committee and input from the Engineering Community Team.

To see which RFC’s compose a cluster please have a look at
https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_clusters

Looking forward to your feedback!



Best regards,

The Program Committee


[1]  Discussion of the RFC process:
https://www.mediawiki.org/wiki/Talk:Requests_for_comment/Process#
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Proposed Program Architecture Summit 2014

2014-01-16 Thread Diederik van Liere
On Thu, Jan 16, 2014 at 10:11 PM, legoktm legoktm.wikipe...@gmail.com wrote:

 Hi,
 Given that the Configuration cluster had the second-highest number of
 votes in the poll, why was it left off the agenda entirely?

We have been going back and forth between a plenary session and a breakout
session for Configuration; it's definitely on the table and I am 99.9% sure
that we will have a session dedicated to it. It's just not officially
slotted anywhere right now, but I will check with Robla to see how we will
schedule this important cluster.
It not being in the program right now should not be seen as a sign that it's
not important; on the contrary, we are just trying to find the most
appropriate slot for it.
D


 On Thu, Jan 16, 2014 at 10:47 AM, Diederik van Liere
 dvanli...@wikimedia.org wrote:
  Heya,
 
  The Program Committee for the Architecture Summit has published a
 proposed
  program: https://www.mediawiki.org/wiki/Architecture_Summit_2014
 
  Highlights of the Program:
 
  1) We have tried to incorporate flexibility into the program by allowing
 4
  unconference break-out slots, 1 open plenary session and a daily ‘agenda
  bashing’ session to make adjustments to the program if the need arises.
 
  2) There are 3 plenary sessions: HTML Templating (
  https://www.mediawiki.org/wiki/Architecture_Summit_2014/HTML_templating
 ),
  Service Oriented Architecture (
 
 https://www.mediawiki.org/wiki/Architecture_Summit_2014/Service-oriented_architecture
 )
  and one open slot. Possible candidates for the open session include
  Performance and UI styling but this will be decided during the Summit.
  The
  short list will include the higher vote-getters in the straw poll, so if
  there’s one of the clusters you strongly feel should be part of the
  program, now is your time to make that case.

 Make that case where? Given that slidedecks are supposed to be
 prepared, shouldn't this be decided beforehand rather than waiting?


[Wikitech-l] Final chance to vote in Architecture Summit straw poll

2014-01-08 Thread Diederik van Liere
Heya,

Today, January 8th, until 11:59 PM PST, you can vote in the straw poll for
the Architecture Summit. Please cast your votes here:
https://www.mediawiki.org/wiki/Architecture_Summit_2014/Straw_poll

Tomorrow I will start creating the program for the Summit, and after that I
cannot promise that your votes will still be taken into account.

Thanks to all the folks who have voted so far.

D
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Straw poll Architecture Summit closing January 8th

2014-01-07 Thread Diederik van Liere
Heya,


If you haven't exercised your right to cast a vote in the Straw Poll for
the Architecture Summit, this would be a really good time to do so.
I would like to close the poll by January 8th so we can start putting
together the final program.

You can find the straw poll here:
https://www.mediawiki.org/wiki/Architecture_Summit_2014/Straw_poll


Thanks for your help, and please help me drive turnout - the more
votes the better!


D
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RFC on PHP profiling

2014-01-03 Thread Diederik van Liere
@Chad: should this be included in the straw poll for the architecture
summit or is that too soon?
D


On Tue, Dec 31, 2013 at 6:55 PM, Chad innocentkil...@gmail.com wrote:

 I'm starting a new RFC to discuss ways we can improve our PHP profiling.

 https://www.mediawiki.org/wiki/Requests_for_comment/Better_PHP_profiling

 Please feel free to help expand and/or comment on the talk page if you've
 got ideas :)

 -Chad
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Straw poll to determine program for Architecture Summit

2014-01-02 Thread Diederik van Liere
Hi everyone,

Best wishes for 2014! I hope that participating in the straw poll for the
Architecture Summit is on your list of New Year's resolutions --
https://www.mediawiki.org/wiki/Architecture_Summit_2014/Straw_poll

Quick refresher: we have created clusters of related RFC's (
https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_clusters) for
the upcoming Architecture Summit. We are pretty happy with the clustering,
so now we want to hear from you which clusters you think are the most
important and should definitely be included in the program.

The poll will close Thursday, January 8th.

If you have any questions then please let me know!

Best,

Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Fwd: Proposal for biweekly Labs showcase

2013-12-31 Thread Diederik van Liere
Bumping my proposal. I am particularly looking forward to responses from
community members.
Have an awesome New Year's Eve!

Best,
Diederik


Heya,


I just posted an initial proposal to start running a biweekly showcase to
feature all the cool things that are happening on Labs. Please have a look
at https://wikitech.wikimedia.org/wiki/Showcase, express your interest and
chime in on the Talk page to help get this off the ground.

Best,
Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Clustering of RFC's for the Architecture Summit

2013-12-27 Thread Diederik van Liere
Heya,


We are making good progress with creating clusters to group RFC's for the
upcoming Architecture Summit. Some clusters are still too big; in
particular, the following clusters can/should be split into smaller
clusters of 3-4 RFC's each:

* General Mediawiki Functionality
* Backend code modularity frameworks
* Installation
* SOA
* UI/UX: styling


Please have a look at
https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_clusters and
help us finalize the clustering!

Thanks!

Best,
D
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Suggested process for determining topics at the Architecture Summit

2013-12-20 Thread Diederik van Liere
Hola,


We wanted to update you on our proposal for how to create a program for the
Architecture Summit this coming January. We started grouping the RFC's from
https://www.mediawiki.org/wiki/RFC into clusters at
https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_Clusters. This
clustering is neither perfect nor complete, and we would like to ask for
your help with fine-tuning it.

The idea is that a cluster contains RFC's that belong together -- if we
discuss RFC A and therefore RFC B needs to be discussed as well, then those
two RFC's should be in the same cluster. Clusters should also be small,
probably not more than 3 or 4 RFC's per cluster. Sometimes RFC's in the
same cluster offer alternative suggested implementations; sometimes they
are closely related because they pursue a similar goal.

Currently, we have one big cluster called 'General Mediawiki Functionality'
and it definitely needs to be broken up into smaller clusters. The
'Misc' cluster can probably also be broken up into smaller clusters.

Once we have nailed down the clusters of RFC's, we will run a straw poll
to gauge interest in the different clusters. The straw poll will inform our
decision about which clusters should be discussed at the Architecture
Summit. We want to launch this straw poll on January 2nd, 2014 at the latest.

Summary:
1) Help us finalize the clustering of RFC's on
https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_Clusters
2) Participate in the straw poll once it goes live (probably January 2nd);
a separate email will follow.


If you have any questions, thoughts, suggestions, remarks, etc, etc please
let us know!

Best,
Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Architecture Summit -- Gathering all relevant RfC's

2013-12-11 Thread Diederik van Liere
On Wed, Nov 27, 2013 at 2:55 PM, Jon Robson jdlrob...@gmail.com wrote:

 One that I would like to discuss but still need to write up is
 JavaScript template support in ResourceLoader. Mobile has been using
 Hogan.js for some time and we would like to upstream this as a
 standard.

 I'll try and get this written in the next 2 weeks, but it would be good to
 capture this even in stub-like form (not sure if stubs are allowed
 on the RFC page).

Hey Jon,

If there's anything I can do to help you with this RfC then please let me
know.
Best,
Diederik


 On Tue, Nov 26, 2013 at 6:27 PM, Diederik van Liere
 dvanli...@wikimedia.org wrote:
  Heya,
 
  The Architecture Summit will be upon us in less than two months. To make
  sure that this Summit is going to be productive it is important that we
  discuss the right RfC's. Before deciding which RfC's should be discussed
 at
  the Summit I want to make sure that
  https://www.mediawiki.org/wiki/Requests_for_comment contains all RfC's
 and
  that all important topics have an RfC.
 
  If you have a Mediawiki related RfC in a personal notepad, on your User
  Page, or in your mind, then this would be a great moment to write or move
  it under https://www.mediawiki.org/wiki/Requests_for_comment and add an
  entry to the table. If you don't have 'move' rights then please let me
  know and I can move it for you.
 
  If you know of a topic that *should* have an RfC but does not yet have an
  RfC then please reply to this list mentioning the topic. I will check
 with
  Tim/Brion to see how these topics can get an RfC.
 
  Once we have collected all relevant RfC's under
  https://www.mediawiki.org/wiki/Requests_for_comment then I will make a
 page
  where everybody can express their interest in which RfC's should be
  discussed at the Summit.
 
  Questions? Let me know!
 
  Best,
  Diederik
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l



 --
 Jon Robson
 http://jonrobson.me.uk
 @rakugojon

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Analytics] [WikimediaMobile] Mobile stats

2013-09-04 Thread Diederik van Liere
Heya,
I would suggest running it for at least a 7-day period so you capture the
weekly time trends; increasing the sample size would also be advisable. We
can help set up a udp-filter for this purpose as long as the data can be
extracted from the user-agent string.

D
On Wed, Sep 4, 2013 at 1:50 PM, Arthur Richards aricha...@wikimedia.org wrote:

 Thanks Max for digging into this :)

 I'm no analytics guy, but I am a little concerned about the sample size
 and duration of the internal logging that we've done - sampling 1/1 for
 only a few days for data about something we generally know usage to already
 be low seems to me like it might be difficult to get accurate numbers. Can
 someone from the analytics team chime in and let us know if the approach is
 sound and if we should trust the data Max has come up with? This has big
 implications as it will play role in determining whether or not we continue
 supporting WAP devices and providing WAP access to the sites.

 Thanks everyone!


 On Tue, Sep 3, 2013 at 10:40 AM, Erik Zachte ezac...@wikimedia.org wrote:

 Sadly you need to take squid log based reports with a grain of salt.
 Several incomplete maintenance jobs have taken their toll.

 Each report starts with a long list of unsolved bugs.
 Among those https://bugzilla.wikimedia.org/show_bug.cgi?id=46273

 So yeah better trust your own data.

 Erik


 -Original Message-
 From: analytics-boun...@lists.wikimedia.org [mailto:
 analytics-boun...@lists.wikimedia.org] On Behalf Of Max Semenik
 Sent: Tuesday, September 03, 2013 5:33 PM
 To: analyt...@lists.wikimedia.org; Wikimedia developers; mobile-l
 Subject: [Analytics] Mobile stats

 Hi, I have a few questions regarding mobile stats.

 I need to determine a real percentage of WAP browsers. At first glance,
 [1] looks interesting: ratio of text/html to text/vnd.wap.wml is 92M /
 3987M = 2.3% on m.wikipedia.org. However, this contradicts the stats at
 [2] which have different numbers and a different ratio.

 I did my own research: because during browser detection in Varnish
 WAP-ness is detected mostly by looking at the Accept header, and because our
 current analytics infrastructure doesn't log it, I quickly whipped up some
 code that recorded the user-agent and Accept headers of every 10,000th
 request for mobile page views hitting the apaches.

 According to several days' worth of data, out of 14917 logged requests
 1445 contained vnd.wap.wml in their Accept: headers in any form. That's more
 than what is logged for frontend responses; however, it is expected, as WAP
 should have a worse cache hit rate and thus should hit the apaches more often.

 Next, our WAP detection code is very simple: the user-agent is checked
 against a few major browser IDs (all of them are HTML-capable, and this
 check is not actually needed anymore and will go away soon); if still
 not known, we consider every device that sends vnd.wap.wml in its Accept:
 header (but not application/vnd.wap.xhtml+xml) to be
 WAP-only. If we apply these rules, we get only 68 entries that qualify as
 WAP, which is 0.05% of all mobile requests.

 The question is, what's wrong: my research or stats.wikimedia.org?

 And if it's indeed just 0.05%, we should probably^W definitely kill WAP
 support on our mobile site as it's virtually unmaintained.

 -
 [1] http://stats.wikimedia.org/wikimedia/squids/SquidReportRequests.htm
 [2] http://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm
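A minimal sketch (assuming Python; the browser-ID list and header values are
illustrative stand-ins, and the real detection lives in Varnish/MobileFrontend)
of the classification rule described above:

    # Made-up stand-in for the "few major browser IDs" that are known HTML-capable.
    HTML_CAPABLE_IDS = ("Android", "iPhone", "iPad", "Opera Mini")

    def is_wap_only(user_agent: str, accept: str) -> bool:
        """WAP-only: unknown browser that accepts vnd.wap.wml but not the XHTML MP type."""
        if any(browser in user_agent for browser in HTML_CAPABLE_IDS):
            return False
        return ("vnd.wap.wml" in accept
                and "application/vnd.wap.xhtml+xml" not in accept)

    print(is_wap_only("SomeFeaturePhone/1.0", "text/vnd.wap.wml"))                            # True
    print(is_wap_only("SomeFeaturePhone/1.0",
                      "application/vnd.wap.xhtml+xml, text/vnd.wap.wml, text/html"))          # False
    print(is_wap_only("Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X)", "text/html"))  # False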



 --
 Best regards,
   Max Semenik ([[User:MaxSem]])


 ___
 Analytics mailing list
 analyt...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics


 ___
 Mobile-l mailing list
 mobil...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/mobile-l




 --
 Arthur Richards
 Software Engineer, Mobile
 [[User:Awjrichards]]
 IRC: awjr
 +1-415-839-6885 x6687

 ___
 Analytics mailing list
 analyt...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] access log (pagecounts) dump stopped

2013-08-05 Thread Diederik van Liere
Hi Cheol!

Thanks for alerting us to this issue. We are looking into it right now.
Best,
Diederik


On Mon, Aug 5, 2013 at 4:24 PM, Ryu Cheol rch...@gmail.com wrote:

 Hello guys,

 http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-08/ is not
 updated for a few hours.
 I don't know who keeps this running. Would you please let him know?

 Cheers!
 Cheol


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] access log (pagecounts) dump stopped

2013-08-05 Thread Diederik van Liere
Hi Cheol,

The cronjob was broken due to some maintenance on the dumps server. The
cronjob is being fixed right now and no data has been lost. In a couple of
hours all files should be present again. If you still see an issue in 48
hours then please ping me.

Best,
Diederik


On Mon, Aug 5, 2013 at 5:09 PM, Diederik van Liere
dvanli...@wikimedia.org wrote:

 Hi Cheol!

 Thanks for alerting us to this issue. We are looking into it right now.
 Best,
 Diederik


 On Mon, Aug 5, 2013 at 4:24 PM, Ryu Cheol rch...@gmail.com wrote:

 Hello guys,

 http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-08/ is not
 updated for a few hours.
 I don't know who keeps this running. Would you please let him know?

 Cheers!
 Cheol


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers

2013-03-20 Thread Diederik van Liere
This bug has been fixed, see
https://bugzilla.wikimedia.org/show_bug.cgi?id=45178

I will post a message on the Village Pump as well.

Best,
Diederik


On Sun, Feb 3, 2013 at 3:44 PM, Brad Jorsch bjor...@wikimedia.org wrote:

 On Fri, Jan 25, 2013 at 12:51 PM, Diederik van Liere
 dvanli...@wikimedia.org wrote:
  No, the output format of
 http://dumps.wikimedia.org/other/pagecounts-raw/
  will stay the same.

 It seems that page names are coming through with spaces now, where
 they didn't before. See

 https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Format_Change_of_Page_View_Stats

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wmfall] Yuri Astrakhan Adam Baso join Mobile department partner team

2013-03-18 Thread Diederik van Liere
Awesome news! Go team Mobile!
D


On Mon, Mar 18, 2013 at 1:59 PM, Rachel Farrand rfarr...@wikimedia.org wrote:

 Welcome Adam and Yuri! Looking forward to working with both of you. :)
 Rachel

 On Mon, Mar 18, 2013 at 10:48 AM, Erik Moeller e...@wikimedia.org wrote:

 On Mon, Mar 18, 2013 at 10:29 AM, Tomasz Finc tf...@wikimedia.org
 wrote:

  I'm pleased to announce that the mobile department has two new staff
  members. Yuri Astrakhan  Adam Baso join as sr. software developers on
  the mobile partner team.

 Welcome on board, guys. Really looking forward to the next steps with
 WP Zero. :-)

 Erik


 --
 Erik Möller
 VP of Engineering and Product Development, Wikimedia Foundation

 Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate

 ___

 Wmfall mailing list
 wmf...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wmfall



 ___
 Wmfall mailing list
 wmf...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wmfall


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-15 Thread Diederik van Liere
Thanks Asher for tying this up! I was about to write a similar email :)
One final question, just to make sure we are all on the same page: is the
X-CS field becoming a generic key/value pair for tracking purposes?

D


On Fri, Feb 15, 2013 at 11:16 AM, Asher Feldman afeld...@wikimedia.org wrote:

 Just to tie this thread up - the issue of how to count ajax driven
 pageviews loaded from the api and of how to differentiate those requests
 from secondary api page requests has been resolved without the need for
 code or logging changes.

 Tagging of the mobile beta site will be accomplished via a new generic
 mediawiki http response header dedicated to logging containing key value
 pairs.

 -Asher

 On Tue, Feb 12, 2013 at 9:56 AM, Asher Feldman afeld...@wikimedia.org
 wrote:

  On Tuesday, February 12, 2013, Diederik van Liere wrote:
 
   It does still seem to me that the data to determine secondary api
  requests
   should already be present in the existing log line. If the value of
 the
   page param in an action=mobileview api request matches the page in the
   referrer (perhaps with normalization), it's a secondary request as per
  case
   1 below.  Otherwise, it's a pageview as per case 2.  Difficult or
  expensive
   to reconcile?  Not when you're doing distributed log analysis via
  hadoop.
  
  So I did look into this prior to writing the RFC and the issue is that a
  lot of API referrers don't contain the querystring. I don't know what
  triggers this so if we can fix this then we can definitely derive the
  secondary pageview request from the referrer field.
  D
 
 
  If you can point me to some examples, I'll see if I can find any insights
  into the behavior.
 
 
 
   On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards 
  aricha...@wikimedia.org
   wrote:
  
Thanks, Jon. To try and clarify a bit more about the API requests...
  they
are not made on a per-section basis. As I mentioned earlier, there
 are
   two
cases in which article content gets loaded by the API:
   
1) Going directly to a page (eg clicking a link from a Google
 search)
   will
result in the backend serving a page with ONLY summary section
 content
   and
section headers. The rest of the page is lazily loaded via API
 request
   once
the JS for the page gets loaded. The idea is to increase
  responsiveness
   by
reducing the delay for an article to load (further details in the
  article
Jon previously linked to). The API request looks like:
   
   
  
 
 http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all
   
2) Loading an article entirely via Javascript - like when a link is
   clicked
in an article to another article, or an article is loaded via
 search.
   This
will make ONE call to the API to load article content. API request
  looks
like:
   
   
  
 
 http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all
   
These API requests are identical, but only #2 should be counted as a
'pageview' - #1 is a secondary API request and should not be counted
  as a
'pageview'. You could make the argument that we just count all of
  these
   API
requests as pageviews, but there are cases when we can't load
 article
content from the API (like devices that do not support JS), so we
  need to
be able to count the traditional page request as a pageview - thus
 we
   need
a way to differentiate the types of API requests being made when
 they
otherwise share the same URL.
   
   
   
On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrob...@gmail.com
  wrote:
   
 I'm a bit worried that now we are asking why pages are lazy loaded
 rather than focusing on the fact that they currently __are doing
 this___ and how we can log these (if we want to discuss this
 further
 let's start another thread as I'm getting extremely confused doing
  so
 on this one).

 Lazy loading sections
 
 For motivation behind moving MobileFrontend into the direction of
  lazy
 loading section content and subsequent pages can be found here
 [1],
  I
 just gave it a refresh as it was a little out of date.

 In summary the reason is to
 1) make the app feel more responsive by simply loading content
  rather
 than reloading the entire interface
 2) reducing the payload sent to a device.

 Session Tracking
 

 Going back to the discussion of tracking mobile page views, it
  sounds
 like a header stating whether a page is being viewed in alpha,
 beta
  or
 stable works fine for standard page views.

 As for the situations where an entire page is loaded via the api

Re: [Wikitech-l] Page view stats we can believe in

2013-02-13 Thread Diederik van Liere
Hi all,

Lars, Rupert, thanks for flagging this, and you are quite right: the numbers
are too high because webstatscollector, the software that does the counts,
just counts every request as a hit, including bots, error pages, etc.
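For illustration, a rough sketch (assuming Python; the status and user-agent
checks are simplified assumptions, not webstatscollector's actual logic) of
the kind of filtering a cleaner pageview count would need:

    import re

    # Naive "clean pageview" criteria: successful responses only, and user-agents
    # that do not look like crawlers or command-line tools.
    BOT_PATTERN = re.compile(r"bot|crawler|spider|curl|wget", re.IGNORECASE)

    def is_clean_pageview(status: int, user_agent: str) -> bool:
        return status == 200 and not BOT_PATTERN.search(user_agent)

    requests = [
        (200, "Mozilla/5.0 (Windows NT 6.1)"),  # counted
        (200, "Googlebot/2.1"),                 # bot        -> excluded
        (404, "Mozilla/5.0 (X11; Linux)"),      # error page -> excluded
    ]
    print(sum(is_clean_pageview(status, ua) for status, ua in requests))  # 1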

I am planning on running a sprint at the Amsterdam Hackathon to build an
easily queryable datastore with clean pageview counts. Please let me know if
you are interested in this so I can pitch this.

Best,
Diederik


On Wed, Feb 13, 2013 at 3:36 PM, Lars Aronsson l...@aronsson.se wrote:

 On 02/14/2013 12:03 AM, rupert THURNER wrote:

 this means 569 pages accessed in this hour, at least once.


 Thanks for taking the time to do this check! This
 number already is unreasonable for an obscure project
 with 8000 articles.


  da.d Speciel:Eksporter/engelsk 2 7818


 Should Special:Export ever count as page views?
 Anyway, there are no humans using Special:Export
 on da.wiktionary in the middle of the night.


  this means that e.g. springer was supposedly accessed 3 times in
 that hour. the article does not exist, but there is a red link out of
 http://da.wiktionary.org/wiki/**Wiktionary:Top_1_(Dansk)http://da.wiktionary.org/wiki/Wiktionary:Top_1_(Dansk)
 .


 So are there some stupid bots that follow red links?
 There could be a large number of such accesses
 on Wiktionary (in any language) because there
 are so many red links. But bots should never be
 counted among the page views.



 --
   Lars Aronsson (l...@aronsson.se)
   Aronsson Datateknik - http://aronsson.se



 __**_
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/**mailman/listinfo/wikitech-lhttps://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-12 Thread Diederik van Liere
 It does still seem to me that the data to determine secondary api requests
 should already be present in the existing log line. If the value of the
 page param in an action=mobileview api request matches the page in the
 referrer (perhaps with normalization), it's a secondary request as per case
 1 below.  Otherwise, it's a pageview as per case 2.  Difficult or expensive
 to reconcile?  Not when you're doing distributed log analysis via hadoop.

So I did look into this prior to writing the RFC, and the issue is that a
lot of API referrers don't contain the query string. I don't know what
triggers this; if we can fix it, then we can definitely derive the
secondary pageview requests from the referrer field.
D
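For illustration, here is a minimal sketch (assuming Python 3; the title
normalization is a guess) of the referrer-matching heuristic discussed above:
an action=mobileview request whose page parameter matches the article in its
referrer is a secondary request, anything else counts as a pageview.

    from urllib.parse import urlparse, parse_qs, unquote

    def is_secondary(api_url: str, referrer: str) -> bool:
        """True if the mobileview API request targets the same page as its referrer."""
        page = parse_qs(urlparse(api_url).query).get("page", [""])[0]
        ref_path = urlparse(referrer).path
        if not page or not ref_path.startswith("/wiki/"):
            return False
        # Hypothetical normalization: underscores in article URLs become spaces.
        ref_page = unquote(ref_path[len("/wiki/"):]).replace("_", " ")
        return page == ref_page

    api = ("http://en.m.wikipedia.org/w/api.php?action=mobileview"
           "&page=Liverpool+F.C.+in+European+football&format=json")
    print(is_secondary(api, "http://en.m.wikipedia.org/wiki/Liverpool_F.C._in_European_football"))  # True -> secondary
    print(is_secondary(api, "http://en.m.wikipedia.org/wiki/California"))                           # False -> pageview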



 On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards aricha...@wikimedia.org
 wrote:

  Thanks, Jon. To try and clarify a bit more about the API requests... they
  are not made on a per-section basis. As I mentioned earlier, there are
 two
  cases in which article content gets loaded by the API:
 
  1) Going directly to a page (eg clicking a link from a Google search)
 will
  result in the backend serving a page with ONLY summary section content
 and
  section headers. The rest of the page is lazily loaded via API request
 once
  the JS for the page gets loaded. The idea is to increase responsiveness
 by
  reducing the delay for an article to load (further details in the article
  Jon previously linked to). The API request looks like:
 
 
 http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all
 
  2) Loading an article entirely via Javascript - like when a link is
 clicked
  in an article to another article, or an article is loaded via search.
 This
  will make ONE call to the API to load article content. API request looks
  like:
 
 
 http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all
 
  These API requests are identical, but only #2 should be counted as a
  'pageview' - #1 is a secondary API request and should not be counted as a
  'pageview'. You could make the argument that we just count all of these
 API
  requests as pageviews, but there are cases when we can't load article
  content from the API (like devices that do not support JS), so we need to
  be able to count the traditional page request as a pageview - thus we
 need
  a way to differentiate the types of API requests being made when they
  otherwise share the same URL.
 
 
 
  On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrob...@gmail.com wrote:
 
   I'm a bit worried that now we are asking why pages are lazy loaded
   rather than focusing on the fact that they currently __are doing
   this___ and how we can log these (if we want to discuss this further
   let's start another thread as I'm getting extremely confused doing so
   on this one).
  
   Lazy loading sections
   
   For motivation behind moving MobileFrontend into the direction of lazy
   loading section content and subsequent pages can be found here [1], I
   just gave it a refresh as it was a little out of date.
  
   In summary the reason is to
   1) make the app feel more responsive by simply loading content rather
   than reloading the entire interface
   2) reducing the payload sent to a device.
  
   Session Tracking
   
  
   Going back to the discussion of tracking mobile page views, it sounds
   like a header stating whether a page is being viewed in alpha, beta or
   stable works fine for standard page views.
  
   As for the situations where an entire page is loaded via the api it
   makes no difference to us to whether we
   1) send the same header (set via javascript) or
   2) add a query string parameter.
  
   The only advantage I can see of using a header is that an initial page
   load of the article San Francisco currently uses the same api url as a
   page load of the article San Francisco via javascript (e.g. I click a
   link to 'San Francisco' on the California article).
  
   In this new method they would use different urls (as the data sent is
   different). I'm not sure how that would effect caching.
  
   Let us know which method is preferred. From my perspective
   implementation of either is easy.
  
   [1] http://www.mediawiki.org/wiki/MobileFrontend/Dynamic_Sections
  
   On Mon, Feb 11, 2013 at 12:50 PM, Asher Feldman 
 afeld...@wikimedia.org
   wrote:
Max - good answers re: caching concerns.  That leaves studying if the
   bytes
transferred on average mobile article view increases or decreases
 with
   lazy
section loading.  If it increases, I'd say this isn't a positive
   direction
to go in and stop there.  If it decreases, then we should look at the
effect on total latency, number of 

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-05 Thread Diederik van Liere
 Analytics folks, is this workable from your perspective?

Yes, this works fine for us, and it's also no problem to set multiple
key/value pairs in the HTTP header that we are now using for the X-CS
header.
Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Fwd: RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-02 Thread Diederik van Liere
Thanks Ori, I was not aware of this
D

Sent from my iPhone

On 2013-02-02, at 16:55, Ori Livneh o...@wikimedia.org wrote:

 
 
 On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote:
 
 I don't like its cryptic nature.
 
 Someone looking at the headers sent to his browser would be very
 confused about what's the point of «X-MF-Mode: b».
 
 Instead something like this would be much more descriptive:
 X-Mobile-Mode: stable
 X-Mobile-Request: secondary
 
 But that also means sending more bytes through the wire :S
 Well, you can (and should) drop the 'X-' :-)
 
 See http://tools.ietf.org/html/rfc6648: Deprecating the X- Prefix and 
 Similar Constructs in Application Protocols
 
 
 --
 Ori Livneh
 
 
 
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Fwd: RFC: Introducing two new HTTP headers to track mobile pageviews

2013-01-31 Thread Diederik van Liere
(Apologies for cross-posting)


Heya,

The mobile team needs accurate pageviews for the alpha and beta mobile
site. Currently, this information is only stored in a cookie, but we don't
want to go the route of starting to store this cookie because of cache
server performance, network performance and privacy policy issues. The
mobile team also needs to be able to differentiate between initial and
secondary API requests - pages in the beta version of MobileFrontend are
dynamically loaded via the API, meaning that MobileFrontend might make
multiple API requests to load sections of an article when they are toggled
open by the user. At the moment, we have no way of differentiating
between API requests to determine which ones should count as a 'pageview'.

We propose that we set two additional custom HTTP headers - one to identify
the alpha/beta/stable version of MobileFrontend, the other to
differentiate between initial and secondary API requests. This would make
logging the necessary information trivial, and we believe it would be
fairly lightweight to implement.

We propose the following two headers with their possible values:
X-MF-Mode: a/b/s (alpha/beta/stable)
X-MF-Req: 1/2 (primary/secondary)

X-MF-Mode would be determined by Varnish based off the existence of the
alpha/beta identifying cookies while X-MF-Req would be set by
MobileFrontend in the backend response.

These headers would only be set on the Varnish servers; on the Squids/Nginx
we will just set a dash ('-') in the log fields.
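For illustration only, a minimal sketch (assuming Python; the tuples stand in
for whatever fields end up in the logs) of how the proposed values could be
consumed downstream, counting a pageview only for primary requests:

    from collections import Counter

    # Hypothetical logged values of (X-MF-Mode, X-MF-Req); '-' marks layers
    # (Squid/Nginx) where the headers are not set.
    records = [
        ("b", "1"),   # beta, primary request       -> pageview
        ("b", "2"),   # beta, secondary API request -> not a pageview
        ("s", "1"),   # stable, primary request     -> pageview
        ("-", "-"),   # headers not set on this layer
    ]

    MODES = {"a": "alpha", "b": "beta", "s": "stable"}
    pageviews = Counter()
    for mode, req in records:
        if req == "1" and mode in MODES:   # only primary requests count as pageviews
            pageviews[MODES[mode]] += 1

    print(dict(pageviews))  # {'beta': 1, 'stable': 1}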

Questions:
1) Are there objections to the introduction of these two http headers?
2) We would like to aim for a late February deployment, is that an okay
period? (We will announce the real deployment date as well)
3) Are we missing anything important?

Thanks for your feedback!

Best
Arthur  Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Analytics] RFC: Tab as field delimiter in logging format of cache servers

2013-01-31 Thread Diederik van Liere
Yes let's not change the filenames
D

Sent from my iPhone

On 2013-01-31, at 18:45, Matthew Walker mwal...@wikimedia.org wrote:

 We will most likely change the file names back to their original names in a 
 month or so
 
 Please don't. It'll serve as a visible marker for the future for when we go 
 back and look at the files and do a WTF.
 
 ~Matt Walker
 ___
 Analytics mailing list
 analyt...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Nexus Maven repo

2013-01-28 Thread Diederik van Liere
Heya,

For all you Java junkies out there - oh wait, there are very few within WMF
:) - if you do Java, you can now use the Nexus Maven repo that is
installed on Labs at http://nexus.wmflabs.org/nexus/index.html#welcome

We are happy to give you an account; please poke us on IRC @
wikimedia-analytics or email David Schoonover or me.


Best,
Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers

2013-01-25 Thread Diederik van Liere
Apologies for crossposting

Heya,

The Analytics Team is planning to deploy tab as the field delimiter,
replacing the current space delimiter, on the varnish/squid/nginx
servers. We would like to do this on February 1st. The reason for this
change is that we need a consistent number of fields in each
webrequest log line. Right now, some fields contain spaces, which requires
a lot of post-processing cleanup and slows down the generation of reports.
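To illustrate the problem, here is a minimal sketch (assuming Python; the
field layout is hypothetical, not the actual cache-server format): a field
that itself contains spaces, such as a user-agent, shifts every following
field when the delimiter is a space, while a tab-delimited line always
splits into the same number of fields.

    # Hypothetical 5-field log line: host, sequence, timestamp, URL, user-agent.
    space_line = "cp1001 1234 2013-01-25T12:00:00 /wiki/Main_Page Mozilla/5.0 (X11; Linux x86_64)"
    tab_line = "cp1001\t1234\t2013-01-25T12:00:00\t/wiki/Main_Page\tMozilla/5.0 (X11; Linux x86_64)"

    # The embedded spaces in the user-agent break the space-delimited split ...
    print(len(space_line.split(" ")))   # 8 fields instead of the expected 5
    # ... whereas the tab-delimited split always yields one field per column.
    print(len(tab_line.split("\t")))    # 5 fields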

What is affected and maintained by Analytics

* udp-filter already has support for the tab character
* webstatscollector: we compiled a new version of filter to add support for
the tab character
* wikistats: we will fix the scripts on an ongoing basis.
* udp2log: we have a patch ready for inserting sequence numbers separated
by tab.

In particular, I would like to have feedback on three questions:

1) Are there important reasons not to use tab as field delimiter?

2) Are there important pieces of logging that expect a space instead of a
tab and that need to be fixed and that I did not mention in this email?

3) Is February 1st a good date to deploy this change? (Assuming that all
preps are finished)


Best,

Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers

2013-01-25 Thread Diederik van Liere
No, the output format of http://dumps.wikimedia.org/other/pagecounts-raw/
will stay the same.
Best,
Diederik


On Fri, Jan 25, 2013 at 12:48 PM, bawolff bawolff...@gmail.com wrote:

 Just to clarify, will this affect the stats at
 http://dumps.wikimedia.org/other/pagecounts-raw/ ? Changing the format
 of that will probably break third party scripts.
 --
 -bawolff


 On Fri, Jan 25, 2013 at 1:41 PM, Diederik van Liere
 dvanli...@wikimedia.org wrote:
  Apologies for crossposting
 
  Heya,
 
  The Analytics Team is planning to deploy tab as field delimiter to
  replace the current space as fielddelimiter on the varnish/squid/nginx
  servers. We would like to do this on February 1st. The reason for this
  change is that we need to have a consistent number of fields in each
  webrequest log line. Right now, some fields contain spaces and that
 require
  a lot of post-processing cleanup and slows down the generation of
 reports.
 
  What is affected and maintained by Analytics
 
  * udp-filter already has support for the tab character
  * webstatscollector: we compiled a new version of filter to add support
 for
  the tab character
  * wikistats: we will fix the scripts on an ongoing basis.
  * udp2log: we have a patch ready for inserting sequence numbers separated
  by tab.
 
  In particular, I would like to have feedback to three questions:
 
  1) Are there important reasons not to use tab as field delimiter?
 
  2) Are there important pieces of logging that expect a space instead of a
  tab and that need to be fixed and that I did not mention in this email?
 
  3) Is February 1st a good date to deploy this change? (Assuming that all
  preps are finished)
 
 
  Best,
 
  Diederik
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Research on newcomer experience - do we want to take part?

2012-11-19 Thread Diederik van Liere
Hey Quim

I also sent you this survey a week ago with the question whether we should
participate :)
D


On Fri, Nov 16, 2012 at 5:13 PM, Quim Gil q...@wikimedia.org wrote:

 Hi, sorry for cross-replying.

 On Wed, Nov 14, 2012 at 3:11 PM, Lydia Pintscher
 lydia.pintsc...@wikimedia.de wrote:
  On Wed, Nov 14, 2012 at 11:00 PM, Marcin Cieslak sa...@saper.info
 wrote:
  Hello,
 
  Kevin Carillo[1] from University of Wellington is going to research
  Newcomer experience and contributor behavior in FOSS communities[2]
  So far Debian, GNOME, Gentoo, KDE, Mozilla, Ubuntu, NetBSD, OpenSUSE
  will be taken into account, and FreeBSD recently joined[3] and
  there is still some possibility for other large FOSS projects to join.
 
  I think it could fit nicely into our recent efforts directed
  at newcomer experience after Git migration. And MediaWiki is
  a bit different than above projects.
 
  Are we interested
  to include MediaWiki in that research?
 
  As Kevin explains in his post he tried to avoid spamming mailing
  lists to look for project interested, so I am doing this for him :-)
 
  //Saper
 
  I've worked with Kevin in preparation for his survey and later
  promotion from the KDE-side quite a bit. This is not the kind of
  research project that is of no value to the project taking part. I
  expect the results to be very useful for KDE (and likely also the
  other projects taking part).

 It turns out that Sumana and I have been in touch with Kevin in the
 past few days, after Asheesh Laroia directly proposed including Wikimedia
 in this research.

 Said and done, Wikimedia is also included in the survey and you are
 encouraged to invest some minutes in it:

 https://limesurvey.sim.vuw.ac.nz/index.php?sid=65151

 I will send a proper announcement next Monday, but in the meantime
 here is an illustrative link of links:

 http://kevincarillo.org/2012/11/15/survey-update-after-1-week/

 --
 Quim

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] editing channels - How was this edit made?

2012-11-14 Thread Diederik van Liere

On 2012-11-14, at 18:33, Platonides platoni...@gmail.com wrote:

 On 13/11/12 23:42, MZMcBride wrote:
 Please stop top-posting. If you don't understand what that means, please
 read https://wiki.toolserver.org/view/Mailing_list_etiquette.
 
 As I posted at https://www.mediawiki.org/wiki/Talk:Revtagging, it's not
 clear to me why the built-in revision tagging system in MediaWiki is
 insufficient for your needs. It _feels_ like wheel-reinvention, but perhaps
 there's some key component I'm missing.
 
 It should indeed be enough to use change_tag.
 
 Also note that some parameters listed in the page are redundant for some
 campaigns (such as adding the bot name).


I think that the Analytics team would prefer to either:
1) detect the source of the edit in the URL, or
2) have a hook activated after a successful edit and have the data sent to the
pixel service.

Having this data in a MySQL table poses a lot of challenges with respect to
importing that data into the analytics cluster.

Best
Diederik 
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] editing channels - How was this edit made?

2012-11-13 Thread Diederik van Liere
Dario has been proposing RevTagging to address exactly this need; see:
http://www.mediawiki.org/wiki/Revtagging

I really think we should put this on the 2013 roadmap for MediaWiki; we
definitely need this more granular level of instrumentation for determining
the source of an edit.

Best
Diederik


On Tue, Nov 13, 2012 at 6:19 AM, Amir E. Aharoni 
amir.ahar...@mail.huji.ac.il wrote:

 Hi,

 In the Bangalore DevCamp I spoke a bit with Brion about a way to
 measure various ways of editing MediaWiki pages. The original idea was
 to measure how much the mobile editing, when it becomes widely
 available, is actually used. A simplistic solution would be add a
 boolean rev_mobile field to the revision table, but this can apply
 to a lot of other things, for example:
 * Visual Editor vs. the current wiki-syntax editor
 * A usual browser vs. AutoWikiBrowser vs. direct API calls
 * bots vs. non-bots
 * for file uploads, Special:Upload vs. Special:UploadWizard

 Things get even more complicated, because several such flags may apply
 at once: for example, I can imagine a human editor using a mobile
 editing interface with a bot flag, because he makes a lot of tiny
 edits and the community doesn't want them to appear in RecentChanges.

 And of course, there may be privacy and performance implications, too.

 Nevertheless, some kind of metrics of the various contributions
 channels would be useful. Any more ideas?

 --
 Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
 http://aharoni.wordpress.com
 ‪“We're living in pieces,
 I want to live in peace.” – T. Moore‬

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] About RESOLVED LATER

2012-11-05 Thread Diederik van Liere
Hi,

I made the exact same argument a while back (Dropping the LATER resolution
in Bugzilla
http://wikimedia.7.n6.nabble.com/Dropping-the-LATER-resolution-in-Bugzilla-td743804.html
)
+1
D


On Mon, Nov 5, 2012 at 5:25 PM, Quim Gil quim...@gmail.com wrote:

 I was a bit of a lazy child, specially when it came to clean up my room or
 do my homework. I tried to convince my mom and teachers about the paradigm
 of RESOLVED LATER, but they never bought it. At the end I had to clean up
 my room and do my homework.

 Even as a child I suspected that they were actually right. If something
 has been postponed for later it can't be called resolved. Now it's me who
 hears from time to time excuses from my kids that sound more or less like
 RESOLVED LATER. Yeah, sure, I tell them, pointing to the source of
 pending work.  :)

 And now to the topic:

 What about removing the LATER resolution from our Bugzilla? It feels like
 sweeping reports under the carpet. If a team is convinced that something
 won't be addressed any time soon then they can WONTFIX. If anybody feels
 differently then they can take the report back and fix it.

 Before we could simply reopen the 311 reports filed under RESOLVED LATER:

 http://bit.ly/YxW60z

 Huggle  1
 MediaWiki   74
 MediaWiki extensions104
 Monuments database  1
 mwEmbed 3
 Parsoid 1
 Tools   2
 WikiLoves Monuments Mobile  4
 Wikimedia   114
 Wikimedia Labs  1
 Wikimedia Mobile3
 Wikipedia App   3
 Total   311


 Looking at the total amount of open bugs, the impact is not that big. The
 house will be as tidy/untidy as before, but at least things wll be clearer
 now.

 What do you think?

 --
 Quim

 __**_
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/**mailman/listinfo/wikitech-lhttps://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] MediaWiki community metrics

2012-10-15 Thread Diederik van Liere
 I'm not even sure where to find the code for http://gerrit-stats.wmflabs.*
 *org/ http://gerrit-stats.wmflabs.org/ . In gerrit I could only find
 the /analytics/scorecard project.


The repo is available at:
https://gerrit.wikimedia.org/r/gitweb?p=analytics%2Fgerrit-stats.git;a=shortlog;h=HEAD
As mentioned before, Limn is responsible for visualizing the data;
gerrit-stats only pulls data from Gerrit and constructs measures. Happy to
discuss how to come up with developer-centric measures.

Best,
Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] MediaWiki community metrics

2012-10-02 Thread Diederik van Liere
 Question: what is the best approach to retrieve the number of existing
 Gerrit accounts?

This number is already stored within gerrit-stats; it is just not being
written to a dataset.
D
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] IPv6 usage on Wikimedia?

2012-09-17 Thread Diederik van Liere
On World IPv6 Day (June 6th, 2012) we had about 5000 IPv6 hits; for the
first 17 days of September, however, we had a total of 1,000,032,000 hits
coming from IPv6 addresses. This is based on the sampled squid log data.

Best,
Diederik

On Mon, Sep 17, 2012 at 8:03 AM, David Gerard dger...@gmail.com wrote:

 On 17 September 2012 12:36, Thomas Dalton thomas.dal...@gmail.com wrote:
  On 17 September 2012 11:25, David Gerard dger...@gmail.com wrote:

  Do we have any stats on IPv6 accesses and edits on Wikimedia sites?
  I see this page on stats, which suggests it's literally so small we
  can't even count it:
  http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm
  Is that actually the case? 'Cos we do know IPv6 edits occur, therefore
  IPv6 page views occur.

  That's a split by country, why would it mention IPv6?
  Judging by the number of anonymous edits coming from IPv6 addresses,
  there might be fairly high usage.


 Indeed. So where are the actual stats?


 - d.

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] IPv6 usage on Wikimedia?

2012-09-17 Thread Diederik van Liere
Here is the actual raw data since Jan. 1st, 2012 (multiply each observation
by 1000 to get the estimated number of hits for that day). The assumption
is that each hit has the same probability of showing up in the squid log
file.

As you can see, after World IPv6 Day we started supporting many more IPv6
services, hence the increase in traffic.
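For reference, a minimal sketch (assuming Python) of how the per-day
estimates can be derived from the 1:1000 sampled counts listed below:

    # Each data line is "<sampled log file>,<IPv6 hits seen in the 1:1000 sample>";
    # the estimated daily total is simply the sampled count multiplied by 1000.
    raw = """/a/squid/archive/sampled/sampled-1000.log-20120101.gz,2
    /a/squid/archive/sampled/sampled-1000.log-20120110.gz,367"""

    for line in raw.splitlines():
        path, count = line.strip().rsplit(",", 1)
        day = path.split("log-")[1].split(".")[0]          # e.g. '20120101'
        print(day, int(count) * 1000, "estimated IPv6 hits")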

/a/squid/archive/sampled/sampled-1000.log-20120101.gz,2
/a/squid/archive/sampled/sampled-1000.log-20120102.gz,2
/a/squid/archive/sampled/sampled-1000.log-20120103.gz,4
/a/squid/archive/sampled/sampled-1000.log-20120104.gz,3
/a/squid/archive/sampled/sampled-1000.log-20120105.gz,2
/a/squid/archive/sampled/sampled-1000.log-20120106.gz,7
/a/squid/archive/sampled/sampled-1000.log-20120107.gz,107
/a/squid/archive/sampled/sampled-1000.log-20120108.gz,139
/a/squid/archive/sampled/sampled-1000.log-20120109.gz,322
/a/squid/archive/sampled/sampled-1000.log-20120110.gz,367
/a/squid/archive/sampled/sampled-1000.log-20120111.gz,378
/a/squid/archive/sampled/sampled-1000.log-20120112.gz,341
/a/squid/archive/sampled/sampled-1000.log-20120113.gz,263
/a/squid/archive/sampled/sampled-1000.log-20120114.gz,187
/a/squid/archive/sampled/sampled-1000.log-20120115.gz,191
/a/squid/archive/sampled/sampled-1000.log-20120116.gz,360
/a/squid/archive/sampled/sampled-1000.log-20120117.gz,368
/a/squid/archive/sampled/sampled-1000.log-20120118.gz,510
/a/squid/archive/sampled/sampled-1000.log-20120119.gz,398
/a/squid/archive/sampled/sampled-1000.log-20120120.gz,274
/a/squid/archive/sampled/sampled-1000.log-20120121.gz,176
/a/squid/archive/sampled/sampled-1000.log-20120122.gz,177
/a/squid/archive/sampled/sampled-1000.log-20120123.gz,349
/a/squid/archive/sampled/sampled-1000.log-20120124.gz,339
/a/squid/archive/sampled/sampled-1000.log-20120125.gz,364
/a/squid/archive/sampled/sampled-1000.log-20120126.gz,366
/a/squid/archive/sampled/sampled-1000.log-20120127.gz,277
/a/squid/archive/sampled/sampled-1000.log-20120128.gz,175
/a/squid/archive/sampled/sampled-1000.log-20120129.gz,244
/a/squid/archive/sampled/sampled-1000.log-20120130.gz,370
/a/squid/archive/sampled/sampled-1000.log-20120131.gz,373
/a/squid/archive/sampled/sampled-1000.log-20120201.gz,366
/a/squid/archive/sampled/sampled-1000.log-20120202.gz,327
/a/squid/archive/sampled/sampled-1000.log-20120203.gz,259
/a/squid/archive/sampled/sampled-1000.log-20120204.gz,159
/a/squid/archive/sampled/sampled-1000.log-20120205.gz,192
/a/squid/archive/sampled/sampled-1000.log-20120206.gz,360
/a/squid/archive/sampled/sampled-1000.log-20120207.gz,351
/a/squid/archive/sampled/sampled-1000.log-20120208.gz,350
/a/squid/archive/sampled/sampled-1000.log-20120209.gz,306
/a/squid/archive/sampled/sampled-1000.log-20120210.gz,275
/a/squid/archive/sampled/sampled-1000.log-20120211.gz,176
/a/squid/archive/sampled/sampled-1000.log-20120212.gz,210
/a/squid/archive/sampled/sampled-1000.log-20120213.gz,336
/a/squid/archive/sampled/sampled-1000.log-20120214.gz,372
/a/squid/archive/sampled/sampled-1000.log-20120215.gz,339
/a/squid/archive/sampled/sampled-1000.log-20120216.gz,333
/a/squid/archive/sampled/sampled-1000.log-20120217.gz,272
/a/squid/archive/sampled/sampled-1000.log-20120218.gz,147
/a/squid/archive/sampled/sampled-1000.log-20120219.gz,202
/a/squid/archive/sampled/sampled-1000.log-20120220.gz,316
/a/squid/archive/sampled/sampled-1000.log-20120221.gz,321
/a/squid/archive/sampled/sampled-1000.log-20120222.gz,331
/a/squid/archive/sampled/sampled-1000.log-20120223.gz,334
/a/squid/archive/sampled/sampled-1000.log-20120224.gz,319
/a/squid/archive/sampled/sampled-1000.log-20120225.gz,178
/a/squid/archive/sampled/sampled-1000.log-20120226.gz,155
/a/squid/archive/sampled/sampled-1000.log-20120227.gz,229
/a/squid/archive/sampled/sampled-1000.log-20120228.gz,347
/a/squid/archive/sampled/sampled-1000.log-20120229.gz,344
/a/squid/archive/sampled/sampled-1000.log-20120301.gz,362
/a/squid/archive/sampled/sampled-1000.log-20120302.gz,339
/a/squid/archive/sampled/sampled-1000.log-20120303.gz,337
/a/squid/archive/sampled/sampled-1000.log-20120304.gz,201
/a/squid/archive/sampled/sampled-1000.log-20120305.gz,242
/a/squid/archive/sampled/sampled-1000.log-20120306.gz,421
/a/squid/archive/sampled/sampled-1000.log-20120307.gz,485
/a/squid/archive/sampled/sampled-1000.log-20120308.gz,460
/a/squid/archive/sampled/sampled-1000.log-20120309.gz,413
/a/squid/archive/sampled/sampled-1000.log-20120310.gz,322
/a/squid/archive/sampled/sampled-1000.log-20120311.gz,205
/a/squid/archive/sampled/sampled-1000.log-20120312.gz,202
/a/squid/archive/sampled/sampled-1000.log-20120313.gz,417
/a/squid/archive/sampled/sampled-1000.log-20120314.gz,478
/a/squid/archive/sampled/sampled-1000.log-20120315.gz,378
/a/squid/archive/sampled/sampled-1000.log-20120316.gz,426
/a/squid/archive/sampled/sampled-1000.log-20120317.gz,332
/a/squid/archive/sampled/sampled-1000.log-20120318.gz,231
/a/squid/archive/sampled/sampled-1000.log-20120319.gz,275
/a/squid/archive/sampled/sampled-1000.log-20120320.gz,440
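
To turn the sampled counts above into estimated daily hits, a minimal sketch
along these lines would do (illustrative only, not the actual analytics
tooling; the input file name ipv6_sampled.csv is made up and assumed to
contain the filename,count lines above):

import csv
import re

SAMPLING_FACTOR = 1000  # the squid logs above are 1:1000 sampled

def estimated_daily_hits(path):
    """Yield (date, estimated_hits) from 'filename,count' lines."""
    with open(path, newline='') as f:
        for filename, count in csv.reader(f):
            # e.g. .../sampled-1000.log-20120101.gz -> 20120101
            match = re.search(r'log-(\d{8})\.gz$', filename)
            if match:
                yield match.group(1), int(count) * SAMPLING_FACTOR

if __name__ == '__main__':
    for date, hits in estimated_daily_hits('ipv6_sampled.csv'):
        print(date, hits)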

Re: [Wikitech-l] Announcing initial version of gerrit-stats

2012-09-15 Thread Diederik van Liere


 Is the slowness issue known?
   -Niklas

Yes, this is known; it is related to the fact that gerrit-stats is
currently hosted on a Labs instance. We are working on migrating it to
another server.
Best,
Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Announcing initial version of gerrit-stats

2012-09-14 Thread Diederik van Liere
Hi everybody,

The Analytics Team is happy to announce the first version of gerrit-stats.
Gerrit-stats keeps track of the code review backlog for individual Git
repositories.

The gerrit-stats dashboard is available at http://gerrit-stats.wmflabs.org
Currently it has a few example charts, but we can add your repo to the
dashboard as well; just let us know!

To create a new chart yourself, visit
http://gerrit-stats.wmflabs.org/graphs/new
This will launch the interface to create your own graph. Click on 'Data',
then click on 'Add Metric', and a pull-down menu with all the repositories
will appear. Select the repository of your interest and select the metric
that you want to visualize. Once you have selected all the metrics of your
interest, go back to 'Info' and enter a slug name. Then press 'Enter' and
click the 'Save' button.

Currently, the following metrics are tracked (on a daily basis):
1) Number of new changesets
2) Number of changesets without any code review per day (this excludes
automated reviews from lint and lint-like reviewers).
3) Number of changesets waiting for merge per day (this only applies to
changesets that received only positive reviews).
4) Number of changesets self-reviewed.

And for metrics 2 and 3, there is a version for volunteers and for WMF
staff.


Gerrit-stats is visualized using Limn, the data GUI developed by the
Analytics Team and led by David Schoonover. Limn is available at
https://github.com/wikimedia/limn

This is the initial release and I am sure there will be bugs and issues. If
you have any questions or problems using gerrit-stats, then either:
1) Head over to #wikimedia-analytics on IRC and ask us,
2) Send an email to the analytics mailing list, or
3) Contact us directly.


Not Yet Frequently Asked Questions:

1) How do I create a visualization of the code review metrics for a repo?

Visit gerrit-stats.wmflabs.org/graphs/new

This will launch the interface to create your own graph. Click on 'Data',
then click on 'Add Metric', and a pull-down menu with all the repositories
will appear. Select the repository of your interest and select the metric
that you want to visualize. Once you have selected all the metrics of your
interest, press the 'Save' button.
You are all set and you can use this permalink for future reference.

2) How do I edit an existing chart?
Simply append /edit to the URL of your chart and you can edit it.

3) My repository is not showing up in the pull down menu, what happened?
By default, all repositories are automagically kept track of as soon as they
contain a single commit. There are two exceptions:
1) If your repository name contains the string 'test' or 'private', it will
be ignored (a minimal sketch of this rule follows below).
2) The orgchart repository is not tracked by gerrit-stats; this is a known
issue, but Chad and I haven't been able to figure out what causes this.

If your repository is missing then please contact me.
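
The name-based exclusion in exception 1 boils down to something like this
(a sketch of the rule as described, not the actual gerrit-stats code; the
repository names are made up):

def is_tracked(repo_name):
    # Skip any repo whose name contains 'test' or 'private'.
    name = repo_name.lower()
    return 'test' not in name and 'private' not in name

assert is_tracked('mediawiki/core')
assert not is_tracked('analytics/testdata')            # contains 'test'
assert not is_tracked('operations/private-settings')   # contains 'private'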

4) Will you add metrics for individual committers?
Right now the unit of analysis is a repository, but it is definitely
possible to keep track of code review metrics for individuals. However, I
would like to hear some use cases first before embarking on this.

5) The chart looks too spiky; how can I get smoother lines?
1) Go to http://gerrit-stats.wmflabs.org/graphs/name_of_chart/edit
2) Click on 'Options'
3) Click on 'Advanced' (right side of the screen)
4) Click on 'rollPeriod' (bottom of the screen, yellow box)
5) This lets you create a moving average: replace the 1 with 7, meaning that
each data point becomes the average of the past 7 days. This option applies
to both metrics, but it really smooths out the outliers.
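
To make explicit what the rollPeriod computes, here is a minimal sketch of a
trailing moving average (illustrative only, not Limn's actual code; the
sample numbers are made up):

def rolling_average(values, period=7):
    """Each point becomes the average of the last `period` points seen so far."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - period + 1):i + 1]
        out.append(sum(window) / len(window))
    return out

daily_changesets = [40, 5, 3, 55, 60, 48, 52, 4, 2, 49]  # spiky daily counts
print(rolling_average(daily_changesets, period=7))        # much smoother series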

6) I want a new metric. How do I go about it?
There are two options:
a) Clone the gerrit-stats repo and hack away (it's Python, by the way). We
are happy to help out!
b) Send us a suggestion for a new metric; the more precise, the more useful!

On behalf of the Analytics Team,

Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Code review statistics and trends

2012-08-31 Thread Diederik van Liere

 There seems to be a 10-day lag (no data after August 21st). Is this a
 bug or a feature?


Data hasn't been pushed to gerrit for 10 days; something is wrong with the
script. We will fix it today.
D
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Code review statistics and trends

2012-08-25 Thread Diederik van Liere
Hi Harry,
The changeset numbers are accurate and the spikes are caused by translatewiki.
See my response to Siebrand on how to remove the outliers and create a smoother
chart.

Best


Diederik

Sent from my iPhone

On 2012-08-25, at 17:01, Harry Burt jarry1...@gmail.com wrote:

 I realise that many contributors are WMF staff, and many WMF staff
 work a relatively predictable 5-day week, but the new changesets
 graph still seems a little spiky to my eyes.
 
 Given the +- 10 changesets range, how much confidence should I be
 placing in these numbers?
 
 Thanks,
 Harry
 
 --
 Harry Burt (User:Jarry1250)
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Code review statistics and trends

2012-08-23 Thread Diederik van Liere

On 2012-08-23, at 2:42 AM, Siebrand Mazeland (WMF) wrote:

 The graph for new changesets fluctuates a lot. I would guess this is
 due to change sets submitted by user l10n-bot. Maybe it's a good idea
 to filter those out, to get a  line that's a little easier to
 interpret.

Hey Siebrand,

I prefer to keep the data collection as simple as possible. One way of fixing 
this issue is as follows:

1) Go to http://gerrit-stats.wmflabs.org/graphs/mediawiki/edit
2) Click on 'Options'
3) Click on 'Advanced' (right side of screen)
4) Click on 'rollPeriod' (bottom of screen, yellow box)
5) This lets you create a moving average: replace the 1 with 7, meaning that
each data point becomes the average of the past 7 days. This option applies
to both metrics, but it really smooths out the outliers.

If you want to save this, please use another slug name (click on 'Info' and
replace 'slug') and then click 'Save'; otherwise you will have changed Robla's
original chart.


Best,
D
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] The Death of OAuth 2

2012-07-28 Thread Diederik van Liere
 
 Anyone want me to go back through the specs and make a list of some of the 
 things that are wrong with both 

Yes! I think that would be hugely helpful!
Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] The Death of OAuth 2

2012-07-26 Thread Diederik van Liere
Hi all,

The lead author of OAuth 2.0, Eran Hammer, has withdrawn his name from the
OAuth 2 spec:

http://hueniverse.com/2012/07/oauth-2-0-and-the-road-to-hell/

That's very sad news, IMHO, and it probably means we really should
reconsider which protocol we want to support (OAuth 1.0, OAuth 2.0, SAML, or
something else) if we want to allow interoperability with our sites.

Best,
Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Automatic mobile redirection enabled for *.wikimedia.org sites hosted on the cluster (except for commons)

2012-07-20 Thread Diederik van Liere
Hey Arthur,

It seems that the redirection to the mobile donation site 
(donate.m.wikimedia.org) does not work.
D

Sent from my iPhone

On 2012-07-19, at 19:35, Arthur Richards aricha...@wikimedia.org wrote:

 PS big thanks to Asher Feldman for getting the change compiled and deployed.
 
 On Thu, Jul 19, 2012 at 4:34 PM, Arthur Richards 
 aricha...@wikimedia.orgwrote:
 
 Around 21:30UTC automatic redirection to the mobile version of *.
 wikimedia.org sites hosted on the cluster (except for commons) was
 enabled with the deployment of https://gerrit.wikimedia.org/r/#/c/16000/.
 This is part of the ongoing effort by the mobile team to provide automatic
 redirection for mobile devices to the mobile version of all of our sites.
 For more information about the project and the timeline for enabling
 automatic redirection to the remaining projects, see
 http://www.mediawiki.org/wiki/Mobile_default_for_sibling_projects.
 
 Please let us know if you see any issues. As always, feel free to join us
 on IRC in #wikimedia-mobile.
 
 --
 Arthur Richards
 Software Engineer, Mobile
 [[User:Awjrichards]]
 IRC: awjr
 +1-415-839-6885 x6687
 
 
 
 
 -- 
 Arthur Richards
 Software Engineer, Mobile
 [[User:Awjrichards]]
 IRC: awjr
 +1-415-839-6885 x6687
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] getting Jenkins to run basic automated tests on commits to extensions

2012-07-06 Thread Diederik van Liere
 Yes, I don't disagree that jshint should be run by Jenkins. AIUI
 Timo's work to make jshint work on the command line is prep work for
 exactly that.
 
 
 Ah, I misunderstood you. Thought you meant so people can
 run it before uploading which no one will ever do ;-)
 

Maybe we should create a git pre-commit script that does the jslint / php -l
check that people can install on their local dev computers.
That way people will never forget it ;)
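
As a sketch of what such a hook could look like (hedged: this is not an
agreed-upon tool; it would live at .git/hooks/pre-commit, it assumes php and
jshint are on your PATH, and it uses jshint since that is what the thread
discusses; jslint would be analogous):

#!/usr/bin/env python
# Sketch of a pre-commit hook that lints staged PHP and JS files before a
# commit is created. Exiting non-zero aborts the commit.
import subprocess
import sys

def staged_files():
    out = subprocess.check_output(
        ['git', 'diff', '--cached', '--name-only', '--diff-filter=ACM'])
    return out.decode().split()

def main():
    failed = False
    for path in staged_files():
        if path.endswith('.php'):
            cmd = ['php', '-l', path]       # PHP syntax check
        elif path.endswith('.js'):
            cmd = ['jshint', path]          # JS lint
        else:
            continue
        if subprocess.call(cmd) != 0:
            failed = True
    if failed:
        print('Lint errors found; commit aborted.')
        sys.exit(1)

if __name__ == '__main__':
    main()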

D


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Barkeep code review tool

2012-07-02 Thread Diederik van Liere

 Roan Kattouw wrote:
 Yes, ops essentially uses a post-commit workflow right now, and that
 makes sense for them.
 
 ops also uses pre-commit review for non-ops people :-]
 
 Yeah, that's right. What I meant to say (and thought I had said in
 some form later in that message) was that the puppet repo has
 post-commit review for most changes by ops staff, and pre-commit
 review for everything else (non-ops staff, volunteers, and certain
 changes by ops staff in some cases).

I became curious about these statements regarding self-review
(committer == reviewer), so I ran a couple
of queries against the gerrit database to see how often this occurs:

1) For the puppet repo, 84.1% of the commits are self-reviewed.
2) For the mediawiki core repo, 27.9% of the commits are self-reviewed.
3) For the mediawiki extensions repos, 67.8% of the commits are self-reviewed.
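
For reference, the committer == reviewer calculation behind those
percentages boils down to something like the sketch below (illustrative
Python, not the actual queries against the gerrit database; the account
names are made up):

def self_review_share(changesets):
    """changesets: iterable of (owner, approver) pairs for merged changes."""
    total = self_reviewed = 0
    for owner, approver in changesets:
        total += 1
        if owner == approver:
            self_reviewed += 1
    return 100.0 * self_reviewed / total if total else 0.0

sample = [('alice', 'alice'), ('bob', 'carol'), ('carol', 'carol'), ('dave', 'alice')]
print('%.1f%% self-reviewed' % self_review_share(sample))  # 50.0% self-reviewed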


I think we need to take a step back from a tool-focused discussion and first 
hash out what our commit workflows are / should be. In particular: 

1) Should there be one commit workflow that applies to all teams? Looking at
current practice, the answer seems to be no, but I am curious to hear what
other people think. If the answer is that it's okay for different teams to
have different commit workflows, then we should also look for tools that
support this.

2) If self-review is so prevalent, does that mean that the pre-commit review 
workflow has failed? 

Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] HTTPS Wikipedia search for Firefox?

2012-06-19 Thread Diederik van Liere
Hey Chris 

Could you give us a ballpark estimate of how many search queries you expect
per day?

Best
Diederik

Sent from my iPhone

On 2012-06-19, at 13:51, Chris Peterson cpeter...@mozilla.com wrote:

 Thanks, Ryan. When you guys would like Mozilla to make this switch to HTTPS, 
 you can just reopen Firefox bug 758857.
 
 chris
 
 
 On 6/19/12 10:35 AM, Ryan Lane wrote:
 On Tue, Jun 19, 2012 at 3:39 AM, Chris Peterson cpeter...@mozilla.com 
 wrote:
 hi, I'm a developer at Mozilla and I have a patch [1] that would switch
 Firefox's Wikipedia search box from HTTP to HTTPS.
 
 Who would be an appropriate technical contact at Wikimedia that I can
 coordinate with? Is this a change Wikimedia would welcome? Or would the
 increased SSL server load be an undue burden for Wikimedia? Just to be
 clear, this change would only affect Firefox users who search Wikipedia
 using Firefox's search box.
 
 A few months ago, Mozilla switched Firefox 14 (currently in Beta) to use
 Google's HTTPS search [2]. If I check in my Wikipedia patch soon, the change
 would ride Firefox's Nightly, Aurora, and Beta release channels [3] and be
 released to the general public in Firefox 16 (October 2012).
 
 Please don't do so. HTTPS is a new service, and we haven't properly
 load tested it yet. The first target for production load testing is
 for logged-in users.
 
 I'm not opposed to the change completely, but I'd prefer to let you
 guys know when we're ready.
 
 Thanks,
 
 - Ryan
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
 
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Giving people additional rights on gerrit

2012-06-18 Thread Diederik van Liere
Hi Antoine,

I really think we need to rethink how we are handing out non-admin Gerrit
rights to our engineers, both staff and volunteers. Create-repo and
create-branch rights should be handed out by default. There is absolutely
zero reason for being stingy in handing out these rights. The loss of
productivity is really not acceptable, and it would be a real shame if we
decided to drop Gerrit as our code review tool not because of Gerrit's
inadequacies but because the way we utilize Gerrit is broken.
Best,
Diederik

 Please bug Gerrit admins through the not so broken workflow :-)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Some old proposed changes in Gerrit waiting merge, after a code review.

2012-06-14 Thread Diederik van Liere
The analytics team has written a script to generate such reports and we will
publish these results shortly, once we have enough data points.
Best

Diederik

Sent from my iPhone

On 2012-06-14, at 4:31, Sébastien Santoro dereck...@espace-win.org wrote:

 Hi,
 
 I saw this morning those reviewed but not merged code changes in gerrit:
 
 Parser issue for HTML definition list
 Bug 11748: Handle optionally-closed HTML tags without tidy
2012-04-17
Owner: GWicke
Review: +1 by saper
https://bugzilla.wikimedia.org/11748
https://gerrit.wikimedia.org/r/#/c/5174/

 (bug 32381) Allow descending order for list=backlinks, list=embeddedin
 and list=imageusage
2012-04-30
Owner : Umherirrender
Review: +1 by Aaron Schulz
https://bugzilla.wikimedia.org/32381
https://gerrit.wikimedia.org/r/#/c/6108/

 Upgrade cortado-ovt to newer version (seems to work fine locally)
2012-05-05
Owner : Reedy
Review : +1 by awjrichards
https://gerrit.wikimedia.org/r/#/c/6640/
 
 Would it be interesting to generate an automated report detecting code
 submissions older than 45 days that have at least one +1 review but are
 still not merged?
 
 -- 
 Best Regards,
 Sébastien Santoro aka Dereckson
 http://www.dereckson.be/
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers

2012-06-05 Thread Diederik van Liere
Hi Ori,

I absolutely 100% agree and we really need to sort this out this week. The
lost productivity is unacceptable.

So far I have heard different arguments why we cannot hand out 'create-repo
rights' to engineers:

The first reason was that only admins could do it, but that is no longer
true with the special create-repo rights group.

The second reason was that Gerrit's permission system is either too complex
or engineers don't know how it works. I have full confidence in our
engineers that they can master Gerrit's permission system in less than a
day.

Now a new argument has been unleashed, namely that we cannot delete repos.
The fact that we cannot delete repos is a non-argument. None of us are going
to create a bazillion repos.


The way we are using Git right now makes it a more centralized system than
Subversion ever was. This means that we are not using it right. So I really
hope that we can close this discussion by handing out the 'create-repo'
right to all paid WMF engineers, or at least to any paid WMF engineer who
requests it.


Diederik


On Tue, Jun 5, 2012 at 8:13 AM, Ori Livneh ori.liv...@gmail.com wrote:

 On Mon, Jun 4, 2012 at 11:00 PM, Jeremy Baron jer...@tuxmachine.com
 wrote:
 
  I mostly agree with what you've said.
 
  Just wanted to point out gerrit projects (aka repos) can never be
  destroyed. so if you e.g. typo or rename a project or kill it 5 days
  after you started it's still there forever. Only very recently have we
  even been able to hide projects from project listings in the UI.
 

 Isn't the same basically true of Wiki articles? I understand the desire to
 keep things tidy, okay. But what would be the big deal about having ten or
 even a hundred thousand abandoned repositories, so long as they are hidden,
 and do not clutter the UI? The repositories that would be candidates for
 deletion are the ones that got no further than an initial stab, and those
 measure in kilobytes.
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers

2012-06-05 Thread Diederik van Liere
On Tue, Jun 5, 2012 at 8:44 AM, Jeremy Baron jer...@tuxmachine.com wrote:

 On Tue, Jun 5, 2012 at 2:25 AM, Diederik van Liere dvanli...@gmail.com
 wrote:
  Now a new argument is unleashed and that is that we cannot delete
  repos. The fact that we cannot delete repos is a non-argument. None of us
  are going to create a bazillion repos.

 I was just pointing it out; I've no idea how gerrit behaves with lots
 of small+hidden repos. or with most of the repos in an instance
 hidden. Maybe it's not a problem.

I would suggest that we cross that bridge when we get there. AFAIK, Ori and
the E3 team would only need a handful of repos in the coming months, and the
same applies to the Analytics team.


 It sounds like Ori (and I think this is true for other people too)
 would create lots of repos that don't live too long. Maybe that's a
 bazillion, maybe not.

 -Jeremy

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers

2012-06-05 Thread Diederik van Liere
So the estimated maximum number of projects is 10,000, while the default
maximum is 1,000.
For contributors, the default maximum is 1,000 and the estimated maximum
number is 50,000.

Can we please tag this concern as addressed and start handing out the
rights?
Diederik

On Tue, Jun 5, 2012 at 11:32 AM, Ori Livneh ori.liv...@gmail.com wrote:

 On Mon, Jun 4, 2012 at 11:44 PM, Jeremy Baron jer...@tuxmachine.com
 wrote:
 
  I was just pointing it out; I've no idea how gerrit behaves with lots
  of small+hidden repos. or with most of the repos in an instance
  hidden. Maybe it's not a problem.
 

 Some numbers here:
 

 http://gerrit-documentation.googlecode.com/svn/Documentation/2.4/dev-design.html#_spam_and_abuse_considerations
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers

2012-06-05 Thread Diederik van Liere


 Did anyway say, Ask about it? I'm sure if you followed up with the one
 of the project creators (eg: chad) he would have been more than happy
 to push things along.


I am sorry but I disagree. The question is not whether Chad or one of the
Gerrit admins will help us, because they are super responsive and are
always helping us out when there are issues. The question is: what do we
(WMF engineers) think is a sensible Git / Gerrit workflow? Creating repos
is part of this workflow. I believe in decentralized teams and our software
should support this.

A workflow where engineers have to bug a Gerrit admin to do something is a
broken workflow:
* You will always bug an admin at the wrong time.
* It always takes more time to bug somebody than to do it yourself; we are
really losing productive hours on issues like this.
* We are professional engineers, and every engineer should know how to
create a repo in Gerrit.
* Bugging an engineer (in general) is not a scalable workflow and we should
really move away from these kinds of accepted practices.

We need to stop focusing on what Gerrit can / cannot do and we need to
start drafting out team-specific workflows on how we want to use Git /
Gerrit.

Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers

2012-06-05 Thread Diederik van Liere


 I've whipped up a quick tutorial for people who want to create new
 repositories[0]. If people can read and make sure they understand
 this page (with its various caveats), then yes, we can start handing
 this out.

 -Chad

 [0] https://www.mediawiki.org/wiki/Git/Creating_new_repositories



Dear Chad,
This is really helpful! Thanks so much for putting this together!
Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Give create gerrit repo right to all WMF engineers

2012-06-01 Thread Diederik van Liere
Hi all,

Ryan Lane just showed me that in Gerrit there is a separate right for creating 
repositories. I suggest we give this right to all WMF engineers. A repo is free 
and fun and will prevent unnecessary delays. 

Best,
Diederik


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers

2012-06-01 Thread Diederik van Liere
Could you please add David Schoonover and Andrew Otto to the Project Creators 
group? 
Best,
Diederik

On 2012-06-01, at 5:41 PM, Chad wrote:

 I don't want to give this right to all engineers because setting up new
 repositories is more than just choosing the name. There's also the issue of
 understanding how Gerrit permissions work so you can set them up properly.
 I did make a new Project Creators group that I'm more than willing to add
 people to, once they've learned Gerrit permissions.
 
 In addition, unless you make a group you're in the owner of the repo (which
 can't be done via the GUI, only the CLI--this is a bug), you won't be able
 to set permissions at all (this is by design).
 
 So yeah, its not as easy as it sounds on the tin, so I don't want to hand
 this out en masse. In an ideal world, I want us to have a special page
 where people can request repos and we can automate the icky backend stuff.
 
 -Chad
 On Jun 1, 2012 10:33 AM, Diederik van Liere dvanli...@gmail.com wrote:
 
 Hi all,
 
 Ryan Lane just showed me that in Gerrit there is a separate right for
 creating repositories. I suggest we give this right to all WMF engineers. A
 repo is free and fun and will prevent unnecessary delays.
 
 Best,
 Diederik
 
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] The bugtracker problem, again

2012-05-14 Thread Diederik van Liere
I don't think we should aim to cater to non-developers at all. The chances that
a non-developer finds a real bug are very, very small (in my previous life as an
academic I did a lot of research on Bugzilla and developer productivity, and
it's based on that experience that I am making this statement). I think
that if a newbie / non-developer finds Bugzilla, then he/she should be
redirected to IRC / Teahouse / talk pages / FAQ or any other support
channel that we have. They can always be sent back to file a bug report.

If we are going to spend effort on improving Bugzilla, then it should be focused
(IMHO) on matching a bug with the right developer (right meaning a person who
can actually fix the problem). It is in this area that Bugzilla (or any other bug
tracker, AFAIK) provides very limited support.

-- Diederik
On 2012-05-14, at 1:10 AM, Ryan Lane wrote:

 I don't think you'll ever find a finished bug-/issue-tracking solution that
 caters just as well for newbies and developers. The main reason is (of
 course?) that most issue tracking software is written for developers, by
 developers with little or no experience or thought as to what makes a good
 end-user experience. Also, most issue tracking tools are *made
 deliberately* to work best for developers - with human (end-user)
 interaction kept to a minimum. That's also why most issue tracking
 solutions end up looking like glorified (not the good kind) spreadsheets
 (Mantis, Flyspray, others?), something the IRS would want you to fill out
 (BZ, OTRS, RT, others?), or some kind of bastard child in-between (The Bug
 Genie, Redmine, Jira, Fogbugz, others?).
 
 
 I'd like to go one step further. There is not a single good bug/issue
 tracking system in existence. Yes, I'm completely serious too. I've
 come to believe that it's impossible to make one that anyone will be
 happy with. That includes most developers of tracking systems too
 (I've written one, and I hated it, though I liked it better than what
 I was using before).
 
 We can complain about this till the end of time. This discussion is
 even worse than bikeshedding discussions. At least with bikeshedding
 discussions you end up with a color for the bikeshed. When discussing
 bug/issue trackers you just end up with the same tracker, or another
 crappy tracker.
 
 - Ryan
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Wikisource link stats

2012-05-11 Thread Diederik van Liere
Hey Lars,

You might be interested in the WMF Analytics mailing list at 
https://lists.wikimedia.org/mailman/listinfo/analytics. There we discuss all 
our overall analytics projects, usually a little bit less focused on Mediawiki 
issues, but definitely focused on WMF data.

Hope to see you there!


Best,

Diederik
On 2012-05-03, at 4:46 PM, Lars Aronsson wrote:

 From [[Special:Linksearch]] I can find all the external links,
 based on the external links table in the database, which can
 be accessed by tools on the German toolserver.
 
 But is there any way to find similar information about links to
 Wikisource? I.e. what are the total number of links? Which pages
 link to a particular Wikisource page?
 
 
 -- 
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se
 
 
 
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Using tab as delimiter instead of space in the server log files

2012-05-09 Thread Diederik van Liere
Hi all,


In the last 24 hours I have found two new cases of spaces in log lines where 
the space is not used as a delimiter.

Case 1:
There are mobile page requests that contain a space in the URL, for example:

ssl1002 2198871 2012-04-06T23:50:24.566 0.002 0.0.0.0 FAKE_CACHE_STATUS/301 
1051 GET https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus 
NONE/mobilewikipedia - https://www.biodigitalhuman.com/ - 
Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64)%20AppleWebKit/535.19%20(KHTML,%20like%20Gecko)%20Chrome/18.0.1025.151%20Safari/535.19


Case 2:
The mimetype on varnish often contains additional charset=utf8 information,
which results in a mimetype like application/json; charset=utf8 or
text/xml; charset=utf8.

Instead of continuing to patch our servers to fix these space issues, I strongly
suggest that we move away from the space as delimiter and start using the tab
(\t) character. Spaces that are not delimiters have been cropping up in
our server logs for many years, and they make the analytics part that much more
complex as we need to check more and more edge cases and/or create patches. I
would rather solve the problem at the root, and that is by moving to a new delimiter.
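
To illustrate the difference with a simplified sketch (the field layout
below is made up and not the exact squid/varnish log format):

# With a space delimiter, a URL or mimetype containing spaces shifts all
# subsequent fields; with a tab delimiter every field survives intact.
fields = ['ssl1002', '2012-04-06T23:50:24.566', 'GET',
          'https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus',
          'text/html; charset=utf8']

space_line = ' '.join(fields)
tab_line = '\t'.join(fields)

print(len(space_line.split(' ')))   # 8: the URL and mimetype leak into extra fields
print(len(tab_line.split('\t')))    # 5: one token per field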

The delimiter is added by nginx/varnish/squid when writing the log file, so


Please let me know if this is a sane or insane idea. Please also let me know
if you are a consumer of these server log files and would need to make a
change on your end to accommodate this change.

Andrew has been working hard on building a test environment in Labs where we 
have nginx / varnish / squid servers running with production configuration and 
where we can test these changes extensively. 

Best,


Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] OAuth

2012-04-27 Thread Diederik van Liere
The current version of  http://www.mediawiki.org/wiki/OAuth was written by
me and Dario.
It's definitely a starting point and not a finished proposal. I am not sure
to what extent the OAuth 2
protocol has evolved since this was written but that definitely needs to be
checked.

Diederik


On Fri, Apr 27, 2012 at 1:52 PM, Chris Steipp cste...@wikimedia.org wrote:

 Petr,

 OAuth is something we're committing to on the roadmap for Summer/Fall of
 this year. So baring anything crazy occurring, oauth should be happening
 over the next few months. I'm planning to help drive the process from WMF's
 side, but it's something I'm hoping some people in the community will also
 take on and help with.

 I've heard the mobile, api, and labs all want oauth to help with their
 projects. But can we start collecting specific user stories from anyone who
 wants to use oauth?

 It looks like most of the wikitech conversations have made it to
 http://www.mediawiki.org/wiki/OAuth, but would someone be willing to make
 sure it's up to date? I'll try to also get to over the next few days.

 Thanks!

 Chris


 On Fri, Apr 27, 2012 at 4:40 AM, Petr Bena benap...@gmail.com wrote:

  Some updates on this? Is WMF or someone going to work on this or it's
  waiting for someone to start?
 
  On Fri, Mar 16, 2012 at 3:19 PM, Petr Bena benap...@gmail.com wrote:
   Sorry, few typos:
  
   So, right now a question is if it's supposed to be implemented as
   extension or in core, or both (in case extension can't be created now,
   update core so that it's possible).
  
   ^ that's what I was about to say
  
   On Fri, Mar 16, 2012 at 3:17 PM, Petr Bena benap...@gmail.com wrote:
   So, right now a question is if it's supposed to be implemented as
   extension or in core, or both (in case extension can't be created now,
   updated core do that it's possible).
  
   I would rather make is as extension since there is a little benefit
   for most of mediawiki users in having this feature. I think it's
   better to keep only necessary stuff inside core and keep extra stuff
   as extensions.
  
   Is there any objection against implementing it as extension? Thanks
  
   On Wed, Mar 14, 2012 at 12:49 AM, John Erling Blad jeb...@gmail.com
  wrote:
   Just as an idea, would it be possible for Wikimedia Foundation to
   establish some kind of joint project with the SimpleSAMLphp-folks?
   Those are basically Uninett, which is FEIDE, which is those that
   handle identity federation for lots of the Norwegian schools,
 colleges
   and universities.. The SimpleSAML solution is in use in several other
   projects/countries, not sure whats the current status. The platform
   for FEIDE is also in use in several other countries so if the log on
   problems in Norway are solved other countries will be able to use the
   same solution.
  
   Note also that OAuth 2.0 seems to be supported.
  
 
 https://rnd.feide.no/2012/03/08/releasing-a-oauth-2-0-javascript-library/
  
   In april this year there is a conference GoOpen 2012
   (http://www.goopen.no/) in Oslo and some folks from Wikimedia
   Foundation is there, perhaps some folks from Uninett too? Could it be
   possible for interested people to sit down and discuss wetter a joint
   project is possible? Uninett is hiring for SimpleSAML development and
   that could be interesting too!
  
   John
  
   On Wed, Mar 14, 2012 at 12:13 AM, Thomas Gries m...@tgries.de
 wrote:
  
  
   There's really two separate things that these systems can do.
  
   The classic OAuth scenario is like this:
  
   site A: Wikipedia
user A
   site B: Huggle
  
   Site B initiates a special login on site A using a shared secret; on
   success, site A passes back authentication tokens to site B which
  verify
   that user A allowed site B access.
  
   Site B then uses those tokens when it accesses site A, in place of a
   username/password directly.
  
  
   OpenID, SAML, etc seem to be more appropriate for this scenario:
  
   site A: Wikipedia
   site B: University
user B
  
   These systems allow user B to verify their identity to site A; one
   possibility is to use this to associate a user A' with the remote
  user B,
   letting you use the remote ID verification in place of a local
  password
   authentication. (This is what our current OpenID extension does,
  basically.)
  
  
   These are, IMO, totally separate use cases and I'm not sure they
  should be
   treated the same.
  
  
   The Extension:OpenID can be used for both cases ( given, that you
 set
   $wgOpenIDClientOnly = false; )
   https://www.mediawiki.org/wiki/Extension:OpenID .
  
   The extension makes a MediaWiki installation OpenID 2.0-aware and
  lets
   users log in using their OpenID identity - a special URL - instead
 of
   (or as an alternative to) standard username/password log in. In that
   way, the MediaWiki acts as Relying part (RP) = OpenID consumer.[1]
  
    As an option, it also allows the MediaWiki to act as OpenID
    provider, so 

Re: [Wikitech-l] Page views

2012-04-11 Thread Diederik van Liere
 My suggestion for how to filter these bots efficiently in c program (no
 costly nuanced regexps) before sending data to webstatscollector:

 a) Find 14th field in space delimited log line = user agent (but beware of
 false delimiters in logs from varnish, if still applicable)
 b) Search this field case insensitive for bot/crawler/spider/http (by
 convention only bots have url in agent string)

 That will filter out most bot pollution. We still want those records in
 sampled log though.

 Any thoughts?
I did some research on fast string matching and it seems that the
recently developed algorithm by Leonid Volnitsky
is very fast (http://volnitsky.com/project/str_search/index.html). I
will do some benchmarks vs the ordinary C strstr function but
the author claims it's 20x faster.

So instead of hard-coding where the bot information should be, just search
the entire log line for the bot information; if it is present, discard the
log line, and otherwise process it as-is.
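
A rough sketch of the user-agent check from the quoted proposal
(illustrative Python, not the C that would go into webstatscollector; the
14th-field position is an assumption that depends on the exact log format).
Note that the 'http' marker only makes sense within the agent field, since
every request line also contains a URL:

# By convention only bots put a URL in the agent string, hence 'http'.
BOT_MARKERS = ('bot', 'crawler', 'spider', 'http')

def is_bot_request(log_line, agent_index=13):
    parts = log_line.split(' ')
    if len(parts) <= agent_index:
        return False            # malformed or short line: keep it
    agent = parts[agent_index].lower()
    return any(marker in agent for marker in BOT_MARKERS)

def human_lines(lines):
    """Yield only the lines that do not look like bot traffic."""
    return (line for line in lines if not is_bot_request(line))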

Best,
Diederik

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] 2nd Analytics Day videos are available

2012-04-09 Thread Diederik van Liere
Hi all,

March 2nd, we had our 2nd WMF Analytic Day. We taped all the sessions and they 
are now available on Commons:

http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_Cassandra.ogv
http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_HBase.ogv
http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_Hive.ogv
http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_Peregrine.ogv
http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_Storm.ogv
http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_Hadoop.ogv



Big thanks to Chip for converting these gigantic files!

If you are curious to see what the Analytics Team is up to, then head over to 
our roadmap: http://www.mediawiki.org/wiki/Analytics/2012-2013_Roadmap

Best,

Diederik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Page views

2012-04-09 Thread Diederik van Liere
Hi Srikanth,

Yes, we are looking into the growth percentages as they seem
unrealistically high.
Best,
Diederik


On Mon, Apr 9, 2012 at 3:30 AM, Srikanth Lakshmanan srik@gmail.com wrote:


 On Mon, Apr 9, 2012 at 00:46, Erik Zachte ezac...@wikimedia.org wrote:

 returns 20 lines from this 1:1000 sampled squid log file
 after removing javascript/json/robots.txt there are 13 left,
 which fits perfectly with 10,000 to 13,000 per day

 however 9 of these are bots!!


 Is this the same case for mobile stats as well? I don't think there could be
 sudden 100% growth for 2 months now across wikis[1] without some reason like
 this.

 [1]  http://stats.wikimedia.org/EN_India/TablesPageViewsMonthlyMobile.htm

 --
 Regards
 Srikanth.L

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Languages supported by Jenkins (was Changes status in Gerrit)

2012-04-06 Thread Diederik van Liere
Thanks, I meant to ask which languages will initially be supported :)
D
On 2012-04-06, at 7:04 AM, Antoine Musso wrote:

 Le 05/04/12 20:20, Diederik van Liere a écrit :
 
 Which languages will Jenkins support?
 
 Jenkins is just a bot, we can make it do whatever we want. The plan is
 to have a universal linting job able to analyse any language or format
 in use, be it PHP, Python, JS, CSS ...
 
 https://www.mediawiki.org/wiki/Continuous_integration/Workflow_specification
 
 I am not sure when I am going to work on it, but for sure after we have
 bring back Testswarm alive.
 
 -- 
 Antoine hashar Musso
 
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Languages supported by Jenkins (was Changes status in Gerrit)

2012-04-05 Thread Diederik van Liere
Hi Chad,

On 2012-04-05, at 2:17 PM, Chad wrote:
 Once we've got jenkins working reliably, I plan to remove the
 verified permission so only the bots can set it.
 
 -Chad

Which languages will Jenkins support?

Best,
Diederik

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] WURFL licensing concerns and Git migration

2012-03-21 Thread Diederik van Liere
I am in touch with the core developers of the Apache Devicemap project
and we are exploring the possibility of collaborating. If something
comes out of this exploration then I will announce it here.
Best,
Diederik


On Wed, Mar 21, 2012 at 2:33 AM, Patrick Reilly prei...@wikimedia.org wrote:
 I can remove it.

 — Patrick

 On Mar 20, 2012, at 10:53 PM, Erik Moeller e...@wikimedia.org wrote:

 On Tue, Mar 20, 2012 at 10:37 PM, Q overlo...@gmail.com wrote:
 ScientiaMobile basically took an open data repository and closed it, the
 complete opposite of what the WMF is trying to do. I'd strongly suggest
 looking for real Open solutions like OpenDDR/DeviceMap

 And apparently they've been trying to take down legitimate copies, too:
 http://openddr.org/takedown.html

 Yikes, that's evil. To the extent we're relying on it today, we should
 move off it ASAP.

 --
 Erik Möller
 VP of Engineering and Product Development, Wikimedia Foundation

 Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] WURFL licensing concerns and Git migration

2012-03-20 Thread Diederik van Liere
Yes, ScientiaMobile has made some very important changes to the
license and it does mean (AFAIK) that you cannot store the wurfl.xml
in a repository.

This paragraph is particularly important:

You are not authorized to create a derivative work of or otherwise modify
this WURFL file, and you are further not authorized to use, copy, display,
or distribute, in each case, any derivative work of this WURFL file,
whether created by you or someone else.

I think it's best to wait with putting the file in Git, and I'll forward
this question to the legal team.

Best,
Diederik


On Wed, Mar 21, 2012 at 12:03 AM, Kevin Israel pleasest...@live.com wrote:
 Our MobileFrontend extension, which is currently deployed on Wikimedia
 sites, uses WURFL to detect the mobile devices it targets. However, I
 recently became aware the version of the WURFL data files we use has a
 rather restrictive license.

 http://tech.groups.yahoo.com/group/wmlprogramming/message/34311

 The license seems to suggest we are not even supposed to redistribute
 verbatim copies or install the data files on multiple servers rather
 than only making [...] one copy [...], if not merely fail to grant
 such permission. Currently, the files are in our Subversion repository
 and are going to end up in Git soon.

 I am not a lawyer, and I realize this is probably a matter for the
 Wikimedia Foundation to handle, albeit one of urgent importance to us.
 If I am not mistaken, proper removal of infringing material from Git
 repositories is somewhat painful in that it causes all child SHA-1
 hashes to change, so I feel resolution of the above licensing concern
 blocks Git migration of at least the MobileFrontend extension.

 --
 Wikipedia user PleaseStand
 http://en.wikipedia.org/wiki/User:PleaseStand

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Git, Gerrit and the coming migration

2012-03-08 Thread Diederik van Liere

On 2012-03-07, at 6:01 AM, Chad wrote:
 
 My main worry is that we are not spending enough time on getting all
 engineers (both internal and in the community) up to speed with the
 coming migration to Git and Gerrit and that we are going to blame the
 tools (Gerrit and/or Git) instead of the complex interaction between
 three changes. We are making three fundamental changes in one-shot:
 1) Migrating from a centralized source control system to a
 decentralized system (SVN - Git)
 2) Introducing a new dedicated code-review tool (Gerrit)
 3) Introducing a gated-trunk model
 
 
 These are big changes. They're drastic changes. They require a
 rethinking of a great many things that we do from both technical
 and non-technical perspectives. Unfortunately, I don't see how
 we could've done #1 without #2. CodeReview is not designed (and
 was never designed) to work with a DVCS. The workflow's just not
 there, and it would've basically required rewriting huge parts of it.
 Rather than reinvent the wheel (again), we went with Gerrit.
 
 Arguably, we could've gone a straight push and skipped item #3. But
 given the continual code review backlog, and the desire to keep trunk
 stable (and hopefully deploy much more often), the decision to gate
 trunk was made pretty early on in the discussions.

I understand that we want to do all 3 of those changes; my point was merely to
make it very explicit what we are changing, and that the biggest change,
IMHO, is the introduction of 3). It seems that most of the discussion is
focusing on the tools (that's also how this thread started), while I think the
discussion should focus on mastering the new workflow and on what we can do to
make sure that we have the right tutorials and training available to make this
migration as gentle as possible. I am confident that we will master the new
tools, but a new workflow requires new habits, and that might take more time to
develop.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Git, Gerrit and the coming migration

2012-03-06 Thread Diederik van Liere
Hi all,

Some disclaimers before I start my thread:

1) I am a big believer in Git and dvcs and I think this is the right decision
2) I am a big believer in Gerrit and code-review and I think this is
the right decision
3) I might be wholly unaware of / inaccurate about certain things; apologies
in advance.
4) A BIIGG thank you to all the folks involved in preparing this
migration (evaluation, migration and training): in particular Chad,
Sumanah and Roan (but I am sure more people are involved and I am just
blissfully unaware).

My main worry is that we are not spending enough time on getting all
engineers (both internal and in the community) up to speed with the
coming migration to Git and Gerrit and that we are going to blame the
tools (Gerrit and/or Git) instead of the complex interaction between
three changes. We are making three fundamental changes in one-shot:
1) Migrating from a centralized source control system to a
decentralized system (SVN - Git)
2) Introducing a new dedicated code-review tool (Gerrit)
3) Introducing a gated-trunk model

My concern is not about the UI of Gerrit; I know it's popular within
WMF to say that its UI sucks, but I don't think that's the case, and
even if it were an issue it's only minor. People have already suggested
that we might consider other code-review systems; I did a quick Google
search and we are the only community considering migrating from Gerrit
to Phabricator. I think this is beside the point: the real challenge
is moving to a gated-trunk model, regardless of the chosen code-review
tool. I cannot imagine that other code-review tools that are also based on
a gated-trunk model and work with Git are much easier than Gerrit. The
complexity comes from the gated-trunk model, not from the tool.

The gated-trunk model means that, when you clone or pull from master,
it might be the case that files relevant to you have been changed but
that those new changes are waiting to be merged (the pull request
backlog, AKA the code-review backlog). In the always-commit world with
no gatekeeping between developers and master, this never happens; your
local copy can always be fully synchronized with trunk (master).
Even if a commit is reverted, then your local working copy will still
have it, and any changes that you might have based on this reverted
commit, you can still commit. Obviously people get annoyed when you
keep checking in reverted code, but it won't break anything.

In an ideal world, our code-review backlog would be zero commits at any
time of the day; if that's the case, then 'master' is always up-to-date and
you have the same situation as with the 'always-commit' model. However, we
know that the code-review backlog is a fact, and
it's the intersection of Git, Gerrit and the backlog that is going to
be painful.

Suppose I clone master, but there are 10 commits waiting to be reviewed
with files that are relevant to me. I am happily coding in my own local
branch and after a while I am ready to commit. Meanwhile, those 10 commits
have been reviewed and merged, and now when I want to merge my branch back
to master I get merge conflicts. I discover these merge conflicts either
when my branch is merged back to master or when I pull mid-way to update my
local branch.

To be a productive engineer after the migration it will *not* be
sufficient if you have only mastered git clone, git pull, git push,
git add and git commit commands. These are the basic git commands.

Two overall recommendations:

1) The Git / Gerrit combination means that you will have to understand
git rebase, git commit --amend, git bisect and git cherry-pick. This is
advanced Git usage and it will make the learning curve steeper. I think we
need to spend more time on training. I have been looking for good tutorials
about Git & Gerrit in practice and I haven't been able to find any, but
maybe other people have better Google-fu skills (I think we are looking for
advanced tutorials, not just cloning and pulling, but also merging, bisect
and cherry-pick).

2) We need to come up with a smarter way of determining how to approach
the code-review backlog. Three overall strategies come to mind:
a) random: just pick a commit
b) time-based picking (either the oldest or the youngest commit)
c) 'impact' of the commit

a) and b) do not require anything but are less suited for a gated-trunk
model. Option c) could be something where we construct a graph of the
codebase, determine the most central files (hubs), and sort commits by
centrality in this graph. The graph only needs to be reconstructed after
major refactoring or every month or so. Obviously, this requires a bit of
coding and I don't have formal proof that this will actually reduce the
pain, but I am hopeful. If constructing a graph is too cumbersome, then we
can sort by the number of affected files in a commit as a proxy. If we
cannot come up with a c) strategy, then the only real option is to make
sure that the queue is as short as possible.
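
As a rough illustration of option c) (purely a sketch: the file names, data
and scoring below are made up, and degree in a co-change graph is used as a
simple stand-in for centrality):

from collections import defaultdict
from itertools import combinations

def file_centrality(past_commits):
    """Degree proxy: with how many distinct files each file was co-changed."""
    neighbours = defaultdict(set)
    for files in past_commits:
        for a, b in combinations(set(files), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {f: len(n) for f, n in neighbours.items()}

def rank_backlog(pending_commits, centrality):
    """Order pending commits so the ones touching central files come first."""
    def score(files):
        return sum(centrality.get(f, 0) for f in files) or len(files)
    return sorted(pending_commits, key=score, reverse=True)

past = [['Parser.php', 'Title.php'], ['Parser.php', 'Linker.php'], ['Skin.php']]
pending = [['Skin.php'], ['Parser.php', 'OutputPage.php']]
print(rank_backlog(pending, file_centrality(past)))  # the Parser.php commit ranks first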


Best,
Diederik


Re: [Wikitech-l] Proposed removal of some API output formats

2012-02-09 Thread Diederik van Liere
Hi,

Andre Engels did some analysis of the types of API formats used. The
data is from a single random Sunday in late 2011:

1997267 application/json
314285 text/xml
171259 -
68358 application/vnd.php.serialized
55549 text/html
34680 text/javascript
 8907 application/x-www-form-urlencoded
 8882 application/xml
  807 application/rsd+xml
  467 text/text
  105 application/x-www-form-urlencoded;
   18 application/yaml
1 multipart/form-data;



yaml is used for the query and parse API actions. On this particular
day, the following services used yaml:

http://www.huddba.cz
corporama.com
reftag.appspot.com


Thank you Andre!


Best,
Diederik



On Wed, Feb 8, 2012 at 7:45 PM, Roan Kattouw roan.katt...@gmail.com wrote:

 On Wed, Feb 8, 2012 at 11:42 PM, Tim Starling tstarl...@wikimedia.org wrote:
  What are the other problems?
 
 I'm not sure what Max is referring to, other than the fact that I hate
 XML (or at least using XML for this API) and generally don't like the
 fact that we have to support so many formats. As I said on Bugzilla
 earlier today, if I ever were to rewrite the API from scratch it'd be
 JSON-only. However, we can't actually get rid of XML realistically.

  * YAML - we don't serve real YAML anyway, currently it's just a subset
    of JSON.
 
  YAML is just a few harmless lines of code, why would you want to
  remove it?
 
 Yeah that can probably stay, it's not worth breaking anything over.

 Roan

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Welcome, Andrew Otto - Software Developer for Analytics

2012-01-06 Thread Diederik van Liere
Welcome Andrew!
Super excited to have you joining us!

Diederik

Sent from my iPhone

On 2012-01-06, at 13:13, Sumana Harihareswara suma...@wikimedia.org wrote:

 On 01/06/2012 01:08 PM, Rob Lanphier wrote:
 We're really excited to have Andrew on board to help bring some
 systems rigor to our data gathering process.  Our current data mining
 regime involves a few pieces of lightweight data gathering
 infrastructure (e.g. udp2log), a combination of one-off special
 purpose log crunching scripts, along with other scripts that started
 their lives as one-off special purpose scripts, but have gradually
 become core infrastructure. Most of these scripts have single
 maintainers, and there is a lot of duplication of effort.  In
 addition, the systems have a nasty tendency to break at the least
 opportune times.  Andrew's background bringing sanity to insane
 environments will be enormously helpful here.
 
 (See episode S10E07, The Shadow Scripts,
 https://blog.wikimedia.org/2011/10/31/data-analytics-at-wikimedia-foundation/
 )*
 
 Andrew is based out of Virginia, but is still traveling the world.
 Right now, you'll find him in New York City.  Please join me in
 welcoming Andrew to the team!
 
 I congratulated him IN PERSON five minutes ago, because we're coworking
 today.  There's another New Yorker now, yay!
 
 -- 
 Sumana Harihareswara
 Volunteer Development Coordinator
 Wikimedia Foundation
 
 * I am being silly and acting as though this blog entry were an episode
 of a science fiction TV show.
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Git migration progress for MW core

2011-12-13 Thread Diederik van Liere
Hi Chad,

Reposurgeon (http://catb.org/~esr/reposurgeon/) might be a useful tool to
help fix the SVN history.

Best,

Diederik

On Tue, Dec 13, 2011 at 11:47 AM, Chad innocentkil...@gmail.com wrote:

 On Tue, Dec 13, 2011 at 11:44 AM, Chad innocentkil...@gmail.com wrote:
  Couple of caveats (things I'm gonna try and fix):
  * Permissions aren't sorted yet, so it's only supporting anonymous
 clones,
  no pushing yet.
  * The revision graph is crazy. svn:mergeinfo is unreliable and we're
 pretty
  much unable to build a cohesive history without a *lot* of manual labor.
 Right
  now I'm thinking of just dropping the mergeinfo so the branches look
 like linear
  graphs cherry picking from master. Not perfect, but less annoying than
 now.
 

 Also there's two stupid commits at the head of master due to my
 mistake when initially pushing the repo. That won't happen again
 on subsequent tests or the real conversion.

 -Chad

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Mediawiki 2.0

2011-12-08 Thread Diederik van Liere
I think that the current version numbering system is confusing: incremental
version increases from 1.15 to 1.16 to 1.17 to 1.18, etc. suggest to most
people minor changes with no compatibility implications. This is not the
case with MW. The Chrome version numbering is the other extreme, releasing
a major version increment every 6 weeks. In the end I think that a versioning
scheme should give an idea of how much has changed under the hood. Just my 2
cents.
Diederik


On Thu, Dec 8, 2011 at 4:19 AM, Tim Starling tstarl...@wikimedia.orgwrote:

 On 08/12/11 05:45, Dan Nessett wrote:
  On Wed, 07 Dec 2011 12:54:22 +1100, Tim Starling wrote:
 
  On 07/12/11 12:34, Dan Nessett wrote:
  On Wed, 07 Dec 2011 12:15:41 +1100, Tim Starling wrote:
  How many servers do you have?
 
  3. It would help to get it down to 2.
 
  I assume my comments apply to many other small wikis that use MW as
  well. Most operate on a shoe string budget.
 
  You should try running MediaWiki on HipHop. See
 
  http://www.mediawiki.org/wiki/HipHop
 
  It's not possible to pay developers to rewrite MediaWiki for less than
  what it would cost to buy a server. But maybe getting a particular MW
  installation to run on HipHop with a reduced feature set would be in the
  same order of magnitude of cost.
 
  -- Tim Starling
 
  Are there any production wikis running MW over HipHop?

 No. There are very few test installations, let alone production
 installations. But isn't it exciting to break new ground?

 -- Tim Starling


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Call Graphs in MediaWiki Documentation

2011-12-08 Thread Diederik van Liere
-1.
Personally, I like them because they give me a quick overview of the
inter-dependencies and of how methods relate to each other, so I guess
that for other 'newbies' this helps in getting through the learning curve
faster.
Diederik



On Thu, Dec 8, 2011 at 12:51 PM, Yuvi Panda yuvipa...@gmail.com wrote:

 Why do we have callgraph images in the documentation? I can't
 understand how they are useful, and they eat bandwidth (+ screenspace)
 unnecessarily. Is there a reason for their existence? Can we get rid
 of them?

 Check this for an example: http://svn.wikimedia.org/doc/classLinker.html

 --
 Yuvi Panda T
 http://yuvi.in/blog

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Dropping the 'LATER' resolution in Bugzilla

2011-11-29 Thread Diederik van Liere
But then the bug should be NEW; nobody is checking for a bug that is marked
LATER. I mentioned WORKSFORME because I suspect that some of the LATER bugs
have been resolved by now.
Diederik


On Tue, Nov 29, 2011 at 1:59 PM, Chad innocentkil...@gmail.com wrote:

 On Tue, Nov 29, 2011 at 1:45 PM, Diederik van Liere dvanli...@gmail.com
 wrote:
  Hi folks,
 
  Currently, we have a 'LATER' resolution in Bugzilla, it contains 339 bug
  reports over all the products, see:
 
 
 https://bugzilla.wikimedia.org/buglist.cgi?query_format=advancedlist_id=57731resolution=LATERproduct=CiviCRMproduct=Cortadoproduct=dbzip2product=Kate%27s%20Toolsproduct=Logwoodproduct=MediaWikiproduct=MediaWiki%20extensionsproduct=mwdumperproduct=mwEmbedproduct=Wikimediaproduct=Wikimedia%20Mobileproduct=Wikimedia%20Toolsproduct=Wiktionary%20toolsproduct=XML%20Snapshots
 
  The question is, when is LATER? Technically, these bugs are not open and
 so
  nobody will ever see them again and that's how they will be forgotten.
 
  To me, it seems that bugs that are labeled LATER should either be
 labeled:
  1) WONTFIX, which I guess is the majority of these bugs
  2) WORKSFORME, I am sure some things have been fixed
  3) NEW, it is a real bug / feature request.
 

 LATER means we can't or won't do it (right now) but that is
 likely to change in the future. WONTFIX implies no and this
 is not likely to change

 WORKSFORME is unrelated.

 -Chad

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Dropping the 'LATER' resolution in Bugzilla

2011-11-29 Thread Diederik van Liere
I agree: currently LATER acts as a black hole and there is no structured
process to re-evaluate these kinds of bugs.

I have done a lot of reading of these bugs and many were filed 3 to 5 years
ago. I think it's better to say WONTFIX than to suggest that this is
something that is going to be fixed.
It is about expectation management :)

On Tue, Nov 29, 2011 at 2:53 PM, Merlijn van Deen valhall...@arctus.nlwrote:

 On 29 November 2011 19:45, Diederik van Liere dvanli...@gmail.com wrote:

  The question is, when is LATER? Technically, these bugs are not open and
 so
  nobody will ever see them again and that's how they will be forgotten.
 

 I would interpret 'LATER' as 'this bug should be re-evaluated after a
 certain period of time'.

 Following this train of thought, a LATER bug should have a re-evaluation
 date planned, after which it is changed back to NEW. This probably is not
 possible, but I think it makes sense to change LATER bugs to NEW after,
 say, a year or so.

 Merlijn
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Dropping the 'LATER' resolution in Bugzilla

2011-11-29 Thread Diederik van Liere
So today I have read about 100 LATER-marked bug reports and I do think we
need the LATER resolution, but I would suggest limiting its use to only
those bugs where an external constituent, either the Wikipedia community or a
third-party software developer, needs to take an action and *then* we need to
actually follow up on that.  So this would, IMHO, exclude the following types of
bug reports:

1) We do not currently have enough resources (this is not a good reason to
label it LATER)
2) A bug that is dependent on another bug (this is not a good reason to
label it LATER)
3) Bug reports that only depend on upstream but do not require any action
after the upstream fix should not be labeled LATER


I am not sure how to handle bug reports that require a major architectural
overhaul; I am not a big fan of LATER there, but not quite sure if there is a
better alternative.


Best,


Diederik


On 2011-11-29, at 8:35 PM, Jay Ashworth wrote:

 - Original Message -
 From: Mark A. Hershberger mhershber...@wikimedia.org
 
 Jay Ashworth j...@baylink.com writes:
 Do we have a Target release in our BZ?
 
 We've begun using Milestones in Bugzilla for this. One of the
 milestones is Mysterious Future. I think you should feel free to use
 that instead of LATER.
 
 I love this, and am promptly stealing it for my own.
 -- j
 -- 
 Jay R. Ashworth  Baylink   
 j...@baylink.com
 Designer The Things I Think   RFC 2100
 Ashworth  Associates http://baylink.pitas.com 2000 Land Rover DII
 St Petersburg FL USA  http://photo.imageinc.us +1 727 647 1274
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Northern Soto Wikipedia

2011-11-05 Thread Diederik van Liere
It works on Safari but it definitely gives a backtrace error on Firefox 7.
Diederik

Sent from my iPhone

On 2011-11-05, at 9:56, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote:

 2011/11/5 Andre Engels andreeng...@gmail.com:
 There seems to be a Northern Soto Wikipedia at http://nso.wikipedia.org, at
 least that's what http://incubator.wikimedia.org/wiki/Wp/nso claims.
 However, when I go to that site I see the following text:
 
 Unstub loop detected on call of $wgLang-getCode from MessageCache::get
 
 Backtrace:
 ...
 
 It works for me. Can you try again?
 
 --
 Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
 http://aharoni.wordpress.com
 “We're living in pieces,
 I want to live in peace.” – T. Moore‬
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Northern Soto Wikipedia

2011-11-05 Thread Diederik van Liere
I am running Firefox 7.01 on OSX Leopard 10.6.8 and it gives a backtrace
error.
Diederik

On Sat, Nov 5, 2011 at 10:15 AM, Ole Palnatoke Andersen palnat...@gmail.com
 wrote:

 On Sat, Nov 5, 2011 at 2:56 PM, Amir E. Aharoni
 amir.ahar...@mail.huji.ac.il wrote:
  2011/11/5 Andre Engels andreeng...@gmail.com:
  There seems to be a Northern Soto Wikipedia at http://nso.wikipedia.org,
 at
  least that's what http://incubator.wikimedia.org/wiki/Wp/nso claims.
  However, when I go to that site I see the following text:
 
  Unstub loop detected on call of $wgLang-getCode from MessageCache::get
 
  Backtrace:
  ...
 
  It works for me. Can you try again?
 

 Windows Vista:

 Chrome 15.0.874.106: Same experience as Andre.
 Firefox 6.0.1, Opera 11.50, Safari 5.0.5, IE8: Same as Amir.

 - Ole


 --
 http://palnatoke.org * @palnatoke * +4522934588

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] lost in sites for coding challenge mobile

2011-10-24 Thread Diederik van Liere
I think that is true, unless you want to upload a fair-use image; that is not
allowed on Commons but is on some Wikipedia sites, such as the English one.
Diederik
Sent from my iPhone

On 2011-10-24, at 8:23, Greg DeKoenigsberg greg.dekoenigsb...@gmail.com wrote:

 This is a good question.  Simone sent it to me privately, and it
 occurred to me that the answer was sufficiently non-obvious that I
 asked him to report to wikitech-l.
 
 In looking at this page:
 
 http://en.wikipedia.org/wiki/Wikipedia:Files_for_upload
 
 ...it seems as though uploading to Commons is the preferred option.
 Is that right?
 
 --g
 
 On Mon, Oct 24, 2011 at 7:14 AM, Simone simonelocc...@gmail.com wrote:
 i am lost in sites, when i upload a picture where it must go?
 in *.wikipedia.org or in commons.wikimedia.org?
 i didn't understand...
 another thing is that to take the token the user must be logged,
 and i have seen that every country has got different logins,
 for examples:
 my login of it.wikipedia.org is different than en.wikipedia.org...
 so the user must choose first the domain where want to log
 and after take the token to upload the contents...
 
 Thank's for attenction,
 Simo
 
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] page view stats redux

2011-09-16 Thread Diederik van Liere
This is really cool! Thanks Ariel and team for making this available.
best,
Diederik

On Thu, Sep 15, 2011 at 5:16 PM, MZMcBride z...@mzmcbride.com wrote:
 Ariel T. Glenn wrote:
 I think we finally have a complete copy from December 2007 through
 August 2011 of the pageview stats scrounged from various sources, now
 available on our dumps server.

 See http://dumps.wikimedia.org/other/pagecounts-raw/

 This is a great step in the right direction! Thanks!

 MZMcBride



 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-04 Thread Diederik van Liere
Thanks for moving the page.
Diederik

On 2011-09-04, at 3:29 PM, Krinkle wrote:

 2011/9/4 MZMcBride z...@mzmcbride.com
 
 Diederik van Liere wrote:
 I've suggested to generate bulk checksums as well but both Brion and
 Ariel see
 the primary purpose of this field to check the validity of the dump
 generating
 process and so they want to generate the checksums straight from the
 external
 storage.
 
 [...]
 
 PS: not sure if this proposal should be on strategy or mediawiki...
 
 I think standard practice nowadays is a subpage of
 http://www.mediawiki.org/wiki/Requests_for_comment.
 
 MZMcBride
 
 
 Indeed. Moved:
 http://mediawiki.org/wiki/Requests_for_comment/Database_field_for_checksum_of_page_text
 
 
 – Krinkle
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-03 Thread Diederik van Liere
Hi,

I've suggested generating bulk checksums as well, but both Brion and Ariel see
the primary purpose of this field as checking the validity of the dump-generating
process, and so they want to generate the checksums straight from the external
storage.

In a general sense, there are two use cases for this new field:
1) Checking the validity of the XML dump files
2) Identifying reverts

I have started to work on a proposal for deployment; while it is still
incomplete, it might be a good start for further planning of the deployment. I
have been trying to come up with some back-of-the-envelope calculations of how
much time and space it would take, but I don't have all the required information
yet to come up with reasonable estimates.

You can find the proposal here: 
http://strategy.wikimedia.org/wiki/Proposal:Implement_and_deploy_checksum_revision_table

I want to thank Brion and Asher for giving feedback on prior drafts. Please 
feel free to improve this proposal.

Best,
Diederik

PS: not sure if this proposal should be on strategy or mediawiki...


On 2011-09-03, at 7:16 AM, Daniel Friesen wrote:

 On 11-09-02 09:33 PM, Rob Lanphier wrote:
 On Fri, Sep 2, 2011 at 5:47 PM, Daniel Friesen
 li...@nadir-seen-fire.com wrote:
 On 11-09-02 05:20 PM, Asher Feldman wrote:
 When using for analysis, will we wish the new columns had partial indexes
 (first 6 characters?)
 Bug 2939 is one relevant bug to this, it could probably use an index.
 [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=2939
 My understanding is that having a normal index on a table the size of
 our revision table will be far too expensive for db writes.
 ...
 Rob
 We've got 5 normal indexes on revision:
 - A unique int+int
 - A binary(14)
 - An int+binary(14)
 - Another int+binary(14)
 - And a varchar(255)+binary(14)
 
 That bug wise a (rev_page,rev_sha1) or (rev_page,rev_timestamp,rev_sha1)
 may do.
 
 -- 
 ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
 
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-08-18 Thread Diederik van Liere
Hi!
I am starting this thread because Brion's revision r94541 [1] reverted
r94289 [0], stating "core schema change with no discussion".
Bugs 21860 [2] and 25312 [3] advocate for the inclusion of a hash
column (either MD5 or SHA1) in the revision table. The primary use
case of this column will be to assist in detecting reverts. I don't think
that data integrity is the primary reason for adding this column. The
huge advantage of having such a column is that it will no longer be
necessary to analyze full dumps to detect reverts; instead you can
look for reverts in the stub dump file by looking for the same hash
occurring more than once within a single page. The fact that there is a
theoretical chance of a collision is not very important IMHO; it would
just mean that in very rare cases in our research we would flag an edit
as reverted while it's not. The two bug reports contain quite long
discussions and this feature has also been discussed internally quite
extensively, but oddly enough that discussion hasn't happened yet on the
mailing list.
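
To illustrate the kind of analysis this enables, here is a minimal
sketch of identity-revert detection over (page_id, rev_id, hash) tuples
as they could be read from a stub dump once the column is exposed; the
tuple layout is illustrative, not the actual dump schema.

from collections import defaultdict

def find_reverts(revisions):
    """revisions: iterable of (page_id, rev_id, sha1) in per-page timestamp order.
    A revision whose hash matches an earlier hash on the same page restores
    an earlier state, i.e. the intervening edits were reverted."""
    seen = defaultdict(dict)   # page_id -> {sha1: rev_id of first occurrence}
    reverts = []
    for page_id, rev_id, sha1 in revisions:
        page_hashes = seen[page_id]
        if sha1 in page_hashes:
            reverts.append((page_id, rev_id, page_hashes[sha1]))  # (page, reverting rev, restored rev)
        else:
            page_hashes[sha1] = rev_id
    return reverts

# Toy example: revision 3 restores the text of revision 1, so revision 2 was reverted.
sample = [(10, 1, "aaa"), (10, 2, "bbb"), (10, 3, "aaa"), (11, 4, "ccc")]
print(find_reverts(sample))   # [(10, 3, 1)]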

So let's have a discussion!

[0] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289
[1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94541
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=21860
[3] https://bugzilla.wikimedia.org/show_bug.cgi?id=25312

Best,

Diederik

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Changing XML Wikipedia Schema to Enable Smaller Incremental Dumps that are Hadoop ready

2011-08-18 Thread Diederik van Liere
Hi!

Over the last year, I have been using the Wikipedia XML dumps
extensively. I used them to conduct the Editor Trends Study [0], and the
Summer Research Fellows [1] and I have used them over the last three
months during the Summer of Research. I am proposing some changes to
the current XML schema based on those experiences.

The current XML schema presents a number of challenges, both for the
people who create the dump files and for the people who consume them.
Challenges include:

1) The nested structure of the schema (a single page tag with
multiple revision tags) makes it very hard to develop an incremental
dump utility.
2) A lot of post-processing is required.
3) By storing the entire text of each revision, the dump files are
getting so large that they become unmanageable for most people.


1. Denormalization of the schema
Instead of having a page tag with multiple revision tags, I
propose to just have revision tags. Each revision tag would
include a page_id, page_title, page_namespace and
page_redirect tag. This denormalization would make it much easier to
build an incremental dump utility. You only need to keep track of the
final revision of each article at the moment of dump creation, and then
you can create a new incremental dump continuing from the last dump.
It would also be easier to restore a dump process that crashed.  Finally,
tools like Hadoop would have a much easier time handling this XML
schema than the current one.
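
As a rough sketch of why this helps (the record layout and file names
are illustrative, not a schema proposal in themselves): with one
self-contained record per revision, an incremental dumper only needs to
remember, per page, the last revision it has already written, and the
next run simply appends anything newer.

import json

def incremental_dump(revisions, state_file="dump_state.json"):
    """revisions: iterable of flat, self-contained records such as
    {'rev_id': 7, 'page_id': 10, 'page_title': 'Foo', 'page_namespace': 0, 'text': '...'}.
    Because every record carries its own page fields, the dumper only has to
    remember, per page, the last revision already written; the next run
    appends anything newer, and a crashed run can simply be resumed."""
    try:
        with open(state_file) as f:
            last_seen = {int(k): v for k, v in json.load(f).items()}
    except FileNotFoundError:
        last_seen = {}   # page_id -> highest rev_id dumped so far

    for rev in revisions:
        if rev["rev_id"] > last_seen.get(rev["page_id"], 0):
            print(json.dumps(rev))                  # stand-in for writing one flat revision element
            last_seen[rev["page_id"]] = rev["rev_id"]

    with open(state_file, "w") as f:
        json.dump(last_seen, f)

# Toy run; a real run would stream flat revision records parsed from the dump.
sample = [{"rev_id": 1, "page_id": 10, "page_title": "Foo", "page_namespace": 0, "text": "a"},
          {"rev_id": 2, "page_id": 10, "page_title": "Foo", "page_namespace": 0, "text": "b"}]
incremental_dump(sample)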


2. Post-processing of data
Currently, a significant amount of time is required for
post-processing the data. Some examples include:
* The title includes the namespace, so excluding pages from a
particular namespace requires generating a separate namespace
variable. In particular, focusing on the main namespace is tricky
because that can only be done by checking whether a page does not
belong to any other namespace (see bug
https://bugzilla.wikimedia.org/show_bug.cgi?id=27775).
* The redirect tag is currently either True or False; more useful
would be the article_id of the page to which a page redirects.
* Revisions within a page are sorted by revision_id, but they should
be sorted by timestamp. The current ordering makes it even harder to
generate diffs between two revisions (see bug
https://bugzilla.wikimedia.org/show_bug.cgi?id=27112).
* Some useful variables in the MySQL database are not yet exposed in
the XML files. Examples include:
- Length of revision (part of MediaWiki 1.17)
- Namespace of article


3. Smaller dump sizes
The dump files continue to grow as the full text of each revision is stored
in the XML file. Currently, the uncompressed XML dump files of the
English Wikipedia are about 5.5 TB in size, and this will only continue
to grow. An alternative would be to replace the text tag with
text_added and text_removed tags. A page can still be
reconstructed by applying multiple text_added and text_removed
tags in sequence. We can provide a simple script / tool that would
reconstruct the full text of an article up to a particular date /
revision id. This has two advantages:
1) The dump files will be significantly smaller.
2) It will be easier and faster to analyze the types of edits: who is
adding a template, who is wikifying an edit, who is fixing spelling
and grammar mistakes.

4. Downsides
This suggestion is obviously not backwards compatible and it might
break some tools out there. I think that the upsides (incremental
backups, Hadoop-ready and smaller sizes) outweigh the downside of
being backwards incompatible. The current way of dump generation
cannot continue forever.

[0] http://strategy.wikimedia.org/wiki/Editor_Trends_Study,
http://strategy.wikimedia.org/wiki/March_2011_Update
[1] http://blog.wikimedia.org/2011/06/01/summerofresearchannouncement/

I would love to hear your thoughts and comments!

Best,
Diederik

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How can I get data to map our linguistic interconnectedness?

2011-06-16 Thread Diederik van Liere
Dear Alec,

Maybe the Community Department can help you out with your question. We
are doing a number of research sprints this summer to map out
different aspects of the Wikipedia communities; this sounds like a
great question, and we have some researchers available to help write
the queries.
So please contact me and I'll hook you up with the right people.
Best,
Diederik


On Thu, Jun 16, 2011 at 4:40 AM, Platonides platoni...@gmail.com wrote:
 Alec Conroy wrote:
 I think I can build you something if you give me appropiate values for
 the above definition.

 Cheers

 Excellent-- so striking while the iron is hot-- I see that
 [[Special:Statistics]] defines active as edited within the last 30
 days.    I'm open to whoever many users we can realistically get info
 on-- the more the merrier, at least until I run out of ram. :)

 My initial query my go something like
 Select users where lasttouched was within the last month and total
 edit counts are greater than 500.

 And then, adding in the requirement of second project will narrow that pool.
 And then adding the constraint of a second project with a second
 language will narrow the pool even more.

 We're looking for the orphan community who have a lot of editors but
 little connection to English and Meta.

 I have added a small script at
 http://www.toolserver.org/~platonides/activeusers/activeusers.php to
 show active users per project and language.
 Requisites for appearing there are more than 500 edits (total) and at
 least one action (usually an edit) in the last month (since May 16, data
 is cached).
 Bots appear in the list.
 I'm still populating the data, but it should be completed by the time
 you read this.


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Open call for Parser bugs

2011-04-06 Thread Diederik van Liere
I love this idea!
Diederik

On Wed, Apr 6, 2011 at 5:21 PM, Mark A. Hershberger
mhershber...@wikimedia.org wrote:

 Starting with this coming Monday's bug triage, I want to try and make
 sure the community's voice is heard.  In order to do that, I've created
 the “triage” keyword in Bugzilla.  Every week, I'll announce a theme and
 use this keyword to keep track of the bugs that will be handled in the
 meeting.

 As we discuss the bug, it will be modified, probably assigned to a
 developer, and the “triage” keyword removed.  Some people may see this
 as bug-spam, but I'd like to keep the email notifications on so that
 people who have expressed an interest will know that we're giving the
 bugs some love.

 This week, I'm going to focus on Parser-related bugs.  There are
 currently 10 bugs on the with the “triage” keyword applied.  A bug
 triage meeting needs about 30 bugs, so I have room for about 20 more
 right now.  I'll be adding to the list before Monday, but this is your
 chance to get WMF's developers talking about YOUR favorite parser bug by
 adding the “triage” keyword.

 I will reserve the right to remove the “triage” keyword — especially if
 the list becomes unwieldy, or if the bug has nothing to do with parsing
 — but I wanted to start to open up the triage process a bit more and
 begin to provide a way for the community to participate in these
 meetings.

 Mark.

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
a href=http://about.me/diederik;Check out my about.me profile!/a

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Converting to Git?

2011-03-23 Thread Diederik van Liere
The Python community recently switched to a DVCS and they have
documented their choice.
The PEP compares Git, Mercurial and Bzr and shows the pluses and minuses of
each. In the end, they went for Mercurial.

Choosing a distributed VCS for the Python project:
http://www.python.org/dev/peps/pep-0374/

best,
Diederik

On Tue, Mar 22, 2011 at 3:47 PM, Krinkle krinklem...@gmail.com wrote:
 On March 22 2011, at 20:29 Mark Wonsil wrote:

 I haven't used git yet but after reading the excellent article that
 Rob Lanphier posted (http://hginit.com/00.html), I think I will. That
 article also explains why there wouldn't have to be as many updates to
 SVN as is done today.

 I don't think there's any doubt that git would work for Wikimedia but
 there would definitely be some workflow changes. That's probably the
 larger issue.

 Mark W.

 Another good read is http://whygitisbetterthanx.com/

 --
 Krinkle

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Topic and cathegory analyser

2011-03-03 Thread Diederik van Liere
Please elaborate.
Diederik

Sent from my iPhone

On 2011-03-03, at 16:12, Dávid Tóth 90010...@gmail.com wrote:

 Would it be useful to make a program that would create topic relations for
 each wikipedia article based on the links and the distribution of semantic
 structures?
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] How users without programming skills can help

2011-02-14 Thread Diederik van Liere
I am not following this line of reasoning: how can adding guidance /
instructions on how to write a good bug report turn people away?
In a previous life, I studied the factors that shorten the time
required to fix a bug. Bug reports that contain steps to reproduce are a
significant predictor of a shorter
time to fix.  You can find the paper here:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1507233

A systematic lack of replies is also an issue, but this solution was not
aimed at fixing that problem.

On Mon, Feb 14, 2011 at 4:39 AM, Bryan Tong Minh
bryan.tongm...@gmail.comwrote:

 On Mon, Feb 14, 2011 at 2:46 AM, Diederik van Liere dvanli...@gmail.com
 wrote:
  So maybe we can paste these 5 steps (or something similar) in the initial
  form used to file a bugreport.
 
  This would increase the quality of bugreports and make it easier for bug
  triaging.
 
 Increase the quality perhaps, but also increase the the barrier of
 reporting bugs, and that is something that is not very good imho.
 I don't think we have a systematic problem with bad bug reports. The
 systematic problem is the lack of replies from developers.


 Bryan

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Migrating to GIT (extensions)

2011-02-14 Thread Diederik van Liere
If I am not mistaken, Mercurial has better support for highly
modularized open source
software projects. You can use a Mercurial subrepository (which is
very similar to an svn external and a git submodule). According to the
manual:
Subrepositories is a feature that allows you to treat a collection of
repositories as a group. This will allow you to clone, commit to,
push, and pull projects and their associated libraries as a group.
see: http://mercurial.selenic.com/wiki/Subrepository
http://mercurial.selenic.com/wiki/NestedRepositories





On Mon, Feb 14, 2011 at 2:18 AM, Siebrand Mazeland s.mazel...@xs4all.nl wrote:

 Op 14-02-11 05:01 schreef Daniel Friesen li...@nadir-seen-fire.com:

 Ohh... if the translatewiki guys are looking for a dummy for
 streamlining support for extensions based in git in preparation for a
 git migration if we do so, I'd be happy to offer monaco-port up as a
 existing extension (well, skin) using git that could be used as a test
 for streamlining git support. ;) having monaco-port get proper i18n
 while it's still not up to a level I believe I want to commit it into
 svn yet wouldn't be a bad thing.

 With regards to i18n support it is not clear to me how translatewiki staff
 would deal with 100+1 commits to different repo's every day if core and
 extensions would each be in individual repos. Can you please explain how
 Raymond would be working with Windows and Git in the proposed structure
 updating L10n for 100 extensions and MediaWiki core? How would
 translatewiki.net easily manage MediaWiki updates (diff review/commits)?

 I'm not particularly looking forward to having to jump through a huge
 series of hoops just to keep checkouts for single extensions small. If
 that is the real issue, extension distribution should get another look as
 this might indicate that ExtensionDistributor does not work as expected. I
 have currently checked out all of trunk, and for translatewiki.net we have
 a selective checkout of i18n files for extensions and we have a checkout
 for core and the installed extensions. The fragmentation and
 disorganisation/disharmony that will exist after creating 450 GIT repos
 instead of one Subversion repo as we currently have is also something I am
 not looking forward to.

 Source code management is now centralised, and correct me if I'm wrong,
 but we encourage developers to request commit access to improve visibility
 of their work and grow the community. Going distributed in the proposed
 way, would hamper that, if I'm correct. I think the relative lower
 popularity of extensions that are maintained outside of svn.wikimedia.org
 are proof of this. I am not in favour of using GIT in the proposed way. I
 think core and extensions should remain in the same repo. Checkout are for
 developers, and developers should get just all of it.

 Siebrand



 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l



--
a href=http://about.me/diederik;Check out my about.me profile!/a

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How users without programming skills can help

2011-02-14 Thread Diederik van Liere
Maybe I am not expressing myself clearly: I am not talking about adding
checkboxes, radio buttons or pull-down menus.
I am saying that we could add the following text to the textarea field
which contains the actual bug report:
Please describe the steps to take to reproduce the problem:
What is the expected result:
What is the actual result:

If you know which version you are using, or you have other information
that you think might be helpful, please add it as well.
You can also describe the problem in your own words rather than sticking
to the above-mentioned questions.

So, again, I am not saying we should add fields; we could add this text
as the default text in the textarea so people have a bit more guidance
when writing a bug report.
No hard checks, nothing is mandatory.

On Mon, Feb 14, 2011 at 10:22 AM, Amir E. Aharoni
amir.ahar...@mail.huji.ac.il wrote:
 2011/2/14 Diederik van Liere dvanli...@gmail.com:
 I am not following this line of reasoning: how can adding guidance /
 instructions on how to write a good bug report turn people away?

 It's very simple, really: a form with a lot of fields may turn people
 away. I know that it turns me away. How many people are like me in
 this regard? That is someone that should be studied.

 I still do report bugs in Firefox, despite the many field in the form,
 but i can easily imagine people who won't.

 In a previous life, I have studied the factors that shorten the time
 required to fix a big. Bugreports that contain steps to reproduce are a
 significant predictor to shorten
 the time to fix a bug.  You can find the paper here:
 http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1507233

 That makes perfect sense, but that's the developer side side of the
 question. I'm talking about the user side.

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
a href=http://about.me/diederik;Check out my about.me profile!/a

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Making code review happen in 1.18

2011-02-13 Thread Diederik van Liere
+1 to migrate to a DVCS

On Sun, Feb 13, 2011 at 8:38 PM, Mark A. Hershberger 
mhershber...@wikimedia.org wrote:

 mhershber...@wikimedia.org (Mark A. Hershberger) writes:

  The solution I'm proposing is that we branch 1.18 immediately after the
  release of the 1.17 tarball.

 I want to give credit where it is due.  Although I haven't seen him
 propose it here, this is, in fact, Robla's idea.  He and I were
 discussing what needed to happen for 1.18 and it was his idea to branch
 1.18 immediately after the release of the 1.17 tarball.

 Mark.

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Roadmaps and getting and keeping devs

2011-02-13 Thread Diederik van Liere
Maybe we can make the bugathon part of the Berlin hackathon?

On Sun, Feb 13, 2011 at 4:03 PM, Ashar Voultoiz hashar+...@free.fr wrote:

 On 13/02/11 11:54, Roan Kattouw wrote:
  Bugzilla patches are another matter, yes, but I think making sure
  patches get reviewed can be a Bugmeister task. We get relatively few
  patches through Bugzilla these days anyway.

 Maybe once 1.17 is released, we should focus on the bugzilla patch queue
 and get it solved. Would probably keep us busy until June.

 Do we have any hack-a-ton planned? I can probably take a whole week
 day-offs to participate and solve them.

 --
 Ashar Voultoiz


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How users without programming skills can help

2011-02-13 Thread Diederik van Liere
I think we can draw some inspiration from Mozilla's use of Bugzilla, and in
particular the format they encourage users to follow when submitting a bug report:

1) Steps to reproduce
2) Expected result
3) Actual result
4) Reproducible (by the bug reporter): always / sometimes
5) Version information, extensions installed, database used (this
information is dependent on the skill level of the bug reporter, and maybe we
can make this information easily retrievable if it's currently not easy to
determine).

So maybe we can paste these 5 steps (or something similar) into the initial
form used to file a bug report.

This would increase the quality of bug reports and make bug
triaging easier.


On Sun, Feb 13, 2011 at 8:28 PM, MZMcBride z...@mzmcbride.com wrote:

 Mark A. Hershberger wrote:
  Perhaps we could recruit some people from the he.wikipedia.org community
 to
  take problems reported (via the localized interface?) and reproduce
 them or
  act as a translator between developers and bug reporters?

 There is already some infrastructure for this kind of idea:
 https://lists.wikimedia.org/mailman/listinfo/wikitech-ambassadors

 I didn't know about this mailing list until a few days ago, but it's a
 start
 in building the bridge between MediaWiki development and (power-)users.

 MZMcBride



 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How users without programming skills can help

2011-02-13 Thread Diederik van Liere
That's exactly my point :)
Most Firefox bug reporters are ordinary users, so if they are able to report a
bug, then MediaWiki users can do it as well, because they are basically the
same group of Internet users.  And again, my suggestion is not a hard
requirement; it's about giving ordinary people a number of things they might
want to think about when submitting a report. This certainly will not scare
people away; in the worst case they will ignore the questions.

On Sun, Feb 13, 2011 at 11:16 PM, Daniel Friesen
li...@nadir-seen-fire.comwrote:

 Actually our users could be anyone who reads Wikipedia and notices
 there's something wrong with what MediaWiki is doing or thinks there is
 something about the ui we need to fix.

 They don't even have to be as advanced as a Firefox user... they could
 be a random human who doesn't even know they can install a browser
 other than Internet Explorer on their computer.

 If someone is already saying it's harder to report a bug to Mozilla
 about something they usually install themselves, I don't think we want
 reporting to be as hard when we have users who don't even know it's
 something they can install.

 ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

 On 11-02-13 07:53 PM, Diederik van Liere wrote:
  Dear James, Amir and fellow wikimedia devs,
 
  I understand your concern and I am not suggesting that we should force a
  user to enter all Bugzilla fields but add those 5 questions as a
 guideline
  in the free-text form. Reporters can use it when they feel uncertain what
  information we are looking for but they are not forced to stick to any
  format in particular.
 
  Additionally, I think that Mediawiki users are as technological advanced
 as
  Firefox users so I don't think this will scare somebody away. If we
 really
  want to make it easier for people to file a bug then we should add a
 simple
  wizard to guide them through the process. In particular choosing the
 right
  product and component can be quite confusing / intimidating for somebody
 new
  to Medawiki.
 
  On Sun, Feb 13, 2011 at 9:43 PM, James Alexander
  jalexan...@wikimedia.orgwrote:
 
  On 2/13/2011 8:46 PM, Diederik van Liere wrote:
  I think we can draw some inspiration from Mozilla's use of Bugzilla and
  particular the format they are encourage users when submitting a
  bugreport:
  1) Steps to reproduce
  2) Expected result
  3) Actual result
  4) Reproducible (by bugreporter): always / sometimes
  5) Version information, extensions installed, database used (this
  information is dependent on the skill level of the bugreporter and
 maybe
  we
  can add make this information easily retrievable if it's current not
 easy
  to
  determine.
 
  So maybe we can paste these 5 steps (or something similar) in the
 initial
  form used to file a bugreport.
 
  This would increase the quality of bugreports and make it easier for
 bug
  triaging.
  I can totally understand the idea behind this but I think Amir brings up
  the concern about this best:
 
  On 2/13/2011 5:56 PM, Amir E. Aharoni wrote:
  bugzilla.wikimedia.org is the tracker where i report more bugs than
  elsewhere. The second is bugzilla.mozilla.org . It's not because
  Firefox has less bugs (quite the contrary!) but because Mozilla's
  tracker requires me to fill more fields, such as steps for
  reproduction. This may encourage detailed reporting that helps
  developers solve the bugs, but it may also discourage people from
  reporting them in the first place.
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
  Gathering all that information on a bug report form could quite clearly
  make it easier to reproduce bugs and may make resolving them easier but
  I worry that the harder and/or more complicated we make the reporting
  the more likely we are to scare someone away from taking the time to
  file the bug (which we want). I'm not totally sure where the best
  balance there is.
 
  --
  James Alexander
  Associate Community Officer
  Wikimedia Foundation
  jalexan...@wikimedia.org
  +1-415-839-6885 x6716
 
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
 
 

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Roadmaps and getting and keeping devs

2011-02-12 Thread Diederik van Liere
For the last few months I have been going through Bugzilla, and what strikes me
is that we are not using it as efficiently as other communities do. In
particular, there is little follow-up to reported problems (as Leo mentioned
as well). In the short term, I think we can have a bugathon to clean up the
bug list a little bit and re-energize some community members:

Have a bugathon where we label a lot of bugs as appropriate bugathon bugs
that need either:
a) a test of the attached patch / an update of the patch to the recent SVN version
b) confirmation / replication of new / unconfirmed bugs

We can provide a simple ready-to-go wiki installation for people to use for
bug triaging, and that way we can re-energize developers and clean up some of
the backlog of bugs.

Is this something that we should be doing?

On Sat, Feb 12, 2011 at 3:41 PM, Leo diebu...@gmail.com wrote:


 On Samstag, 12. Februar 2011 at 17:55, David Gerard wrote:
  How to grow your contributor community (and how to decimate it):
 
  http://www.codesimplicity.com/post/open-source-community-simplified/
 

 and imo, wikimedia fails at a lot of these points:

 *Quote: Respond to contributions immediately.
 This is what I think bugs me the most. There are heaps of bugs which have
 had patches attached for month or years. For newcomers, who maybe spent a
 lot of time on these, it's just rude to neither commit them nor explain why
 they can't be committed immidiately.

 *Create and document communication channels.
 This has been talked about before, and maybe it did indeed get it little
 better.

 Leo
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Roadmaps and getting and keeping devs

2011-02-12 Thread Diederik van Liere
I think one way that non-technical people can help is by trying to replicate
bugs: if they follow the steps as described in the bug report, do they get the
same malfunction or not? That would be a great help, as it weeds out invalid
bug reports.

Sent from my iPhone

On 2011-02-12, at 17:26, phoebe ayers phoebe.w...@gmail.com wrote:

 On Sat, Feb 12, 2011 at 1:11 PM, Ryan Lane rlan...@gmail.com wrote:
 Have a bugathon where we label a lot of bugs as appropriate bugathon bugs
 that need either:
 a) test patch / update patch to recent svn version
 a) confirmation / replication of new / unconfirmed bugs
 
 We can provide a simple ready to go Wiki installation for people to use for
 bug triaging and that way we can re-energize developers and clean up some of
 the backlog of bugs.
 
 Is this something that we should be doing?
 
 
 This is something we do at hack-a-tons. I don't remember the number of
 bugs smashed at the last one, but it was a decent number.
 
 I believe the next hack-a-ton is in Berlin, soon. I'm not sure if they
 have this planned. It's apparently GLAM focused (which excludes devs
 like me), so I'd imagine not, unless the bugs targeted are GLAM
 related.
 
 - Ryan Lane
 
 I'm curious: is there a way that non-technical people can help with
 sprints like this? Documentation-building, maybe? Something else? I'm
 interested in development sprints, bugathons etc that involve both
 technical  non-technical people; I've been involved in a few and it's
 pretty fun. But I don't know how many useful ways non-programmers 
 non-developers can help.
 
 -- phoebe
 
 -- 
 * I use this address for lists; send personal messages to phoebe.ayers
 at gmail.com *
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Matching main namespace articles with associated talk page

2011-01-08 Thread Diederik van Liere
Dear dev's,


I am wondering whether the MediaWiki database contains a foreign-key
relationship between a main namespace article and the associated talk
page (if present).
Having this information would greatly simplify analytic projects that
monitor editor behaviour and study revert behaviour (among
other topics).

Currently, I am manually matching these two sets of pages by matching titles.
I have two questions:
1) If this foreign key does not exist, would it be worthwhile to create it?
2) If this foreign key does exist, what would it take to expose it in
the XML dumps?

Best regards,

Diederik

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Matching main namespace articles with associated talk page

2011-01-08 Thread Diederik van Liere
Yes, manually matching is fairly simple, but in the worst case you need
to iterate over n-1 talk pages (where n is the total number of talk
pages of a Wikipedia) to find the talk page that belongs to a given
article when using the dump files. Hence, if the dump file contained,
for each article, a tag with the talk page id, it would significantly
reduce the processing time.
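
For what it's worth, here is a rough sketch of how the matching can be
done in a single pass with a lookup table, using the fact that a talk
page shares its title with the subject page and sits in the namespace
one higher; the tuple layout is illustrative, as if parsed from a stub
dump.

def match_talk_pages(pages):
    """pages: list of (page_id, namespace, title) tuples.
    Returns {article page_id: talk page_id} for the main namespace.
    One dict lookup per article instead of rescanning all talk pages."""
    pages = list(pages)
    talk_by_title = {title: pid for pid, ns, title in pages if ns == 1}
    return {pid: talk_by_title[title]
            for pid, ns, title in pages
            if ns == 0 and title in talk_by_title}

# Toy example; the talk title is stored without the "Talk:" prefix,
# as in the page table, so subject and talk pages share the same title.
sample = [(1, 0, "Amsterdam"), (2, 1, "Amsterdam"), (3, 0, "Rotterdam")]
print(match_talk_pages(sample))   # {1: 2}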
Diederik

On Sat, Jan 8, 2011 at 11:39 AM, Bryan Tong Minh
bryan.tongm...@gmail.com wrote:
 On Sat, Jan 8, 2011 at 5:32 PM, John phoenixoverr...@gmail.com wrote:
 its just a matter of matching page titles, if there is a page in namespace 0
 and a page in namespace (article and article talk) with the same title they
 go together. its fairly simple

 To expand John's comment, the talk page is always the page with the
 same title, but with a namespace number 1 higher.


 Bryan

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Update on 1.17

2011-01-07 Thread Diederik van Liere
The same error is given for:
* Russian
* Japanese
* Italian
* Arabic (ar is the language code)

Best,
Diederik

2011/1/7 Bryan Tong Minh bryan.tongm...@gmail.com:
 On Fri, Jan 7, 2011 at 4:37 PM, Roan Kattouw roan.katt...@gmail.com wrote:
 2011/1/7 Bryan Tong Minh bryan.tongm...@gmail.com:
 Also FR seems to be unconditionally enabled, also on wikis that do not
 have the tables present.

 Which wikis would those be? Rob says he ran update.php so all the
 tables should be there.

 http://prototype.wikimedia.org/deployment-nl/Hoofdpagina

 Databasefout
 Er is een syntaxisfout in het databaseverzoek opgetreden. Mogelijk zit
 er een fout in de software. Het laatste verzoek aan de database was:

    (SQL-zoekopdracht verborgen)

 vanuit de functie “FlaggedRevision::newFromStable”. De database gaf de
 volgende foutmelding “1146: Table 'nlwiki.flaggedpages' doesn't exist
 (localhost)”.

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-19 Thread Diederik van Liere
To continue the discussion on how to improve the performance: would it be
possible to distribute the dumps as a 7z / gz / other format archive containing
multiple smaller XML files? It's quite tricky to split a very large XML file into
smaller valid XML files, and if the dumping process is already parallelized then
we do not have to cat the different XML files into one large XML file; instead
we can distribute multiple smaller parallelized files.
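
To give an idea of the splitting that consumers otherwise have to do
themselves, here is a rough sketch that streams a pages dump and starts
a new, self-contained XML file every N page elements; element and file
names are illustrative, the wrapper element is deliberately minimal,
and the siteinfo header is simply dropped.

import xml.etree.ElementTree as ET

def split_dump(dump_path, pages_per_file=1000, prefix="part"):
    """Stream a pages dump and write every `pages_per_file` <page> elements
    to a new XML file. Each <page> is cleared after writing so its text does
    not accumulate in memory."""
    part, count, out = 0, 0, None

    def open_part(n):
        f = open(f"{prefix}-{n:04d}.xml", "wb")
        f.write(b"<mediawiki>\n")          # minimal wrapper so each part is well-formed XML
        return f

    for _, elem in ET.iterparse(dump_path, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] != "page":   # ignore the XML namespace prefix if present
            continue
        if out is None:
            out = open_part(part)
        out.write(ET.tostring(elem))
        elem.clear()
        count += 1
        if count == pages_per_file:
            out.write(b"</mediawiki>\n")
            out.close()
            out, count, part = None, 0, part + 1

    if out is not None:
        out.write(b"</mediawiki>\n")
        out.close()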

best,

Diederik
On 2010-12-16, at 7:02 PM, Ariel T. Glenn wrote:

 On Fri, 17-12-2010 at 00:52 +0100, Platonides wrote:
 Roan Kattouw wrote:
 I'm not sure how hard this would be to achieve (you'd have to
 correlate blob parts with revisions manually using the text table;
 there might be gaps for deleted revs because ES is append-only) or how
 much it would help (my impression is ES is one of the slower parts of
 our system and reducing the number of ES hits by a factor 50 should
 help, but I may be wrong), maybe someone with more relevant knowledge
 and experience can comment on that (Tim?).
 
 Roan Kattouw (Catrope)
 
 ExternalStoreDB::fetchBlob() is already keeping the last one to optimize
 repeated accesses to the same blob (we would probably want a bigger
 cache for the dumper, though).
 On the other hand, I don't think the dumpers should be doing the store
 of textid contents in memcached (Revision::loadText) since they are
 filling them with entries useless for the users queries (having a
 different locality set), useless for themselves (since they are
 traversing the full list once) and -even assuming that the memcached can
 happily handle it and no other data is affecting by it- the network
 delay make it a non-free operation.
 
 Ariel, do you have in wikitech the step-by-step list of actions to setup
 a WMF dump server?
 I always forget about which scripts are being used and what does each of
 them do.
 Can xmldumps-phase3 be removed? I'd prefer that it uses the
 release/trunk/wmf-deployment, an old copy is a source for problems. If
 additional changes are needed (it seems unpatched), the appropiate hooks
 should be added in core.
 
 Most backups run off of trunk. The stuff I have in my branch is the
 parallel stuff for testing. 
 
 http://wikitech.wikimedia.org/view/Dumps details the various scripts.
 
 No, xmldumps-phase3 can't be removed yet.  I have yet to make the
 changes I need to that code (and I won't make them in core immediately,
 they need to be tested thoroughly first before being checked in).  Once
 I think they are ok, then I will fold them into trunk.  It will be a
 while yet. 
 
 Ariel
 
 
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-19 Thread Diederik van Liere
Which dump file is offered in smaller sub files?

On Sun, Dec 19, 2010 at 6:02 PM, Platonides platoni...@gmail.com wrote:
 Diederik van Liere wrote:
 To continue the discussion on how to improve the performance, would it be 
 possible to distribute the dumps as a 7z / gz / other format archive 
 containing multiple smaller XML files. It's quite tricky to split a very 
 large XML file in smaller valid XML files and if the dumping process is 
 already parallelized then we do not have to cat the different XML files to 
 one large XML file but instead we can distribute multiple smaller 
 parallelized files .

 best,

 Diederik

 That has already been done for enwiki.


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
a href=http://about.me/diederik;Check out my about.me profile!/a

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

