Re: [Wikitech-l] WikiXMLArticleIndexer

2010-09-14 Thread Bryan Tong Minh
On Tue, Sep 14, 2010 at 1:04 AM, Jamie Morken <jmor...@shaw.ca> wrote:
> Hi all,
>
> We have a beta version of the code for reading the XML dump and
> extracting the article names with their associated images.

It is easier to download the imagelinks.sql and page.sql dumps.
The imagelinks table already contains a mapping of the images used on
each page, and the page table can be used to map page_id to
page_namespace and page_title.
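
As a rough illustration of that mapping, here is a minimal sketch that joins the two tables. The column names (il_from, il_to, page_id, page_namespace, page_title) follow the MediaWiki schema; the in-memory SQLite database, the sample rows, and the function names are assumptions made only to keep the example self-contained. A real run would import the dumps into MySQL instead.

```python
import sqlite3


def demo_connection():
    """Build a tiny stand-in for the imported page and imagelinks dumps."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT
        );
        CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
        -- hypothetical sample rows
        INSERT INTO page VALUES (1, 0, 'South_Africa'), (2, 0, 'Paw');
        INSERT INTO imagelinks VALUES
            (1, 'Flag_of_South_Africa.svg'),
            (1, 'South_Africa_(orthographic_projection).svg'),
            (2, 'Paw.jpg');
    """)
    return conn


def images_by_article(conn):
    """Map each (namespace, title) pair to the list of image names it uses."""
    rows = conn.execute(
        "SELECT p.page_namespace, p.page_title, il.il_to "
        "FROM imagelinks AS il "
        "JOIN page AS p ON p.page_id = il.il_from "
        "ORDER BY p.page_id, il.il_to"
    )
    result = {}
    for namespace, title, image in rows:
        result.setdefault((namespace, title), []).append(image)
    return result
```

Calling images_by_article(demo_connection()) returns a dictionary keyed by (page_namespace, page_title), so no XML parsing is needed for this step.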


Bryan

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Get full-protecting article list monthly

2010-09-14 Thread Roan Kattouw
2010/9/14 <zh...@york.ac.uk>:
> Dear All,
>
> May I ask how I can get the full-protecting article lists monthly? I can
> get the current one by searching lock link. Is that some tools for this?

http://en.wikipedia.org/w/api.php?action=query&list=allpages&apprtype=edit&apprlevel=sysop&aplimit=max
returns the first 500 (or 5,000 if you're a bot or sysop) pages in the
main namespace that only sysops can edit (i.e. that are fully
protected). If you're not a privileged user and only get 500 entries,
you can use the information in the query-continue tag to get the next
500.
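
For anyone scripting this, here is a minimal sketch of building that request and following the continuation. The query parameters are the ones in the URL above; the format=json parameter, the function names, and the assumed response layout ({"query-continue": {"allpages": {...}}}) are assumptions for illustration, and the sketch only builds URLs rather than fetching them.

```python
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"

BASE_PARAMS = {
    "action": "query",
    "list": "allpages",
    "apprtype": "edit",
    "apprlevel": "sysop",
    "aplimit": "max",
    "format": "json",  # assumption: JSON output, for easier parsing
}


def build_url(params):
    """Return the full request URL for the given query parameters."""
    return API_URL + "?" + urlencode(params)


def next_params(params, response):
    """Return parameters for the next batch, or None when there is none.

    Assumes the decoded response carries continuation values under
    response["query-continue"]["allpages"].
    """
    cont = response.get("query-continue", {}).get("allpages")
    if not cont:
        return None
    merged = dict(params)
    merged.update(cont)
    return merged
```

A loop would fetch build_url(params), decode the response, and repeat with next_params(params, response) until it returns None.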

Roan Kattouw (Catrope)



Re: [Wikitech-l] Get full-protecting article list monthly

2010-09-14 Thread zh509
On Sep 14 2010, Roan Kattouw wrote:

> 2010/9/14 <zh...@york.ac.uk>:
>> Dear All,
>>
>> May I ask how I can get the full-protecting article lists monthly? I can
>> get the current one by searching lock link. Is that some tools for this?
>
> http://en.wikipedia.org/w/api.php?action=query&list=allpages&apprtype=edit&apprlevel=sysop&aplimit=max
>
> returns the first 500 (or 5,000 if you're a bot or sysop) pages in the
> main namespace that only sysops can edit (i.e. that are fully protected).
> If you're not a privileged user and only get 500 entries, you can use the
> information in the query-continue tag to get the next 500.
>
> Roan Kattouw (Catrope)

Thanks for this!

I am looking for how the set of fully protected articles changes over
time. Can I get the data together with a timestamp or date? Is that
possible? Thanks,

Zeyi




Re: [Wikitech-l] Get full-protecting article list monthly

2010-09-14 Thread John Doe
If you want, we have a Toolserver database query service, and generating
such data should be easy. If you file a request at
https://jira.toolserver.org/browse/DBQ you should be able to get the data
you need.

Δ



[Wikitech-l] pagesize function problem

2010-09-14 Thread Brent Palmer
Hi,
We are having a problem with the front page of our wikipedia mirror.

The function
{{#ifexpr:{{formatnum:{{PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}}}|R}}>150|{{Wikipedia:Today's featured article/{{#time:F j, Y}}}}|{{Wikipedia:Today's featured article/{{#time:F j, Y|-1 days}}}}}}
causes the error "Expression error: Unexpected > operator" to appear on
the main page.

{{#time:F j, Y}} correctly returns the date in the format 'June 22, 2010'.

However, {{PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}}}
doesn't seem to return anything.
Trying PAGESIZE with a normal page, {{PAGESIZE:Paw}}, does not return
anything either.

Because of the failure of PAGESIZE, ifexpr becomes {{#ifexpr:>150|...}},
which I believe causes the "Expression error: Unexpected > operator" error.


Does anybody have any ideas what is going on with this?

Thanks,
Brent

MediaWiki: 1.16.0
PHP: 5.2.8 (apache2handler)
MySQL: 5.1.48-community







Re: [Wikitech-l] Get full-protecting article list monthly

2010-09-14 Thread John Doe
Better yet, http://toolserver.org/~betacommand/reports/sysopprotecton.txt
which is updated daily.

Δ



[Wikitech-l] template rendering problems

2010-09-14 Thread Brent Palmer
Hi,
We are creating an offline version of Wikipedia, but we continue to
have problems getting templates to render correctly. So far, we've
tracked down two sources of the problem.

One is when tags are not closed properly. Here is an example from
http://en.wikipedia.org/wiki/South_Africa. This is a portion of the page
near the top that is part of the Infobox parameters:

<snip>
|symbol_type=Coat of arms
|image_map=South_Africa_(orthographic_projection).svg
|national_motto=''{{unicode|!ke e: ǀxarra ǁke}}''{{spaces|2|}}<small>([[ǀXam language|ǀXam]])<br/>Unity In Diversity
|national_anthem=[[National anthem of South Africa]]
</snip>

Notice that the <small> tag is not closed.

On our version of the page many of the subsequent <tr> and <td> tags are
rendered as HTML entities, and this ruins the layout of the page. I'm not
sure why this works in the current online version but not in ours.
(Another example is the Refimprove template: check the latest change by
Plastikspork, which breaks any page that uses the Refimprove template.)
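
One way to hunt for other fragments with the same defect is to scan them for tags that are opened but never closed. The following is a minimal sketch under stated assumptions: it uses Python's standard html.parser, treats only a small hypothetical set of tags as self-closing, and does not understand wikitext, so templates and links are simply treated as text.

```python
from html.parser import HTMLParser

# Assumption: tags that never take a closing tag in this context.
VOID_TAGS = {"br", "hr", "img"}


class UnclosedTagFinder(HTMLParser):
    """Track opened tags and drop each one when its closing tag is seen."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Remove the most recently opened matching tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i] == tag:
                del self.stack[i]
                break


def unclosed_tags(fragment):
    """Return the tags in fragment that were opened but never closed."""
    finder = UnclosedTagFinder()
    finder.feed(fragment)
    finder.close()
    return finder.stack
```

Run against the national_motto value above, unclosed_tags reports the lone small tag, which makes it easy to list the infobox parameters that depend on Tidy (or the browser) to repair the markup.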

We have the same extensions installed as Wikipedia. We also have
$wgUseTidy set to true.

If you have any ideas about how to troubleshoot this one, I'd appreciate it.

Brent




[Wikitech-l] template rendering problems-#switch

2010-09-14 Thread Brent Palmer
Hi,
We are creating an offline version of Wikipedia, but we continue to
have problems getting templates to render correctly. So far, we've 
tracked down several sources of the problem.

Here is another example:
In several templates I've tracked down the problem to a call to the 
#switch function.  The function appears to work in that the correct text 
is output from the function, but several of the tags following it are 
converted to html entities.  Here is a snippet of the 
Historical_populations template:


<includeonly>{| class="toccolours {{#ifeq: {{{state|}}} | collapsed |
collapsible collapsed | }}" style="clear: {{{align|right}}}; width:
{{{width|15em}}}; text-align: center; border-spacing: 0; float:
{{{align|right}}}; margin: {{ #switch: {{{align|}}} | left = 0 1em 1em 0
| #default = 0 0 1em 1em }};"
|-
! colspan="{{#ifeq:{{{percentages}}}|off|2|3}}" class="navbox-title" |
<span style="font-size:110%;">{{{title|Historical populations}}}</span>
|- style="font-size: 95%;"


If I remove the #switch and hard-code the default text in (0 0 1em 1em), 
it works fine.

We have the same extensions installed as Wikipedia. We also have
$wgUseTidy set to true.

If you have any ideas about how to troubleshoot this one, I'd appreciate it.

Brent





Re: [Wikitech-l] Get full-protecting article list monthly

2010-09-14 Thread zh509
Hi, the link does not seem to work.

best,

Zeyi

On Sep 14 2010, John Doe wrote:

> Better yet,
> http://toolserver.org/~betacommand/reports/sysopprotecton.txtwhich is
> updated daily,
>
> Δ



Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-14 Thread Dan Nessett
On Fri, 10 Sep 2010 23:11:27 +, Dan Nessett wrote:

> We are currently attempting to refactor some specific modifications to
> the standard MW code we use (1.13.2) into an extension so we can upgrade
> to a more recent maintained version. One modification we have keeps a
> flag in the revisions table specifying that article text was imported
> from WP. This flag generates an attribution statement at the bottom of
> the article that acknowledges the import.
>
> I don't want to start a discussion about the various legal issues
> surrounding text licensing. However, assuming we must acknowledge use of
> licensed text, a legitimate technical issue is how to associate state
> with an article in a way that records the import of licensed text. I
> bring this up here because I assume we are not the only site that faces
> this issue.
>
> Some of our users want to encode the attribution information in a
> template. The problem with this approach is anyone can come along and
> remove it. That would mean the organization legally responsible for the
> site would entrust the integrity of site content to any arbitrary
> author. We may go this route, but for the sake of this discussion I
> assume such a strategy is not viable. So, the remainder of this post
> assumes we need to keep such licensing state in the db.
>
> After asking around, one suggestion was to keep the licensing state in
> the page_props table. This seems very reasonable and I would be
> interested in comments by this community on the idea. Of course, there
> has to be a way to get this state set, but it seems likely that could be
> achieved using an extension triggered when an article is edited.
>
> Since this post is already getting long, let me close by asking whether
> support for associating licensing information with articles might be
> useful to a large number of sites. If so, then perhaps it belongs in the
> core.

The discussion about whether to support license data in the database has 
settled down. There seems to be some support. So, I think the next step 
is to determine the best technical approach. Below I provide a strawman 
proposal. Note that this is only to foster discussion on technical 
requirements and approaches. I have nothing invested in the strawman.

Implementation location: In an extension

Permissions: include two new permissions - 1) addlicensedata, and 2) 
modifylicensedata. These are pretty self-explanatory. Sites that wish to 
give all users the ability to provide and modify licensing data would 
assign these permissions to everyone. Sites that wish to allow all users 
to add licensing data, but restrict those who are allowed to modify it, 
would give the first permission to everyone and the second to a limited 
group.

Database schema: Add a licensing table to the db with the following 
columns - 1) revision_or_image, 2) revision_id, 3) image_id, 4) 
content_source, 5) license_id, 6) user_id.

The first three columns identify the revision or image to which the 
licensing data is associated. I am not particularly adept with SQL, so 
there may be a better way to do this. The content_source column is a 
string that is a URL or other reference that specifies the source of the 
content under license. The license_id identifies the specific license for 
the content. The user_id identifies the user that added the licensing 
information. The user_id may be useful if a site wishes to allow someone 
who added the licensing information to delete or modify it. However, 
there are complications with this. Since IP addresses are easily spoofed, 
it would mean this entry should only be valid for logged in users.

Add a license table with the following columns - 1) license_id, 2)
license_text, 3) license_name and 4) license_version. The license_id in
the licensing table references rows in this table.
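
To make the strawman concrete, here is a minimal sketch of the two tables. The table and column names are the ones proposed above; the column types, the 'revision'/'image' discriminator values, the CHECK constraints, and the use of an in-memory SQLite database are assumptions for illustration only, not part of the proposal.

```python
import sqlite3

SCHEMA = """
CREATE TABLE license (
    license_id      INTEGER PRIMARY KEY,
    license_text    TEXT NOT NULL,
    license_name    TEXT NOT NULL,
    license_version TEXT
);
CREATE TABLE licensing (
    revision_or_image TEXT NOT NULL
        CHECK (revision_or_image IN ('revision', 'image')),
    revision_id       INTEGER,
    image_id          INTEGER,
    content_source    TEXT NOT NULL,
    license_id        INTEGER NOT NULL REFERENCES license(license_id),
    user_id           INTEGER,
    -- assumption: exactly the id matching the discriminator is set
    CHECK ((revision_or_image = 'revision'
            AND revision_id IS NOT NULL AND image_id IS NULL)
        OR (revision_or_image = 'image'
            AND image_id IS NOT NULL AND revision_id IS NULL))
);
"""


def create_db():
    """Create an empty in-memory database with the two strawman tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def licenses_for_revision(conn, revision_id):
    """Return (license_name, license_version, content_source) rows."""
    return conn.execute(
        "SELECT l.license_name, l.license_version, g.content_source "
        "FROM licensing AS g "
        "JOIN license AS l ON l.license_id = g.license_id "
        "WHERE g.revision_or_image = 'revision' AND g.revision_id = ?",
        (revision_id,),
    ).fetchall()
```

The second CHECK is one possible answer to the "better way to do this" question: it keeps the three identifying columns but lets the database reject rows where the discriminator and the ids disagree.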

One complication is when a page or image is reverted, the licensing table 
must be modified to reflect the current state.

Data manipulation: The extension would use suitable hooks to insert, 
modify and render licensing data. Insertion and modification would 
probably use a relevant Edit Page or Article Management hook. Rendering 
would probably use a Page Rendering Hook.

Page rendering: You probably don't want to dump licensing data directly 
onto a page. Instead, it is preferable to output a short licensing 
statement like:

Content on this page uses licensed content. For details, see licensing 
data.

The phrase licensing data would be a link to a special page that 
accesses the licensing table and displays the license data associated 
with the page.

-- 
-- Dan Nessett




Re: [Wikitech-l] template rendering problems

2010-09-14 Thread OQ
On Tue, Sep 14, 2010 at 10:52 AM, Brent Palmer <b...@brentopalmer.com> wrote:
> We have the same extensions applied as Wikipedia. We also have the
> wgUseTidy set to true.
>
> If you have any ideas about how to troubleshoot this one, I'd appreciate it.

Did you install Tidy or just turn the setting on?



Re: [Wikitech-l] Get full-protecting article list monthly

2010-09-14 Thread John Doe
http://toolserver.org/~betacommand/reports/sysopprotecton.txt


try that


Re: [Wikitech-l] template rendering problems

2010-09-14 Thread Brent Palmer
Ahh, good question. But yes, we did install Tidy. It's the Windows
version, though, if that makes a difference. We are reasonably sure that
it's working (it definitely works independently of MW) because a few
small rendering problems were fixed, but the major ones described here
weren't affected. We aren't exactly sure where or when Tidy is invoked
(after each template is parsed?). We haven't yet tried to determine
whether it is supposed to be cleaning up something that it isn't.

Thanks!
Brent


On 09/14/2010 02:37 PM, OQ wrote:
> On Tue, Sep 14, 2010 at 10:52 AM, Brent Palmer <b...@brentopalmer.com> wrote:
>
>> We have the same extensions applied as Wikipedia. We also have the
>> wgUseTidy set to true.
>>
>> If you have any ideas about how to troubleshoot this one, I'd appreciate it.
>
> Did you install Tidy or just turn the setting on?





[Wikitech-l] [Announce]: Mark Bergsma promotion to Operations Engineer Programs Manager

2010-09-14 Thread Danese Cooper
Please join me in congratulating Mark Bergsma on his promotion last week
to Operations EPM.  Mark has been a volunteer since 2004, and a paid
Network Engineer on our team since August 2006.  He's been helping us
with our extreme scaling issues (by debugging and tuning our Squid
setup, creating our Netherlands caching center, and generally developing
our network strategy) since the very beginning. For some time now Mark
has been unofficially in charge of managing the entire Ops Team's
deliverables including designing and implementing our new Primary Data
Center in Ashburn, VA, and the other Ops activities mentioned at
http://www.mediawiki.org/wiki/WMF_Projects.  Mark has expressed an
interest in gaining some experience with people management skills as a
logical next step in his career, and to that end we will gradually add
direct reports under Mark over the next year, starting with the Data
Center Ops crew.  He will continue to report to me until we hire a
Director of Technical Operations.

I know you will do all you can to support Mark in his new role.

Danese Cooper
CTO, Wikimedia Foundation



Re: [Wikitech-l] [Announce]: Mark Bergsma promotion to Operations Engineer Programs Manager

2010-09-14 Thread aude
On Tue, Sep 14, 2010 at 5:18 PM, Danese Cooper <dcoo...@wikimedia.org> wrote:

> Please join me in congratulating Mark Bergsma on his promotion last week
> to Operations EPM.  Mark has been a volunteer since 2004, and a paid
> Network Engineer on our team since August 2006.  He's been helping us
> with our extreme scaling issues (by debugging and tuning our Squid
> setup, creating our Netherlands caching center, and generally developing
> our network strategy) since the very beginning. For some time now Mark
> has been unofficially in charge of managing the entire Ops Team's
> deliverables including designing and implementing our new Primary Data
> Center in Ashburn, VA, and the other Ops activities mentioned at
> http://www.mediawiki.org/wiki/WMF_Projects.  Mark has expressed an
> interest in gaining some experience with people management skills as a
> logical next step in his career, and to that end we will gradually add
> direct reports under Mark over the next year, starting with the Data
> Center Ops crew.  He will continue to report to me until we hire a
> Director of Technical Operations.


Congrats Mark!!!

-aude

PS - let me/us know when you are visiting Ashburn... we should have you and
other ops staff to a DC meetup and can treat you to beer :)



> I know you will do all you can to support Mark in his new role.
>
> Danese Cooper
> CTO, Wikimedia Foundation


Re: [Wikitech-l] Proposal: reversion collapsing in edit history

2010-09-14 Thread Tgr
Roan Kattouw <roan.kattouw at gmail.com> writes:
> The only common factor between collapsing reversions and hiding minor
> and/or bot edits is the fact that you're hiding things from the
> history view.

Yes, it is the UI which could be reused.

> the other requires the minor/bot flags to be added to that table.

Just checking for bot user group would be, while not ideal, still acceptable. Or
is the table join required for that too costly?
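
To show what such a group check might look like, here is a minimal sketch of the join. The column names (rev_id, rev_page, rev_user, ug_user, ug_group) follow the MediaWiki schema; the in-memory SQLite database, the sample rows, and the function names are assumptions. It says nothing about the cost on a production-sized revision table, and it tests current group membership rather than whether an edit was flagged as a bot edit when it was made.

```python
import sqlite3


def demo_history():
    """Build a tiny stand-in for the revision and user_groups tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_page INTEGER,
            rev_user INTEGER
        );
        CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT);
        -- hypothetical sample rows: user 200 is a bot, user 100 is a sysop
        INSERT INTO revision VALUES
            (1, 10, 100), (2, 10, 200), (3, 10, 100), (4, 11, 200);
        INSERT INTO user_groups VALUES (200, 'bot'), (100, 'sysop');
    """)
    return conn


def non_bot_revisions(conn, page_id):
    """Return rev_ids for a page, skipping users currently in the bot group."""
    rows = conn.execute(
        "SELECT r.rev_id FROM revision AS r "
        "LEFT JOIN user_groups AS ug "
        "  ON ug.ug_user = r.rev_user AND ug.ug_group = 'bot' "
        "WHERE r.rev_page = ? AND ug.ug_user IS NULL "
        "ORDER BY r.rev_id",
        (page_id,),
    )
    return [row[0] for row in rows]
```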

