Re: [Wikitech-l] XML dumps

2015-11-17 Thread Matthew Flaschen



On 05/28/2015 07:52 PM, Lars Aronsson wrote:
> With proper release management, it
> should be possible to run the old version of the process
> until the new version has been tested, first on some smaller
> wikis, and gradually on the larger ones.


I understand your frustration; however, release management was not the 
issue in this case.  According to Ariel Glenn on the task 
(https://phabricator.wikimedia.org/T98585#1284441), "It's not a new 
leak, it's just that the largest single stubs file in our dumps runs is 
now produced by wikidata!".


That is, it was caused by changes to the input data (our projects 
themselves), not by changes to the code.


Matt Flaschen

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] XML dumps

2015-05-29 Thread Gerard Meijssen
Hear, hear.
Gerard

On 29 May 2015 at 01:52, Lars Aronsson  wrote:

> The XML database dumps are missing all through May, apparently
> because of a memory leak that is being worked on, as described
> here,
> https://phabricator.wikimedia.org/T98585
>
> However, that information doesn't reach the person who wants to
> download a fresh dump and looks here,
> http://dumps.wikimedia.org/backup-index.html
>
> I think it should be possible to make a regular schedule for
> when these dumps should be produced, e.g. once each month or
> once every second month, and treat any delay as a bug. The
> process to produce them has been halted by errors many times
> in the past, and even when it runs as intended the interval
> is unpredictable. Now when there is a bug, all dumps are
> halted, i.e. much delayed. For a user of the dumps, this is
> extremely frustrating. With proper release management, it
> should be possible to run the old version of the process
> until the new version has been tested, first on some smaller
> wikis, and gradually on the larger ones.
>
>
> --
>   Lars Aronsson (l...@aronsson.se)
>   Aronsson Datateknik - http://aronsson.se
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] XML dumps

2015-05-28 Thread Lars Aronsson

The XML database dumps are missing all through May, apparently
because of a memory leak that is being worked on, as described
here,
https://phabricator.wikimedia.org/T98585

However, that information doesn't reach the person who wants to
download a fresh dump and looks here,
http://dumps.wikimedia.org/backup-index.html

I think it should be possible to make a regular schedule for
when these dumps should be produced, e.g. once each month or
once every second month, and treat any delay as a bug. The
process to produce them has been halted by errors many times
in the past, and even when it runs as intended the interval
is unpredictable. Now when there is a bug, all dumps are
halted, i.e. much delayed. For a user of the dumps, this is
extremely frustrating. With proper release management, it
should be possible to run the old version of the process
until the new version has been tested, first on some smaller
wikis, and gradually on the larger ones.


--
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se




Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-21 Thread emijrp
Create a script that makes a request to Special:Export using this category
as the feed:
https://en.wikipedia.org/wiki/Category:Candidates_for_speedy_deletion

More info https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
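A minimal sketch of such a script, using only the standard library. It assumes the API and Special:Export endpoints behave as described in the manual page above; pagination, throttling, and error handling are left out, and the output filename is arbitrary:

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
EXPORT = "https://en.wikipedia.org/wiki/Special:Export"
CATEGORY = "Category:Candidates_for_speedy_deletion"

def category_members(category, limit=500):
    """Yield page titles in a category via the MediaWiki API (paginated)."""
    params = {"action": "query", "list": "categorymembers",
              "cmtitle": category, "cmlimit": str(limit),
              "format": "json", "continue": ""}
    while True:
        url = API + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # resume where the last batch ended

def export_payload(titles, current_only=True):
    """Build the Special:Export POST form for a list of page titles."""
    return {"pages": "\n".join(titles),
            "curonly": "1" if current_only else "",
            "templates": "1"}

def snapshot(path="speedy-deletion-candidates.xml"):
    """Fetch one XML export of every page currently in the category."""
    titles = list(category_members(CATEGORY))
    body = urllib.parse.urlencode(export_payload(titles)).encode()
    with urllib.request.urlopen(EXPORT, data=body) as resp:
        with open(path, "wb") as f:
            f.write(resp.read())
```

Run `snapshot()` on a schedule (cron, for instance) to capture candidates before they are deleted; the resulting XML can then be fed to a wiki's import mechanism.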

2012/5/21 Mike Dupont 

> Well I would be happy for items like this:
> http://en.wikipedia.org/wiki/Template:Db-a7
> would it be possible to extract them easily?
> mike
>
> On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn 
> wrote:
> > There are a few other reasons articles get deleted: copyright issues,
> > personal identifying data, etc.  This makes maintaining the sort of
> > mirror you propose problematic, although a similar mirror is here:
> > http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
> >
> > The dumps contain only data publicly available at the time of the run,
> > without deleted data.
> >
> > The articles aren't permanently deleted, of course.  The revision texts
> > live on in the database, so a query on toolserver, for example, could be
> > used to get at them, but that would need to be for research purposes.
> >
> > Ariel
> >
> > On Thu, 17-05-2012, at 13:30 +0200, Mike Dupont wrote:
> >> Hi,
> >> I am thinking about how to collect articles deleted based on the "not
> >> notable" criteria,
> >> is there any way we can extract them from the mysql binlogs? how are
> >> these mirrors working? I would be interested in setting up a mirror of
> >> deleted data, at least that which is not spam/vandalism based on tags.
> >> mike
> >>
> >> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn 
> wrote:
> >> > We now have three mirror sites, yay!  The full list is linked to from
> >> > http://dumps.wikimedia.org/ and is also available at
> >> >
> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
> >> >
> >> > Summarizing, we have:
> >> >
> >> > C3L (Brazil) with the last 5 known good dumps,
> >> > Masaryk University (Czech Republic) with the last 5 known good dumps,
> >> > Your.org (USA) with the complete archive of dumps, and
> >> >
> >> > for the latest version of uploaded media, Your.org with http/ftp/rsync
> >> > access.
> >> >
> >> > Thanks to Carlos, Kevin and Yenya respectively at the above sites for
> >> > volunteering space, time and effort to make this happen.
> >> >
> >> > As people noticed earlier, a series of media tarballs per-project
> >> > (excluding commons) is being generated.  As soon as the first run of
> >> > these is complete we'll announce its location and start generating
> them
> >> > on a semi-regular basis.
> >> >
> >> > As we've been getting the bugs out of the mirroring setup, it is
> getting
> >> > easier to add new locations.  Know anyone interested?  Please let us
> >> > know; we would love to have them.
> >> >
> >> > Ariel
> >> >
> >> >
> >> > ___
> >> > Wikitech-l mailing list
> >> > Wikitech-l@lists.wikimedia.org
> >> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >>
> >>
> >>
> >
> >
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> --
> James Michael DuPont
> Member of Free Libre Open Source Software Kosova http://flossk.org
> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT  |
StatMediaWiki
| WikiEvidens  |
WikiPapers
| WikiTeam 
Personal website: https://sites.google.com/site/emijrp/

Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-21 Thread Mike Dupont
Well I would be happy for items like this:
http://en.wikipedia.org/wiki/Template:Db-a7
would it be possible to extract them easily?
mike

On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn  wrote:
> There are a few other reasons articles get deleted: copyright issues,
> personal identifying data, etc.  This makes maintaining the sort of
> mirror you propose problematic, although a similar mirror is here:
> http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
>
> The dumps contain only data publicly available at the time of the run,
> without deleted data.
>
> The articles aren't permanently deleted, of course.  The revision texts
> live on in the database, so a query on toolserver, for example, could be
> used to get at them, but that would need to be for research purposes.
>
> Ariel
>
> On Thu, 17-05-2012, at 13:30 +0200, Mike Dupont wrote:
>> Hi,
>> I am thinking about how to collect articles deleted based on the "not
>> notable" criteria,
>> is there any way we can extract them from the mysql binlogs? how are
>> these mirrors working? I would be interested in setting up a mirror of
>> deleted data, at least that which is not spam/vandalism based on tags.
>> mike
>>
>> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn  wrote:
>> > We now have three mirror sites, yay!  The full list is linked to from
>> > http://dumps.wikimedia.org/ and is also available at
>> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
>> >
>> > Summarizing, we have:
>> >
>> > C3L (Brazil) with the last 5 known good dumps,
>> > Masaryk University (Czech Republic) with the last 5 known good dumps,
>> > Your.org (USA) with the complete archive of dumps, and
>> >
>> > for the latest version of uploaded media, Your.org with http/ftp/rsync
>> > access.
>> >
>> > Thanks to Carlos, Kevin and Yenya respectively at the above sites for
>> > volunteering space, time and effort to make this happen.
>> >
>> > As people noticed earlier, a series of media tarballs per-project
>> > (excluding commons) is being generated.  As soon as the first run of
>> > these is complete we'll announce its location and start generating them
>> > on a semi-regular basis.
>> >
>> > As we've been getting the bugs out of the mirroring setup, it is getting
>> > easier to add new locations.  Know anyone interested?  Please let us
>> > know; we would love to have them.
>> >
>> > Ariel
>> >
>> >
>> > ___
>> > Wikitech-l mailing list
>> > Wikitech-l@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>>
>>
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
James Michael DuPont
Member of Free Libre Open Source Software Kosova http://flossk.org
Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3


Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-21 Thread emijrp
You can create a script that uses Special:Export to export all articles in
the deletion categories just before they are deleted.

Then import them into your "Deletionpedia".

2012/5/17 Mike Dupont 

> Hi,
> I am thinking about how to collect articles deleted based on the "not
> notable" criteria,
> is there any way we can extract them from the mysql binlogs? how are
> these mirrors working? I would be interested in setting up a mirror of
> deleted data, at least that which is not spam/vandalism based on tags.
> mike
>
> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn 
> wrote:
> > We now have three mirror sites, yay!  The full list is linked to from
> > http://dumps.wikimedia.org/ and is also available at
> >
> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
> >
> > Summarizing, we have:
> >
> > C3L (Brazil) with the last 5 known good dumps,
> > Masaryk University (Czech Republic) with the last 5 known good dumps,
> > Your.org (USA) with the complete archive of dumps, and
> >
> > for the latest version of uploaded media, Your.org with http/ftp/rsync
> > access.
> >
> > Thanks to Carlos, Kevin and Yenya respectively at the above sites for
> > volunteering space, time and effort to make this happen.
> >
> > As people noticed earlier, a series of media tarballs per-project
> > (excluding commons) is being generated.  As soon as the first run of
> > these is complete we'll announce its location and start generating them
> > on a semi-regular basis.
> >
> > As we've been getting the bugs out of the mirroring setup, it is getting
> > easier to add new locations.  Know anyone interested?  Please let us
> > know; we would love to have them.
> >
> > Ariel
> >
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> --
> James Michael DuPont
> Member of Free Libre Open Source Software Kosova http://flossk.org
> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT  |
StatMediaWiki
| WikiEvidens  |
WikiPapers
| WikiTeam 
Personal website: https://sites.google.com/site/emijrp/


Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-17 Thread Ariel T. Glenn
There are a few other reasons articles get deleted: copyright issues,
personal identifying data, etc.  This makes maintaining the sort of
mirror you propose problematic, although a similar mirror is here:
http://deletionpedia.dbatley.com/w/index.php?title=Main_Page

The dumps contain only data publicly available at the time of the run,
without deleted data.

The articles aren't permanently deleted, of course.  The revision texts
live on in the database, so a query on toolserver, for example, could be
used to get at them, but that would need to be for research purposes.

Ariel

On Thu, 17-05-2012, at 13:30 +0200, Mike Dupont wrote:
> Hi,
> I am thinking about how to collect articles deleted based on the "not
> notable" criteria,
> is there any way we can extract them from the mysql binlogs? how are
> these mirrors working? I would be interested in setting up a mirror of
> deleted data, at least that which is not spam/vandalism based on tags.
> mike
> 
> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn  wrote:
> > We now have three mirror sites, yay!  The full list is linked to from
> > http://dumps.wikimedia.org/ and is also available at
> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
> >
> > Summarizing, we have:
> >
> > C3L (Brazil) with the last 5 known good dumps,
> > Masaryk University (Czech Republic) with the last 5 known good dumps,
> > Your.org (USA) with the complete archive of dumps, and
> >
> > for the latest version of uploaded media, Your.org with http/ftp/rsync
> > access.
> >
> > Thanks to Carlos, Kevin and Yenya respectively at the above sites for
> > volunteering space, time and effort to make this happen.
> >
> > As people noticed earlier, a series of media tarballs per-project
> > (excluding commons) is being generated.  As soon as the first run of
> > these is complete we'll announce its location and start generating them
> > on a semi-regular basis.
> >
> > As we've been getting the bugs out of the mirroring setup, it is getting
> > easier to add new locations.  Know anyone interested?  Please let us
> > know; we would love to have them.
> >
> > Ariel
> >
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> 
> 
> 




Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-17 Thread Mike Dupont
Hi,
I am thinking about how to collect articles deleted based on the "not
notable" criteria,
is there any way we can extract them from the mysql binlogs? how are
these mirrors working? I would be interested in setting up a mirror of
deleted data, at least that which is not spam/vandalism based on tags.
mike

On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn  wrote:
> We now have three mirror sites, yay!  The full list is linked to from
> http://dumps.wikimedia.org/ and is also available at
> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
>
> Summarizing, we have:
>
> C3L (Brazil) with the last 5 known good dumps,
> Masaryk University (Czech Republic) with the last 5 known good dumps,
> Your.org (USA) with the complete archive of dumps, and
>
> for the latest version of uploaded media, Your.org with http/ftp/rsync
> access.
>
> Thanks to Carlos, Kevin and Yenya respectively at the above sites for
> volunteering space, time and effort to make this happen.
>
> As people noticed earlier, a series of media tarballs per-project
> (excluding commons) is being generated.  As soon as the first run of
> these is complete we'll announce its location and start generating them
> on a semi-regular basis.
>
> As we've been getting the bugs out of the mirroring setup, it is getting
> easier to add new locations.  Know anyone interested?  Please let us
> know; we would love to have them.
>
> Ariel
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
James Michael DuPont
Member of Free Libre Open Source Software Kosova http://flossk.org
Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3



Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-17 Thread emijrp
Good work. We are finally approaching an indestructible corpus of
knowledge.

2012/5/17 Ariel T. Glenn 

> We now have three mirror sites, yay!  The full list is linked to from
> http://dumps.wikimedia.org/ and is also available at
>
> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
>
> Summarizing, we have:
>
> C3L (Brazil) with the last 5 known good dumps,
> Masaryk University (Czech Republic) with the last 5 known good dumps,
> Your.org (USA) with the complete archive of dumps, and
>
> for the latest version of uploaded media, Your.org with http/ftp/rsync
> access.
>
> Thanks to Carlos, Kevin and Yenya respectively at the above sites for
> volunteering space, time and effort to make this happen.
>
> As people noticed earlier, a series of media tarballs per-project
> (excluding commons) is being generated.  As soon as the first run of
> these is complete we'll announce its location and start generating them
> on a semi-regular basis.
>
> As we've been getting the bugs out of the mirroring setup, it is getting
> easier to add new locations.  Know anyone interested?  Please let us
> know; we would love to have them.
>
> Ariel
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT  |
StatMediaWiki
| WikiEvidens  |
WikiPapers
| WikiTeam 
Personal website: https://sites.google.com/site/emijrp/


[Wikitech-l] XML dumps/Media mirrors update

2012-05-17 Thread Ariel T. Glenn
We now have three mirror sites, yay!  The full list is linked to from
http://dumps.wikimedia.org/ and is also available at
http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors

Summarizing, we have: 

C3L (Brazil) with the last 5 known good dumps, 
Masaryk University (Czech Republic) with the last 5 known good dumps, 
Your.org (USA) with the complete archive of dumps, and

for the latest version of uploaded media, Your.org with http/ftp/rsync
access.
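For the rsync access mentioned above, a fetch might be scripted as below. The module path `rsync://mirror.example.org/wikimedia-dumps` is a placeholder, not the real Your.org endpoint; check the mirrors page linked above for the actual address. The script only builds and prints the command so it can be reviewed before running:

```shell
#!/bin/sh
# Build the rsync command for mirroring one wiki's latest dump.
# MIRROR is a placeholder; the real endpoint is listed on the
# Mirroring_Wikimedia_project_XML_dumps page linked above.
MIRROR="rsync://mirror.example.org/wikimedia-dumps"
WIKI="enwiki"
DEST="./dumps/${WIKI}"

mkdir -p "${DEST}"
# --partial lets interrupted transfers of the large dump files resume.
CMD="rsync -av --partial ${MIRROR}/${WIKI}/latest/ ${DEST}/"
echo "${CMD}"
```

Once the endpoint is confirmed, execute the printed command (or replace the `echo` with the command itself).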

Thanks to Carlos, Kevin and Yenya respectively at the above sites for
volunteering space, time and effort to make this happen. 

As people noticed earlier, a series of media tarballs per-project
(excluding commons) is being generated.  As soon as the first run of
these is complete we'll announce its location and start generating them
on a semi-regular basis. 

As we've been getting the bugs out of the mirroring setup, it is getting
easier to add new locations.  Know anyone interested?  Please let us
know; we would love to have them.

Ariel




Re: [Wikitech-l] XML dumps stopped, possible fs/disk issues on dump server under investigation

2010-11-22 Thread emijrp
You can follow the updates here
http://wikitech.wikimedia.org/history/Dataset1

2010/11/21 masti 

> On 11/10/2010 06:44 AM, Ariel T. Glenn wrote:
> > We noticed a kernel panic message and stack trace in the logs on the
> > server that serves XML dumps.  The web server that provides access to
> > these files is temporarily out of commission; we hope to have it back
> > online in 12 hours or less.  Dumps themselves have been suspended while
> > we investigate.  I hope to have an update on this tomorrow as well.
> >
> > Ariel
> >
>
> any news/outlook when the new dumps will be available?
>
> masti
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>


Re: [Wikitech-l] XML dumps stopped, possible fs/disk issues on dump server under investigation

2010-11-20 Thread masti
On 11/10/2010 06:44 AM, Ariel T. Glenn wrote:
> We noticed a kernel panic message and stack trace in the logs on the
> server that serves XML dumps.  The web server that provides access to
> these files is temporarily out of commission; we hope to have it back
> online in 12 hours or less.  Dumps themselves have been suspended while
> we investigate.  I hope to have an update on this tomorrow as well.
>
> Ariel
>

any news/outlook when the new dumps will be available?

masti



[Wikitech-l] XML dumps stopped, possible fs/disk issues on dump server under investigation

2010-11-09 Thread Ariel T. Glenn
We noticed a kernel panic message and stack trace in the logs on the
server that serves XML dumps.  The web server that provides access to
these files is temporarily out of commission; we hope to have it back
online in 12 hours or less.  Dumps themselves have been suspended while we
investigate.  I hope to have an update on this tomorrow as well.

Ariel


