Re: [Wikitech-l] Level to which Wikimedia wikis care about data integrity

2012-11-11 Thread emijrp
In Commons there are a bunch of broken/corrupt/missing files (mostly old versions of the same file). 2012/11/11 MZMcBride z...@mzmcbride.com Hi. Is there a policy or guideline about the level to which Wikimedia wikis care about data integrity? There are a few specific cases I'm talking about:

Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-21 Thread emijrp

Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-21 Thread emijrp

Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-17 Thread emijrp

Re: [Wikitech-l] [Wikimedia-l] Release of educational videos under creative commons

2012-05-02 Thread emijrp
Another example of a recent video donation https://commons.wikimedia.org/wiki/Category:Files_from_the_Australian_Broadcasting_Corporation 2012/4/25 emijrp emi...@gmail.com 2012/4/24 Samuel Klein meta...@gmail.com Where's the latest thread on the Timed Media Handler progress? I am meeting

Re: [Wikitech-l] [Wikimedia-l] Release of educational videos under creative commons

2012-04-25 Thread emijrp

Re: [Wikitech-l] External link tracking

2012-04-25 Thread emijrp
Wikipedia uses nofollow, so adding links to your website doesn't increase your PageRank, but it works fine for reaching new readers. These sites[2] receive a lot of traffic from Wikipedia, for sure. Regards, emijrp (Forwarding to the research mailing list.) [1] http://www.dlib.org/dlib/may07/lally

Re: [Wikitech-l] Release of educational videos under creative commons

2012-04-24 Thread emijrp

Re: [Wikitech-l] Page views

2012-04-08 Thread emijrp
file after removing javascript/json/robots.txt there are 13 left, which fits perfectly with 10,000 to 13,000 per day; however, 9 of these are bots!! How many of that 1000-sample log were robots (including all languages)?

Re: [Wikitech-l] Errors in Wikimedia Commons old files

2012-03-01 Thread emijrp
2012/3/1 Peter Gervai grin...@gmail.com On Thu, Mar 1, 2012 at 00:56, emijrp emi...@gmail.com wrote: I'm trying to download Wikimedia Commons, but I have found some errors. For There are still occasional errors around, would be nice to run a script against the files database... but it can
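A minimal sketch of that kind of check for a single file, assuming the requests library: it asks the Commons API for the SHA-1 recorded for the current version and compares it with a locally downloaded copy. The file title and local path are placeholders, and old versions would need extra imageinfo parameters.

    # Compare a downloaded Commons file with the SHA-1 the wiki has on
    # record (sketch; assumes the `requests` library is installed).
    import hashlib
    import requests

    API = "https://commons.wikimedia.org/w/api.php"
    title = "File:Example.jpg"   # hypothetical file
    local_path = "Example.jpg"   # hypothetical local copy

    r = requests.get(API, params={
        "action": "query", "prop": "imageinfo", "iiprop": "sha1",
        "titles": title, "format": "json"})
    page = next(iter(r.json()["query"]["pages"].values()))
    expected = page["imageinfo"][0]["sha1"]   # SHA-1 of the current version

    with open(local_path, "rb") as f:
        actual = hashlib.sha1(f.read()).hexdigest()

    print("ok" if actual == expected else "corrupt or different file")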

[Wikitech-l] Errors in Wikimedia Commons old files

2012-02-29 Thread emijrp
#filehistory Are you aware of this? Is this going to be fixed? Regards, emijrp

Re: [Wikitech-l] Using computer vision to categorize images at Commons

2012-02-20 Thread emijrp
Hi Maarten; I think that this is a perfect example of an open question in wiki research. WikiPapers has a page for that stuff.[1] Can you add some bits there about this? I didn't know about OpenCV; I will check it for sure, and I will try to do something (I'm a bot developer). Regards, emijrp [1

Re: [Wikitech-l] Using computer vision to categorize images at Commons

2012-02-20 Thread emijrp
I have found a tutorial for Python coders http://creatingwithcode.com/howto/face-detection-in-static-images-with-python/ After some tests, it works fine (including René Descartes' face : )). This is going to be very helpful to improve the accuracy of Images for biographies http://toolserver.org/~emijrp
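A minimal face-detection sketch in the spirit of that tutorial, assuming a modern opencv-python install (the cv2 module and its bundled Haar cascades); the input file name is a placeholder, not the actual bot code.

    # Minimal face detection with OpenCV's Haar cascades -- a sketch, not
    # the tutorial's exact script; assumes opencv-python is installed.
    import cv2

    # Frontal-face cascade shipped with opencv-python.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    image = cv2.imread("portrait.jpg")  # hypothetical input image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Returns a list of (x, y, width, height) rectangles, one per face.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    print("faces found:", len(faces))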

[Wikitech-l] Fwd: Old English Wikipedia image dump from 2005

2011-11-11 Thread emijrp
Forwarding... -- Forwarded message -- From: emijrp emi...@gmail.com Date: 2011/11/11 Subject: Old English Wikipedia image dump from 2005 To: wikiteam-disc...@googlegroups.com Hi all; I want to share with you this Archive Team link[1]. It is an old English Wikipedia image dump

Re: [Wikitech-l] [Xmldatadumps-l] first mirror of most recent XML dumps, at C3SL in Brazil

2011-10-23 Thread emijrp
Congratulations, a big step in wiki preservation. 2011/10/13 Ariel T. Glenn ar...@wikimedia.org As the subject says, the first mirror of our XML dumps is up, hosted at C3SL in Brazil. We're really excited about it. Details are listed on the main index page on our download server (

Re: [Wikitech-l] [Xmldatadumps-l] first mirror of most recent XML dumps, at C3SL in Brazil

2011-10-23 Thread emijrp
Some of the most recent dump links are broken[1]. [1] http://wikipedia.c3sl.ufpr.br/jawikisource/20111018 2011/10/13 Ariel T. Glenn ar...@wikimedia.org As the subject says, the first mirror of our XML dumps is up, hosted at C3SL in Brazil. We're really excited about it. Details are listed

Re: [Wikitech-l] [Foundation-l] Request: WMF commitment as a long term cultural archive?

2011-09-21 Thread emijrp
that Internet Archive saves XML dumps quarterly or so, but no official announcement. Also, I heard about the Library of Congress wanting to mirror the dumps, but no news for a long time. L'Encyclopédie has an uptime[4] of 260 years[5] and growing. Will Wiki[pm]edia projects reach that? Regards, emijrp

Re: [Wikitech-l] page view stats redux

2011-09-18 Thread emijrp
Thanks Ariel. That is important data to preserve. 2011/9/15 Ariel T. Glenn ar...@wikimedia.org I think we finally have a complete copy from December 2007 through August 2011 of the pageview stats scrounged from various sources, now available on our dumps server. See

Re: [Wikitech-l] Picture of the Year torrents

2011-09-17 Thread emijrp
https://bugzilla.wikimedia.org/show_bug.cgi?id=30946 2011/9/12 emijrp emi...@gmail.com Hi all; I have created two torrent files for the Picture of the Year dumps[1]. They use the Wikimedia server as a webseed.[2][3] Can you add them to the page? Thanks, emijrp [1] http://dumps.wikimedia.org

[Wikitech-l] sep11.wikipedia.org

2011-09-09 Thread emijrp
Hi; sep11.wikipedia.org redirects to a spam domain, probably expired and registered by someone else. Can you redirect to this[1] or this[2]? Or make a simple index.html with both links... Thanks, emijrp [1] http://dumps.wikimedia.org/sep11wiki/20071116/ [2] http://web.archive.org/web

Re: [Wikitech-l] [Foundation-l] We need to make it easy to fork and leave

2011-08-15 Thread emijrp
2001. Losing knowledge is so 48 BC. This is the most important mission the human race has ever achieved. Regards, emijrp -- Krinkle

Re: [Wikitech-l] [Foundation-l] We need to make it easy to fork and leave

2011-08-13 Thread emijrp
Yes, that tool looks similar to the idea I wrote. Other approaches may be possible too. 2011/8/13 John Vandenberg jay...@gmail.com On Sat, Aug 13, 2011 at 4:53 AM, emijrp emi...@gmail.com wrote: Man, Gerard is thinking about new methods to fork (in an easy way) single articles, sets

Re: [Wikitech-l] [Foundation-l] We need to make it easy to fork and leave

2011-08-12 Thread emijrp
Man, Gerard is thinking about new methods to fork (in an easy way) single articles, sets of articles or complete Wikipedias, and people reply about setting up servers/mediawiki/importing_databases and other geeky weekend parties. That is why there are no successful forks. Forking Wikipedia is

Re: [Wikitech-l] [Foundation-l] Announcement: Selected Books from Malayalam Wikisource on CD released

2011-06-12 Thread emijrp
I'm interested in uploading these CD ISOs to the Internet Archive. Are you OK with this? Your server is a bit slow, so you will have a mirror, and a faster one at that. 2011/6/11 Jyothis E jyothi...@gmail.com Dear fellow Wikimedians, With great pleasure, Malayalam Wikimedia Community announced its 2011

Re: [Wikitech-l] [Foundation-l] Announcement: Selected Books from Malayalam Wikisource on CD released

2011-06-11 Thread emijrp
Creating an offline version of a wiki project is hard work. Keep up the good work! Congratulations! : ) P.S.: downloading... 2011/6/11 Jyothis E jyothi...@gmail.com Dear fellow Wikimedians, With great pleasure, Malayalam Wikimedia Community announced its 2011 CD project Selected Books

Re: [Wikitech-l] [Foundation-l] YouTube and Creative Commons

2011-06-04 Thread emijrp
A nice script to download YouTube videos is youtube-dl[1]. Linking that with an flv/mp4-to-ogg converter and an uploader to Commons is trivial. [1] http://rg3.github.com/youtube-dl/ 2011/6/4 Michael Dale md...@wikimedia.org Comments inline: On Fri, Jun 3, 2011 at 4:51 PM, Brion Vibber
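A rough sketch of that pipeline, using ffmpeg as the converter (my choice; the post doesn't name one) and assuming both tools are on the PATH; the URL is a placeholder and the final Commons upload step is only hinted at.

    # Sketch: download with youtube-dl, transcode to Ogg Theora/Vorbis with
    # ffmpeg, then hand the result to an uploader. Not a turnkey Commons
    # uploader; for freely licensed videos only.
    import subprocess

    url = "https://www.youtube.com/watch?v=EXAMPLE"  # hypothetical CC-BY video

    # 1. Fetch the source video.
    subprocess.check_call(["youtube-dl", "-o", "source.mp4", url])

    # 2. Convert to Ogg (Theora video + Vorbis audio), a format Commons accepts.
    subprocess.check_call([
        "ffmpeg", "-i", "source.mp4",
        "-codec:v", "libtheora", "-qscale:v", "7",
        "-codec:a", "libvorbis", "-qscale:a", "5",
        "source.ogv"])

    # 3. Upload source.ogv to Commons, e.g. with pywikibot's upload script
    #    (left out here).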

Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
Hi James; download.wikimedia.org is available again, so you can download that file from http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-pages-articles.xml.bz2 (6.2 GB). Regards, emijrp 2010/12/14 James Linden kodekr...@gmail.com On Mon, Dec 13, 2010 at 7:09 PM, Michael Gurlitz

Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
Hi Monica; Your dump is this one, with date 2010-03-12:[1][2] a3a5ee062abc16a79d111273d4a1a99a enwiki-20100312-pages-articles.xml.bz2 There are some old English Wikipedia dumps and md5sum files in a directory called archive[3]. Regards, emijrp [1] http://download.wikimedia.org/archive/enwiki
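A minimal sketch of checking a downloaded dump against that published md5 line, streaming the file so a multi-gigabyte dump doesn't need to fit in memory; the local file name is assumed to match the one above.

    # Verify a dump against its published MD5 checksum (sketch).
    import hashlib

    def md5_of(path, chunk_size=1024 * 1024):
        """Hash the file in 1 MB chunks."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    expected = "a3a5ee062abc16a79d111273d4a1a99a"  # from the md5sums file
    actual = md5_of("enwiki-20100312-pages-articles.xml.bz2")
    print("OK" if actual == expected else "MISMATCH: " + actual)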

Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
All? The 2006 one too? 2010/12/16 Ariel T. Glenn ar...@wikimedia.org The dumps in the archive are there because they are incomplete, by the way. Ariel On 16-12-2010, Thu, at 16:50 +0100, emijrp wrote: Hi Monica; Your dump is this one, with date 2010-03-12:[1][2

Re: [Wikitech-l] dataset1, xml dumps

2010-12-16 Thread emijrp
Have you checked the md5sum? 2010/12/16 Gabriel Weinberg y...@alum.mit.edu Ariel T. Glenn ariel at wikimedia.org writes: We now have a copy of the dumps on a backup host. Although we are still resolving hardware issues on the XML dumps server, we think it is safe enough to serve the

Re: [Wikitech-l] dataset1, xml dumps

2010-12-16 Thread emijrp
md5sum. Can anyone else confirm? On Thu, Dec 16, 2010 at 5:41 PM, emijrp emi...@gmail.com wrote: Have you checked the md5sum? 2010/12/16 Gabriel Weinberg y...@alum.mit.edu Ariel T. Glenn ariel at wikimedia.org writes: We now have a copy of the dumps on a backup host

Re: [Wikitech-l] [Xmldatadumps-l] dataset1, xml dumps

2010-12-15 Thread emijrp
Good work. 2010/12/15 Ariel T. Glenn ar...@wikimedia.org We now have a copy of the dumps on a backup host. Although we are still resolving hardware issues on the XML dumps server, we think it is safe enough to serve the existing dumps read-only. DNS was updated to that effect already;

Re: [Wikitech-l] dataset1, xml dumps

2010-12-14 Thread emijrp
Thanks. Double good news: http://lists.wikimedia.org/pipermail/foundation-l/2010-December/063088.html 2010/12/14 Ariel T. Glenn ar...@wikimedia.org For folks who have not been following the saga on http://wikitech.wikimedia.org/view/Dataset1 we were able to get the raid array back in service

Re: [Wikitech-l] How to find the version of a dump

2010-12-13 Thread emijrp
be nice. Regards, emijrp 2010/12/13 Monica shu monicashu...@gmail.com Hi all, I downloaded a dump several months ago. Accidentally, I lost the version info of this dump, so I don't know when this dump was generated. Is there any place that lists out info about the past dumps (such as size

Re: [Wikitech-l] Looking for a mediawiki.org dump

2010-12-11 Thread emijrp
) On 11 December 2010 10:34, emijrp emi...@gmail.com wrote: I have this one: mediawikiwiki-20100808-pages-meta-history.xml.7z (37 MB). I can upload it to MegaUpload if needed. 2010/12/6 Andrew Dunbar hippytr...@gmail.com Could anybody help me locate a dump of mediawiki.org while

Re: [Wikitech-l] Looking for a mediawiki.org dump

2010-12-10 Thread emijrp
I have this one: mediawikiwiki-20100808-pages-meta-history.xml.7z (37 MB). I can upload it to MegaUpload if needed. 2010/12/6 Andrew Dunbar hippytr...@gmail.com Could anybody help me locate a dump of mediawiki.org while the dump server is broken please? I only need current revisions. Thanks

Re: [Wikitech-l] wikipedia dumps

2010-12-10 Thread emijrp
2010/12/10 James Linden kodekr...@gmail.com This may or may not be appropriate to this list -- this is where I found most of the discussions on the matter, so posting here. From reading the past couple of weeks of messages, I surmise that there isn't a way to get a current data dump (for

Re: [Wikitech-l] alternative way to get wikipedia dump while server is down

2010-11-28 Thread emijrp
What are the ISO codes? ro and ka? I have kawiktionary-20100807-pages-meta-history.xml.7z (1.3 MB) and rowiktionary-20100810-pages-meta-history.xml.7z (10.1 MB). Very tiny. 2010/11/28 Andrew Dunbar hippytr...@gmail.com On 28 November 2010 02:42, Jeff Kubina jeff.kub...@gmail.com wrote: I

Re: [Wikitech-l] alternative way to get wikipedia dump while server is down

2010-11-26 Thread emijrp
Crossposting. This dump is in /mnt/user-store/dump or dumps, on Toolserver. If the admins don't see any problem, it may be made available for download (~30 GB). Regards, emijrp 2010/11/25 Oliver Schmidt schmidt...@email.ulster.ac.uk Hello alltogether, is there any alternative way to get hands

Re: [Wikitech-l] XML dumps stopped, possible fs/disk issues on dump server under investigation

2010-11-22 Thread emijrp
You can follow the updates here: http://wikitech.wikimedia.org/history/Dataset1 2010/11/21 masti mast...@gmail.com On 11/10/2010 06:44 AM, Ariel T. Glenn wrote: We noticed a kernel panic message and stack trace in the logs on the server that serves XML dumps. The web server that provides

Re: [Wikitech-l] Is that download.wikimedia.org server is down?

2010-11-11 Thread emijrp
The dump-generating process is halted. Also, the official XML download page is offline until they fix the hardware. I don't know if there are mirrors; I don't think so. 2010/11/11 Billy Chan waterfall...@gmail.com Hi Robin, Thanks for your link. Do you know where I can download the xml dumps

Re: [Wikitech-l] Is that download.wikimedia.org server is down?

2010-11-11 Thread emijrp
/Wikipedia_Archive 2010/11/11 emijrp emi...@gmail.com The dump-generating process is halted. Also, the official XML download page is offline until they fix the hardware. I don't know if there are mirrors; I don't think so. 2010/11/11 Billy Chan waterfall...@gmail.com Hi Robin, Thanks for your link

Re: [Wikitech-l] Is that download.wikimedia.org server is down?

2010-11-11 Thread emijrp
Sorry. Where I said "from August 2010", I meant "of August 2010". I have only one .7z for every WMF wiki. 2010/11/11 emijrp emi...@gmail.com There are some old dumps in the Internet Archive,[1] but I guess you are interested in the most recent ones. Also, I have a copy of all the pages-meta

Re: [Wikitech-l] [Xmldatadumps-l] XML dumps stopped, possible fs/disk issues on dump server under investigation

2010-11-10 Thread emijrp
What data is at risk? 2010/11/10 Ariel T. Glenn ar...@wikimedia.org The server refused to come up on reboot; raid errors. The backplane is suspect. A ticket is being opened with the vendor. The host will remain offline until we have good information about how to resolve the problem or we

Re: [Wikitech-l] [Xmldatadumps-l] dataset1 maintenance Sat Oct 1 (dumps unavailable)

2010-10-04 Thread emijrp
So, will English Wikipedia dumps be created with this new method from now on? 2010/10/2 Ariel T. Glenn ar...@wikimedia.org The server that hosts XML dumps was moved this morning and all maintenance completed. The dumps for dewiki, arwiki, srwiki and ptwikiquote were restarted from the

Re: [Wikitech-l] list of things to do for image dumps

2010-09-18 Thread emijrp
Thanks! : ) 2010/9/17 Lars Aronsson l...@aronsson.se On September 10, emijrp wrote: Hi Lars, are you going to upload more logs to Internet Archive? No, I can't. I have not downloaded more recent logs. I only uploaded what was on my disk, because I needed to free some space. Domas

Re: [Wikitech-l] list of things to do for image dumps

2010-09-10 Thread emijrp
Hi Lars, are you going to upload more logs to the Internet Archive? Domas' website only shows the last 3 (?) months. I think that there are many of these files on the Toolserver, but we must preserve this raw data in another secure (for posterity) place. 2010/9/10 Lars Aronsson l...@aronsson.se On

Re: [Wikitech-l] [Foundation-l] Visual impairment

2010-05-16 Thread emijrp
Perhaps we can offer two captchas: first the current one, plus a link labelled "if you can't read this captcha, try this one" pointing to a sound reCAPTCHA. Requesting an account from admins is not a good solution (perhaps as a third option). Regards, emijrp 2010/5/16 Christopher Grant

Re: [Wikitech-l] [Foundation-l] Visual impairment

2010-05-16 Thread emijrp
Interesting thread in Jimbo's talk page[1] from June 2008. [1] http://en.wikipedia.org/wiki/User_talk:Jimbo_Wales/Archive_37#Wikipedia_and_Captcha 2010/5/16 Chad innocentkil...@gmail.com On Sun, May 16, 2010 at 3:04 AM, Christopher Grant chrisgrantm...@gmail.com wrote: On Sun, May 16, 2010

[Wikitech-l] Visual impairment

2010-05-15 Thread emijrp
Hi all; Solving a captcha during registration is mandatory. Can this be replaced with a sound captcha for visually impaired people? It is a suggestion for the usability project too. Thanks. Regards, emijrp