---------- Forwarded message ----------
From: Ariel Glenn WMF <ar...@wikimedia.org>
Date: Fri, Apr 22, 2016 at 9:21 PM
Subject: Re: [Xmldatadumps-l] Failed dumps
To: InfoSports <a...@infosports.com>
Cc: gnosygnu <gnosy...@gmail.com>, Ariel Glenn WMF <ar...@wikimedia.org>

I've been out ill this week. First day back today. I'm tracking this issue here, including any reruns: https://phabricator.wikimedia.org/T133416

Ariel

On Fri, Apr 15, 2016 at 6:26 PM, InfoSports <a...@infosports.com> wrote:

> I noticed the decreased size as well.
>
> Also, there are many duplicates in the download. Example article titles…
>
> Rainbow, California
> Murrieta, California
> Fallbrook, California
> Temecula, California
> Wildomar, California
> Sedeco Hills, California
> Palomar Mountain
> ...and many more
>
> Please re-run the process. There are too many errors in this one to be usable.
>
> Thank you in advance.
>
> -Al
>
> On Apr 14, 2016, at 8:56 PM, gnosygnu <gnosy...@gmail.com> wrote:
> >
> > Hi. I think there may still be problems with the 2016-04-07 English Wikipedia dump. It's missing many articles in the Module namespace.
> >
> > Here are some details:
> > * I downloaded https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-pages-articles.xml.bz2 . I got an XML file that was 10.8 GB (i.e.: it does not look severely truncated)
> > * I ran the following grep commands. Note that Module:Hatnote is blank. I ran the last grep to show that the criteria should be correct.
> > root~> grep "<title>Earth</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
> > <title>Earth</title>
> > root~> grep "<title>Template:About</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
> > <title>Template:About</title>
> > root~> grep "<title>Module:Hatnote</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
> > root~> grep "<title>Module:" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
> > <title>Module:Location map/data/Croatia/doc</title>
> > <title>Module:Location map/data/USA Alabama/doc</title>
> > ...
> > * The following Modules appear to be missing in the 2016-04-07 dump
> > Module:Use_mdy_dates
> > Module:Pp-move-indef
> > Module:Protection_banner
> > Module:Unsubst
> > * By my count, there were 2,970 articles in the Module namespace in the 2016-03-05 dump. In contrast, there are only 652 in the 2016-04-07 dump.
> >
> > Let me know if you need any other information. I believe that the above can be verified by anyone else, but I'd be happy to provide more detail
> >
> > Thanks.
> >
> > On Thu, Apr 14, 2016 at 8:49 AM, Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> > It hasn't failed. It's still running but the jobs that previously failed have been left in that status until they get rerun. That's standard behavior. Don't worry, be happy! :-)
> >
> > Ariel
> >
> > On Thu, Apr 14, 2016 at 2:15 PM, Nicolas Vervelle <nverve...@gmail.com> wrote:
> > But at least, pages-articles worked, so it's ok for me.
> >
> > On Thu, Apr 14, 2016 at 1:13 PM, Nicolas Vervelle <nverve...@gmail.com> wrote:
> > Well, enwiki failed again today...
> >
> > On Wed, Apr 13, 2016 at 4:37 PM, Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> > You are right. Two jobs were competing for enwiki since I allocated one more lousy core to the host that runs them. I've fixed the config to avoid that. It will resume in a few hours with cron.
> >
> > Ariel
> >
> > On Wed, Apr 13, 2016 at 4:37 PM, Nicolas Vervelle <nverve...@gmail.com> wrote:
> > Thanks Ariel,
> >
> > It seems to have worked for some dumps (frwiki for example), but other dumps are still failing (enwiki for example)
> >
> > Nico
> >
> > On Tue, Apr 12, 2016 at 11:04 AM, Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> > Hi Nicolas,
> >
> > These will be picked up on reruns, which will happen over the next day or so. The failure was caused by an obscure hhvm bug which only triggers under certain circumstances.
> > For more information about that, see: https://phabricator.wikimedia.org/T94277
> >
> > This morning I did jobs cleanup, switched the dump jobs to use php5 again and the dumps have restarted.
> >
> > Ariel
> >
> > On Tue, Apr 12, 2016 at 11:25 AM, Nicolas Vervelle <nverve...@gmail.com> wrote:
> > Hi,
> >
> > Is anyone working on the failed dumps for April ? (enwiki, frwiki, ruwiki, itwiki, ...)
> >
> > Nico
> >
> > _______________________________________________
> > Xmldatadumps-l mailing list
> > Xmldatadumps-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
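
The grep checks quoted above can be wrapped into a small shell function for comparing title counts between two dumps. This is only a sketch: count_titles is a hypothetical helper, not part of any dump tooling, and it assumes each <title> element sits on its own line, as in the grep output quoted above.

```shell
# count_titles PREFIX FILE
# Hypothetical helper: prints how many lines of FILE contain "<title>PREFIX".
# FILE may be plain XML or a .bz2 archive (decompressed on the fly with bzcat).
count_titles() {
    case "$2" in
        *.bz2) bzcat "$2" ;;
        *)     cat "$2" ;;
    esac | grep -c -F "<title>$1" || [ "$?" -eq 1 ]
    # grep -c still prints 0 when nothing matches but exits with status 1;
    # the trailing test treats that case as success and lets real errors through.
}
```

For example, count_titles 'Module:' enwiki-20160407-pages-articles.xml.bz2 would print the number of Module-namespace titles in that file, which can then be compared with the same count taken from an earlier dump.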