Hi. I think there may still be problems with the 2016-04-07 English Wikipedia dump. It's missing many articles in the Module namespace.
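For anyone who wants to reproduce the per-namespace count, here is a minimal sketch, assuming the dump has already been decompressed to a plain XML file and that each <title> element sits on its own line. The helper name count_ns_titles is just a placeholder of mine, not part of any dump tooling.

```shell
# Count <title> lines whose title starts with a given namespace prefix.
# count_ns_titles is a hypothetical helper name used only for illustration.
#   $1 = path to the decompressed dump XML
#   $2 = namespace prefix, e.g. "Module:" or "Template:"
count_ns_titles() {
    # -F: treat the pattern as a fixed string, not a regex
    # -c: print the number of matching lines instead of the lines themselves
    grep -F -c "<title>$2" "$1"
}

# Example call (the file path is an assumption; adjust to your local copy):
#   count_ns_titles enwiki-latest-pages-articles.xml "Module:"
```

Running it against two different dumps with the same prefix gives numbers that can be compared directly.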
Here are some details:

* I downloaded https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-pages-articles.xml.bz2 . I got an XML file that was 10.8 GB (i.e., it does not look severely truncated).

* I ran the following grep commands. Note that the search for Module:Hatnote returns nothing. I ran the last grep to show that the search criteria should be correct.

  root~> grep "<title>Earth</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
  <title>Earth</title>
  root~> grep "<title>Template:About</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
  <title>Template:About</title>
  root~> grep "<title>Module:Hatnote</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
  root~> grep "<title>Module:" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
  <title>Module:Location map/data/Croatia/doc</title>
  <title>Module:Location map/data/USA Alabama/doc</title>
  ...

* The following Modules appear to be missing from the 2016-04-07 dump:
  Module:Use_mdy_dates
  Module:Pp-move-indef
  Module:Protection_banner
  Module:Unsubst

* By my count, there were 2,970 articles in the Module namespace in the 2016-03-05 dump. In contrast, there are only 652 in the 2016-04-07 dump.

Let me know if you need any other information. I believe that the above can be verified by anyone else, but I'd be happy to provide more detail.

Thanks.

On Thu, Apr 14, 2016 at 8:49 AM, Ariel Glenn WMF <ar...@wikimedia.org> wrote:

> It hasn't failed. It's still running but the jobs that previously failed
> have been left in that status until they get rerun. That's standard
> behavior. Don't worry, be happy! :-)
>
> Ariel
>
> On Thu, Apr 14, 2016 at 2:15 PM, Nicolas Vervelle <nverve...@gmail.com>
> wrote:
>
>> But at least, pages-articles worked, so it's ok for me.
>>
>> On Thu, Apr 14, 2016 at 1:13 PM, Nicolas Vervelle <nverve...@gmail.com>
>> wrote:
>>
>>> Well, enwiki failed again today...
>>>
>>> On Wed, Apr 13, 2016 at 4:37 PM, Ariel Glenn WMF <ar...@wikimedia.org>
>>> wrote:
>>>
>>>> You are right. Two jobs were competing for enwiki since I allocated one
>>>> more lousy core to the host that runs them. I've fixed the config to avoid
>>>> that. It will resume in a few hours with cron.
>>>>
>>>> Ariel
>>>>
>>>> On Wed, Apr 13, 2016 at 4:37 PM, Nicolas Vervelle <nverve...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Ariel,
>>>>>
>>>>> It seems to have worked for some dumps (frwiki for example), but other
>>>>> dumps are still failing (enwiki for example)
>>>>>
>>>>> Nico
>>>>>
>>>>> On Tue, Apr 12, 2016 at 11:04 AM, Ariel Glenn WMF <ar...@wikimedia.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Nicolas,
>>>>>>
>>>>>> These will be picked up on reruns, which will happen over the next
>>>>>> day or so. The failure was caused by an obscure hhvm bug which only
>>>>>> triggers under certain circumstances. For more information about that,
>>>>>> see: https://phabricator.wikimedia.org/T94277
>>>>>>
>>>>>> This morning I did jobs cleanup, switched the dump jobs to use php5
>>>>>> again and the dumps have restarted.
>>>>>>
>>>>>> Ariel
>>>>>>
>>>>>> On Tue, Apr 12, 2016 at 11:25 AM, Nicolas Vervelle
>>>>>> <nverve...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Is anyone working on the failed dumps for April ? (enwiki, frwiki,
>>>>>>> ruwiki, itwiki, ...)
>>>>>>>
>>>>>>> Nico
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Xmldatadumps-l mailing list
>>>>>>> Xmldatadumps-l@lists.wikimedia.org
>>>>>>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
> _______________________________________________
> Xmldatadumps-l mailing list
> Xmldatadumps-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
>
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l