---------- Forwarded message ----------
From: Ariel Glenn WMF <ar...@wikimedia.org>
Date: Fri, Apr 22, 2016 at 9:21 PM
Subject: Re: [Xmldatadumps-l] Failed dumps
To: InfoSports <a...@infosports.com>
Cc: gnosygnu <gnosy...@gmail.com>, Ariel Glenn WMF <ar...@wikimedia.org>


I've been out ill this week.  First day back today.  I'm tracking this
issue here, including any reruns: https://phabricator.wikimedia.org/T133416

Ariel

On Fri, Apr 15, 2016 at 6:26 PM, InfoSports <a...@infosports.com> wrote:

> I noticed the decreased size as well.
>
> Also, there are many duplicates in the download. Example article titles…
>
> Rainbow, California
> Murrieta, California
> Fallbrook, California
> Temecula, California
> Wildomar, California
> Sedeco Hills, California
> Palomar Mountain
> ...and many more
>
>
> Please re-run the process. There are too many errors in this one to be
> usable.
>
> Thank you in advance.
>
> -Al
>
>
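Al's duplicate-title report can be checked mechanically. The following is a minimal sketch and is not part of any Wikimedia tooling. It makes two assumptions:

- You have a local copy of the compressed pages-articles file.
- Each `<title>` element sits on its own line, as the grep commands quoted below also assume.

The function names are hypothetical. The sketch streams the file through Python's `bz2` module, so the decompressed XML is never written to disk.

```python
import bz2
import re
from collections import Counter
from typing import Dict, Iterable

# Matches one <title>...</title> element per line, the same pattern the grep
# commands in this thread rely on. Titles are left XML-escaped (e.g. &amp;).
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def count_titles(lines: Iterable[str]) -> Counter:
    """Count how many times each page title occurs in a dump's text lines."""
    counts: Counter = Counter()
    for line in lines:
        match = TITLE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def duplicate_titles(lines: Iterable[str]) -> Dict[str, int]:
    """Return {title: occurrences} for every title seen more than once."""
    return {title: n for title, n in count_titles(lines).items() if n > 1}


def duplicates_in_dump(path: str) -> Dict[str, int]:
    """Stream a .xml.bz2 dump from disk without decompressing it to a file."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return duplicate_titles(f)
```

For example, `duplicates_in_dump("enwiki-20160407-pages-articles.xml.bz2")` would map each repeated title, such as Rainbow, California, to its number of occurrences. Titles are compared exactly as they appear in the XML, so entity-escaped characters are not unescaped.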
> > On Apr 14, 2016, at 8:56 PM, gnosygnu <gnosy...@gmail.com> wrote:
> >
> > Hi. I think there may still be problems with the 2016-04-07 English
> > Wikipedia dump. It's missing many articles in the Module namespace.
> >
> > Here are some details:
> > * I downloaded
> >   https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-pages-articles.xml.bz2
> >   and got an XML file that was 10.8 GB (i.e., it does not look severely
> >   truncated).
> > * I ran the following grep commands. Note that the grep for Module:Hatnote
> >   returns nothing. I ran the last grep to show that the search pattern
> >   itself works for Module titles.
> > root~> grep "<title>Earth</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
> >     <title>Earth</title>
> > root~> grep "<title>Template:About</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
> >     <title>Template:About</title>
> > root~> grep "<title>Module:Hatnote</title>" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
> > root~> grep "<title>Module:" /home/root/xowa/wiki/en.wikipedia.org/enwiki-latest-pages-articles.xml
> >     <title>Module:Location map/data/Croatia/doc</title>
> >     <title>Module:Location map/data/USA Alabama/doc</title>
> >     ...
> > * The following Modules appear to be missing in the 2016-04-07 dump:
> >   Module:Use_mdy_dates
> >   Module:Pp-move-indef
> >   Module:Protection_banner
> >   Module:Unsubst
> > * By my count, there were 2,970 articles in the Module namespace in the
> >   2016-03-05 dump. In contrast, there are only 652 in the 2016-04-07 dump.
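The grep checks and counts above can be made repeatable. This is a minimal sketch. It assumes two local pages-articles files (the paths are placeholders, and the function names are hypothetical). It also assumes, as `grep "<title>Module:"` does, that a title starting with `Module:` marks a Module-namespace page. The sketch collects those titles from each dump, counts them, and lists titles present in the older dump but absent from the newer one.

```python
import bz2
import re
from typing import Iterable, Set

# One <title>...</title> per line, as in the grep commands above.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def module_titles(lines: Iterable[str]) -> Set[str]:
    """Collect titles starting with 'Module:' (a prefix match, like the grep)."""
    found: Set[str] = set()
    for line in lines:
        match = TITLE_RE.search(line)
        if match and match.group(1).startswith("Module:"):
            found.add(match.group(1))
    return found


def compare_dumps(old_lines: Iterable[str], new_lines: Iterable[str]) -> dict:
    """Summarise Module-namespace coverage of an older dump vs. a newer one."""
    old, new = module_titles(old_lines), module_titles(new_lines)
    return {
        "old_count": len(old),
        "new_count": len(new),
        "missing_in_new": sorted(old - new),
    }


def compare_dump_files(old_path: str, new_path: str) -> dict:
    """Same comparison, streaming two bz2-compressed dump files from disk."""
    with bz2.open(old_path, "rt", encoding="utf-8") as old_f, \
         bz2.open(new_path, "rt", encoding="utf-8") as new_f:
        return compare_dumps(old_f, new_f)
```

Run it on the 2016-03-05 and 2016-04-07 files with `compare_dump_files(old_path, new_path)`. Pages such as Module:Hatnote would then show up in `missing_in_new` if the newer dump is incomplete. Titles inside the XML use spaces (Module:Protection banner), not the underscores of the URL form listed above.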
> >
> > Let me know if you need any other information. I believe that the above
> > can be verified by anyone else, but I'd be happy to provide more detail.
> >
> > Thanks.
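The 10.8 GB file size is only a rough signal of completeness. Below is a minimal sketch of a more direct check (the function names are hypothetical). It assumes that a complete MediaWiki XML export ends with the closing `</mediawiki>` root tag, then streams the decompressed text and tests the last non-blank line. Python's `bz2` module raises `EOFError` when a compressed stream ends before its end-of-stream marker, and the sketch treats that case as incomplete too.

```python
import bz2
from typing import Iterable, Optional


def last_nonblank_line(lines: Iterable[str]) -> Optional[str]:
    """Return the last line that is not just whitespace, or None if none."""
    last = None
    for line in lines:
        if line.strip():
            last = line.strip()
    return last


def looks_complete(lines: Iterable[str]) -> bool:
    """True if the XML text ends with the closing </mediawiki> root tag."""
    return last_nonblank_line(lines) == "</mediawiki>"


def dump_file_looks_complete(path: str) -> bool:
    """Stream a .xml.bz2 dump; a cut-off bz2 stream raises EOFError."""
    try:
        with bz2.open(path, "rt", encoding="utf-8") as f:
            return looks_complete(f)
    except EOFError:
        # The compressed data ended early, so the dump is truncated.
        return False
```

This only detects a cut-off file. It would not catch the problem reported here, where the file ends normally but pages are missing. Counting titles per namespace, as done with grep above, is what reveals that.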
> >
> > On Thu, Apr 14, 2016 at 8:49 AM, Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> > It hasn't failed.  It's still running, but the jobs that previously failed
> > have been left in that status until they get rerun.  That's standard
> > behavior.  Don't worry, be happy! :-)
> >
> > Ariel
> >
> > On Thu, Apr 14, 2016 at 2:15 PM, Nicolas Vervelle <nverve...@gmail.com> wrote:
> > But at least, pages-articles worked, so it's ok for me.
> >
> > On Thu, Apr 14, 2016 at 1:13 PM, Nicolas Vervelle <nverve...@gmail.com> wrote:
> > Well, enwiki failed again today...
> >
> > On Wed, Apr 13, 2016 at 4:37 PM, Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> > You are right. Two jobs were competing for enwiki since I allocated one
> > more lousy core to the host that runs them. I've fixed the config to avoid
> > that. It will resume in a few hours with cron.
> >
> > Ariel
> >
> > On Wed, Apr 13, 2016 at 4:37 PM, Nicolas Vervelle <nverve...@gmail.com> wrote:
> > Thanks Ariel,
> >
> > It seems to have worked for some dumps (frwiki, for example), but other
> > dumps are still failing (enwiki, for example).
> >
> > Nico
> >
> > On Tue, Apr 12, 2016 at 11:04 AM, Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> > Hi Nicolas,
> >
> > These will be picked up on reruns, which will happen over the next day
> > or so.  The failure was caused by an obscure HHVM bug which only triggers
> > under certain circumstances.  For more information about that, see:
> > https://phabricator.wikimedia.org/T94277
> >
> > This morning I did a jobs cleanup and switched the dump jobs back to php5;
> > the dumps have restarted.
> >
> > Ariel
> >
> > On Tue, Apr 12, 2016 at 11:25 AM, Nicolas Vervelle <nverve...@gmail.com> wrote:
> > Hi,
> >
> > Is anyone working on the failed dumps for April? (enwiki, frwiki,
> > ruwiki, itwiki, ...)
> >
> > Nico
> >
> > _______________________________________________
> > Xmldatadumps-l mailing list
> > Xmldatadumps-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l