Your import process is definitely broken. page_title should be just 'Date',
while page_namespace should hold the numeric key for the Template
namespace (10).
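If your dump carries an <ns> element per <page>, that element already
gives you the number; otherwise the prefix has to be split off before
the insert. A rough sketch in Python (NAMESPACE_IDS and split_title are
made-up names, and the table only covers a handful of namespaces):

    # Sketch: turn a dump title like 'Template:Date' into the
    # (page_namespace, page_title) pair the page table expects.
    NAMESPACE_IDS = {
        'Talk': 1,
        'User': 2,
        'Project': 4,
        'Template': 10,
        'Category': 14,
    }

    def split_title(full_title):
        ns, title = 0, full_title
        if ':' in full_title:
            prefix, rest = full_title.split(':', 1)
            if prefix in NAMESPACE_IDS:
                ns, title = NAMESPACE_IDS[prefix], rest
        # page_title stores underscores rather than spaces
        return ns, title.replace(' ', '_')

    print(split_title('Template:Date'))  # (10, 'Date')
    print(split_title('Main Page'))      # (0, 'Main_Page')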

-- brion

On Mon, Sep 21, 2015 at 12:02 PM, v0id null <[email protected]> wrote:

> For example, the above-mentioned missing template does seem to exist,
> from what I can tell:
>
> mysql> select page_title from page where page_title='Template:Date';
> +---------------+
> | page_title    |
> +---------------+
> | Template:Date |
> +---------------+
> 1 row in set (0.02 sec)
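>
> If I instead query with the numeric namespace key, which as far as I
> can tell is how the parser looks templates up (10 being the Template
> namespace), I get nothing:
>
> mysql> select page_title from page
>     -> where page_namespace=10 and page_title='Date';
> Empty set (0.00 sec)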
>
> On Mon, Sep 21, 2015 at 3:00 PM, v0id null <[email protected]> wrote:
>
> >
> >
> > http://dumps.wikimedia.org/enwikinews/latest/enwikinews-latest-pages-articles.xml.bz2
> >
> >
> > This one. I believe it is supposed to contain the latest revisions of
> > all pages. I do see that there are template pages in there; at least,
> > there are pages with titles in the format Template:[some template name].
> >
> > On Mon, Sep 21, 2015 at 2:53 PM, John <[email protected]> wrote:
> >
> >> What kind of dump are you working from?
> >>
> >>
> >> On Mon, Sep 21, 2015 at 2:50 PM, v0id null <[email protected]> wrote:
> >>
> >> > Hello Everyone,
> >> >
> >> > I've been trying to write a Python script that takes an XML dump and
> >> > generates all of the HTML, using MediaWiki itself to handle all the
> >> > parsing and processing, but I've run into a problem: all of the parsed
> >> > output has warnings that templates couldn't be found. I'm not sure
> >> > what I'm doing wrong.
> >> >
> >> > So I'll explain my steps:
> >> >
> >> > First I execute the SQL script maintenance/tables.sql.
> >> >
> >> > Then I remove some indexes from the tables to speed up insertion.
> >> >
> >> > Finally, I go through the XML, executing the following insert
> >> > statements:
> >> >
> >> > 'insert into page
> >> >    (page_id, page_namespace, page_title, page_is_redirect, page_is_new,
> >> >     page_random, page_latest, page_len, page_content_model)
> >> >  values (%s, %s, %s, %s, %s, %s, %s, %s, %s)'
> >> >
> >> > 'insert into text (old_id, old_text) values (%s, %s)'
> >> >
> >> > 'insert into recentchanges
> >> >    (rc_id, rc_timestamp, rc_user, rc_user_text, rc_title, rc_minor,
> >> >     rc_bot, rc_cur_id, rc_this_oldid, rc_last_oldid, rc_type, rc_source,
> >> >     rc_patrolled, rc_ip, rc_old_len, rc_new_len, rc_deleted, rc_logid)
> >> >  values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s,
> >> >          %s, %s, %s)'
> >> >
> >> > 'insert into revision
> >> >    (rev_id, rev_page, rev_text_id, rev_user, rev_user_text,
> >> >     rev_timestamp, rev_minor_edit, rev_deleted, rev_len, rev_parent_id,
> >> >     rev_sha1)
> >> >  values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'
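> >> >
> >> > Roughly, the script binds those placeholders with a DB-API cursor; a
> >> > minimal sketch (pymysql and the helper are stand-ins for what I
> >> > actually run):
> >> >
> >> >     import pymysql
> >> >
> >> >     def insert_text(cur, rev_id, wikitext):
> >> >         # old_id must match rev_text_id in the revision row
> >> >         cur.execute('insert into text (old_id, old_text) '
> >> >                     'values (%s, %s)', (rev_id, wikitext))
> >> >
> >> >     conn = pymysql.connect(db='wikinews', user='wiki', passwd='wiki')
> >> >     cur = conn.cursor()
> >> >     insert_text(cur, 12345, "'''Some wikitext'''")
> >> >     conn.commit()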
> >> >
> >> > All IDs from the XML dump are kept. I noticed that the titles are not
> >> > web-friendly. Thinking this was the problem, I ran the
> >> > maintenance/cleanupTitles.php script, but it didn't seem to fix
> >> > anything.
> >> >
> >> > Having done this, I can now run the following PHP script:
> >> >     $id = 'some revision id';
> >> >     $rev = Revision::newFromId( $id );
> >> >     $titleObj = $rev->getTitle();
> >> >     $pageObj = WikiPage::factory( $titleObj );
> >> >
> >> >     $context = RequestContext::newExtraneousContext($titleObj);
> >> >
> >> >     $popts = ParserOptions::newFromContext($context);
> >> >     $pout = $pageObj->getParserOutput($popts);
> >> >
> >> >     var_dump($pout);
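> >> >
> >> >     // Presumably the HTML itself can come from the accessor instead
> >> >     // of var_dump'ing the whole object (a sketch):
> >> >     $html = $pout->getText();
> >> >     file_put_contents('out.html', $html);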
> >> >
> >> > The mText property of $pout contains the parsed output, but it is
> >> > full of stuff like this:
> >> >
> >> > <a href="/index.php?title=Template:Date&action=edit&redlink=1"
> >> class="new"
> >> > title="Template:Date (page does not exist)">Template:Date</a>
> >> >
> >> >
> >> > I feel like I'm missing a step here. I tried importing the
> >> > templatelinks SQL dump, but it did not fix anything either. It also
> >> > did not include any header or footer, which would be useful.
> >> >
> >> > Any insight or help is much appreciated, thank you.
> >> >
> >> > --alex
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
