Your import process is definitely broken: page_title should be just 'Date', while page_namespace should hold the numeric key for the Template namespace (10).
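Concretely: the dump gives titles in prefixed text form ("Template:Date"), but the page table wants the namespace key split out and spaces turned into underscores. A minimal sketch of the split the import script needs (NAMESPACE_MAP and split_title are illustrative names, not MediaWiki API, and the map below hard-codes only a few of the standard English namespace keys; a sketch that reads the real map from the dump's <siteinfo> section follows at the end of the thread):

# Illustrative sketch: split a raw dump title into the
# (page_namespace, page_title) pair the page table expects.
NAMESPACE_MAP = {
    "Talk": 1, "User": 2, "User talk": 3, "File": 6,
    "MediaWiki": 8, "Template": 10, "Help": 12, "Category": 14,
}

def split_title(raw_title):
    """Return (page_namespace, page_title) for a title from the XML dump."""
    ns, title = 0, raw_title          # default: main namespace
    if ":" in raw_title:
        prefix, rest = raw_title.split(":", 1)
        if prefix in NAMESPACE_MAP:
            ns, title = NAMESPACE_MAP[prefix], rest
    return ns, title.replace(" ", "_")  # page_title stores the DB-key form

print(split_title("Template:Date"))  # -> (10, 'Date')
print(split_title("Main Page"))      # -> (0, 'Main_Page')

With rows stored that way, the {{date}} transclusion resolves, because MediaWiki looks templates up as (page_namespace = 10, page_title = 'Date'), never as a title string containing "Template:".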
-- brion

On Mon, Sep 21, 2015 at 12:02 PM, v0id null <[email protected]> wrote:
> For example, the above-mentioned missing template does seem to exist, from
> what I can tell:
>
> mysql> select page_title from page where page_title='Template:Date';
> +---------------+
> | page_title    |
> +---------------+
> | Template:Date |
> +---------------+
> 1 row in set (0.02 sec)
>
> On Mon, Sep 21, 2015 at 3:00 PM, v0id null <[email protected]> wrote:
>
>> http://dumps.wikimedia.org/enwikinews/latest/enwikinews-latest-pages-articles.xml.bz2
>>
>> This one. I believe it is meant to contain the latest revisions of all
>> pages. I do see that there are template pages in there; at least, there
>> are pages with titles of the form Template:[some template name].
>>
>> On Mon, Sep 21, 2015 at 2:53 PM, John <[email protected]> wrote:
>>
>>> What kind of dump are you working from?
>>>
>>> On Mon, Sep 21, 2015 at 2:50 PM, v0id null <[email protected]> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I've been trying to write a Python script that takes an XML dump and
>>>> generates all the HTML, using MediaWiki itself to handle all the
>>>> parsing/processing, but I've run into a problem where all of the
>>>> parsed output has warnings that templates couldn't be found. I'm not
>>>> sure what I'm doing wrong.
>>>>
>>>> So I'll explain my steps:
>>>>
>>>> First I execute the SQL script maintenance/tables.sql.
>>>>
>>>> Then I remove some indexes from the tables to speed up insertion.
>>>>
>>>> Finally I go through the XML, which executes the following insert
>>>> statements:
>>>>
>>>> 'insert into page
>>>>    (page_id, page_namespace, page_title, page_is_redirect, page_is_new,
>>>>     page_random, page_latest, page_len, page_content_model)
>>>>  values (%s, %s, %s, %s, %s, %s, %s, %s, %s)'
>>>>
>>>> 'insert into text (old_id, old_text) values (%s, %s)'
>>>>
>>>> 'insert into recentchanges
>>>>    (rc_id, rc_timestamp, rc_user, rc_user_text, rc_title, rc_minor,
>>>>     rc_bot, rc_cur_id, rc_this_oldid, rc_last_oldid, rc_type,
>>>>     rc_source, rc_patrolled, rc_ip, rc_old_len, rc_new_len,
>>>>     rc_deleted, rc_logid)
>>>>  values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s,
>>>>          %s, %s, %s)'
>>>>
>>>> 'insert into revision
>>>>    (rev_id, rev_page, rev_text_id, rev_user, rev_user_text,
>>>>     rev_timestamp, rev_minor_edit, rev_deleted, rev_len, rev_parent_id,
>>>>     rev_sha1)
>>>>  values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'
>>>>
>>>> All IDs from the XML dump are kept. I noticed that the titles are not
>>>> web friendly. Thinking this was the problem, I ran the
>>>> maintenance/cleanupTitles.php script, but it didn't seem to fix
>>>> anything.
>>>> Doing this, I can now run the following PHP script:
>>>>
>>>> $id = 'some revision id';
>>>> $rev = Revision::newFromId( $id );
>>>> $titleObj = $rev->getTitle();
>>>> $pageObj = WikiPage::factory( $titleObj );
>>>>
>>>> $context = RequestContext::newExtraneousContext( $titleObj );
>>>>
>>>> $popts = ParserOptions::newFromContext( $context );
>>>> $pout = $pageObj->getParserOutput( $popts );
>>>>
>>>> var_dump( $pout );
>>>>
>>>> The mText property of $pout contains the parsed output, but it is full
>>>> of stuff like this:
>>>>
>>>> <a href="/index.php?title=Template:Date&action=edit&redlink=1"
>>>> class="new" title="Template:Date (page does not exist)">Template:Date</a>
>>>>
>>>> I feel like I'm missing a step here. I tried importing the
>>>> templatelinks SQL dump, but it also did not fix anything. It also did
>>>> not include any header or footer, which would be useful.
>>>>
>>>> Any insight or help is much appreciated. Thank you.
>>>>
>>>> --alex

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
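An addendum to the fix sketched above: rather than hard-coding namespace keys, the map can be read from the dump itself, since a pages-articles dump starts with a <siteinfo><namespaces> block listing each namespace's numeric key and local name (Wikinews, for example, defines extra namespaces beyond the defaults). A rough sketch using only the Python 3 standard library; read_namespace_map is an illustrative helper, not MediaWiki tooling:

import bz2
import xml.etree.ElementTree as ET

def read_namespace_map(dump_path):
    """Map local namespace names to numeric keys, read from <siteinfo>."""
    ns_map = {}
    with bz2.open(dump_path, "rb") as f:
        for _event, elem in ET.iterparse(f):
            tag = elem.tag.rsplit("}", 1)[-1]   # drop the xmlns prefix
            if tag == "namespace":
                if elem.text:                   # the main namespace has no name
                    ns_map[elem.text] = int(elem.get("key"))
            elif tag == "namespaces":
                break   # siteinfo sits at the top; stop before the page data
    return ns_map

ns_map = read_namespace_map("enwikinews-latest-pages-articles.xml.bz2")
print(ns_map.get("Template"))  # -> 10

Feeding split_title a map built this way keeps the importer correct for any wiki's dump, not just one with the English-language defaults.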
