Hello everyone, I've been trying to write a Python script that takes an XML dump and generates all the HTML, using MediaWiki itself to handle all the parsing/processing. I've run into a problem where all the parsed output has warnings that templates couldn't be found, and I'm not sure what I'm doing wrong.
So I'll explain my steps:

First I execute the SQL script maintenance/tables.sql. Then I remove some indexes from the tables to speed up insertion. Finally I go through the XML and execute the following insert statements:

    'insert into page (page_id, page_namespace, page_title, page_is_redirect, page_is_new, page_random, page_latest, page_len, page_content_model) values (%s, %s, %s, %s, %s, %s, %s, %s, %s)'

    'insert into text (old_id, old_text) values (%s, %s)'

    'insert into recentchanges (rc_id, rc_timestamp, rc_user, rc_user_text, rc_title, rc_minor, rc_bot, rc_cur_id, rc_this_oldid, rc_last_oldid, rc_type, rc_source, rc_patrolled, rc_ip, rc_old_len, rc_new_len, rc_deleted, rc_logid) values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'

    'insert into revision (rev_id, rev_page, rev_text_id, rev_user, rev_user_text, rev_timestamp, rev_minor_edit, rev_deleted, rev_len, rev_parent_id, rev_sha1) values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'

All IDs from the XML dump are kept.

I noticed that the titles are not web friendly. Thinking this was the problem, I ran the maintenance/cleanupTitles.php script, but it didn't seem to fix anything.

After doing this, I can run the following PHP script:

    $id = 'some revision id';
    $rev = Revision::newFromId( $id );
    $titleObj = $rev->getTitle();
    $pageObj = WikiPage::factory( $titleObj );
    $context = RequestContext::newExtraneousContext( $titleObj );
    $popts = ParserOptions::newFromContext( $context );
    $pout = $pageObj->getParserOutput( $popts );
    var_dump( $pout );

The mText property of $pout contains the parsed output, but it is full of stuff like this:

    <a href="/index.php?title=Template:Date&action=edit&redlink=1" class="new" title="Template:Date (page does not exist)">Template:Date</a>

I feel like I'm missing a step here. I tried importing the templatelinks SQL dump, but that did not fix anything either. The parsed output also did not include any header or footer, which would be useful.
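For reference, the import pass described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual script: the sample `<page>` element is invented and trimmed (real dumps carry an XML namespace on every tag and many more fields), the helper names `to_db_key` and `rows_for_page` are hypothetical, only a subset of the columns is built, and reusing the revision ID as `old_id` is an assumption, since the dump has no separate text ID. The sketch also assumes that `page_title` should hold the title without its namespace prefix and with underscores instead of spaces, which is worth checking against the rows actually inserted.

```python
import xml.etree.ElementTree as ET

# Invented, heavily trimmed <page> element. Real dumps declare an XML
# namespace, so tag lookups there need the namespace-qualified names.
SAMPLE_PAGE = """
<page>
  <title>Template:Date</title>
  <ns>10</ns>
  <id>42</id>
  <revision>
    <id>1001</id>
    <timestamp>2014-01-02T03:04:05Z</timestamp>
    <text>{{{1}}}</text>
  </revision>
</page>
"""


def to_db_key(title, ns):
    """Hypothetical helper: drop the namespace prefix (it is stored
    separately in page_namespace) and replace spaces with underscores."""
    if ns != 0 and ":" in title:
        title = title.split(":", 1)[1]
    return title.replace(" ", "_")


def rows_for_page(page_xml):
    """Hypothetical helper: build parameter tuples for a subset of the
    page, text, and revision columns from one <page> element."""
    page = ET.fromstring(page_xml)
    ns = int(page.findtext("ns"))
    page_id = int(page.findtext("id"))  # direct child only, not the revision id
    rev = page.find("revision")
    rev_id = int(rev.findtext("id"))
    text = rev.findtext("text") or ""
    # Convert 2014-01-02T03:04:05Z into the 14-digit form 20140102030405.
    timestamp = rev.findtext("timestamp").translate(str.maketrans("", "", "-:TZ"))
    length = len(text.encode("utf-8"))  # length in bytes, not characters
    return {
        # (page_id, page_namespace, page_title, page_latest, page_len)
        "page": (page_id, ns, to_db_key(page.findtext("title"), ns), rev_id, length),
        # (old_id, old_text); assumption: old_id reuses the revision id
        "text": (rev_id, text),
        # (rev_id, rev_page, rev_text_id, rev_timestamp, rev_len)
        "revision": (rev_id, page_id, rev_id, timestamp, length),
    }


rows = rows_for_page(SAMPLE_PAGE)
print(rows["page"])
```

Each tuple would then be passed as the parameters of the matching insert statement. If the rows in the database instead have `page_title` set to something like "Template:Date" or contain spaces, a lookup for namespace 10 and title "Date" would not match them.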
Any insight or help is much appreciated, thank you.

--alex
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l