This discussion brings to mind several historical threads. I wonder whether a project to simply mine the whole article contents and provide a DB of some sort with the articles and their infobox contents would be worthwhile. The idea would be to develop a dedicated parser, then generate and publish the complete set of article-infobox-(key-value) sets...
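
To make the parser idea a bit more concrete, here is a rough Python sketch of how infobox key-value pairs might be pulled out of article wikitext. It is only an illustration under some assumptions: the function names (find_template, split_top_level, parse_infobox) and the sample text are made up for this example, it only pairs up {{ with }} and [[ with ]] two characters at a time (so a run like }}}} is read as two closings), and it ignores {{{ }}} template arguments on the assumption that they mostly occur in template source rather than in articles. It is not how MediaWiki itself parses templates.

```python
import re


def find_template(text, name_prefix):
    """Return the inner text of the first template whose name starts with
    name_prefix (case-insensitive), or None if absent or unbalanced."""
    match = re.search(r"\{\{\s*" + re.escape(name_prefix), text, re.IGNORECASE)
    if match is None:
        return None
    start = match.start()
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                # Strip the outer {{ and }} of the matched template.
                return text[start + 2:i - 2]
        else:
            i += 1
    return None  # ran off the end without closing the template


def split_top_level(body):
    """Split body on '|' characters that are not nested inside
    {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth = max(depth - 1, 0)  # tolerate stray closers
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_infobox(text, name_prefix="Infobox"):
    """Return (template_name, {key: value}) for the first matching
    template, or None. Unnamed parameters get keys "1", "2", ..."""
    body = find_template(text, name_prefix)
    if body is None:
        return None
    name, *fields = split_top_level(body)
    params = {}
    positional = 0
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:
            params[key.strip()] = value.strip()
        else:
            positional += 1
            params[str(positional)] = field.strip()
    return name.strip(), params


# Made-up sample article text, for illustration only.
sample = """{{Infobox settlement
| name = Springfield
| coordinates = {{coord|39|48|N|89|39|W}}
| leader = [[Jane Doe|J. Doe]]
| population = {{formatnum:{{#expr:100+16}}}}
}}
Springfield is a city."""

name, params = parse_infobox(sample)
```

Here params maps "name", "coordinates", "leader" and "population" to their raw wikitext values, with nested templates and piped links left intact. One known limitation of the sketch: an unnamed parameter whose value contains "=" would be misread as a named one.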

On Thu, Oct 22, 2009 at 11:13 PM, Andrew Dunbar <[email protected]> wrote:
> 2009/10/22 Daniel Schwen <[email protected]>:
>>> particular, SQL queries on the templatelinks table are intractably
>>> slow. Why are there no keys on tl_from or tl_title?
>>
>> How are you planning to get the template parameters? Have I missed a
>> recent schema change?
>
> I've been trying to parse the wikitext of section 0 with a minimal
> parser that uses just the tokens {{ }} {{{ and }}} but it already has
> problems when it sees }}}}
>
>> I'd be interested in following your progress. I'm not extracting
>> infobox data, but parameters of the coordinate template. Maybe a
>> similar approach could be interesting for you:
>>
>> The coordinate template stuffs all its parameters into an external
>> link (which can easily be obtained from the externallinks table).
>> Creating dummy links containing parameters for some infoboxes could be
>> one way of making the data available for automatic extraction (yes,
>> it's a hack, but I'd prefer better suggestions over flames).
>>
>> The link could actually be made useful, it could point to a query page
>> for the data in these infoboxes.
>
> The template and parameters I'm interested in don't generate any such
> external links and probably couldn't very easily...
>
> But I have just discovered the rvgeneratexml parameter to
> action=query&prop=revisions
> This includes a <part> field for each template parameter with a <name>
> and a <value> for each...
>
> Andrew Dunbar (hippietrail)
>
>> [[User:Dschwen]]
>>
>> _______________________________________________
>> Wikitech-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> --
> http://wiktionarydev.leuksman.com http://linguaphile.sf.net

--
-george william herbert
[email protected]
