This discussion brings to mind several historical threads.

I wonder if a project to simply mine the whole article contents and
provide a DB of some sort with the articles and infobox contents would
be worthwhile.  Develop a specific parser and generate and publish the
complete set of article-infobox-(key-value) sets...
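
To make the idea concrete, here is a rough sketch of the extraction
step, assuming plain article wikitext as input. The function names
(find_templates, split_top_level, parse_template) are made up for
illustration and are not part of MediaWiki. It only tracks {{ }} and
[[ ]] nesting, so it copes with "}}}}" when that is two templates
closing, but it deliberately ignores {{{ }}} parameter syntax, parser
functions, <nowiki>, and comments; a real project would need all of
those.

```python
def find_templates(text):
    """Return the inner text of each outermost {{...}} span in text."""
    found, depth, start, i = [], 0, None, 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i + 2
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                found.append(text[start:i])
            i += 2
        else:
            i += 1
    return found


def split_top_level(body):
    """Split on '|' characters that are not inside {{...}} or [[...]]."""
    parts, buf, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            buf.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            buf.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(body[i])
            i += 1
    parts.append("".join(buf))
    return parts


def parse_template(body):
    """Return (template_name, {key: value}) for one template body.

    Unnamed parameters are keyed "1", "2", ... in order of appearance.
    Simplification: any part containing '=' is treated as named.
    """
    parts = split_top_level(body)
    name, params, position = parts[0].strip(), {}, 0
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
        else:
            position += 1
            params[str(position)] = part.strip()
    return name, params


if __name__ == "__main__":
    sample = "{{Infobox person\n| name = Ada Lovelace\n| born = 1815\n}}"
    for body in find_templates(sample):
        print(parse_template(body))
```

Running this over a dump and writing out (article, template, key,
value) rows would give the sort of table described above; nested
template values are kept as raw wikitext rather than expanded.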


On Thu, Oct 22, 2009 at 11:13 PM, Andrew Dunbar <[email protected]> wrote:
> 2009/10/22 Daniel Schwen <[email protected]>:
>>> particular, SQL queries on the templatelinks table are intractably
>>> slow. Why are there no keys on tl_from or tl_title?
>>
>> How are you planning to get the template parameters? Have I missed a
>> recent schema change?
>
> I've been trying to parse the wikitext of section 0 with a minimal
> parser that uses just the tokens {{ }} {{{ and }}} but it already has
> problems when it sees }}}}
>
>> I'd be interested in following your progress. I'm not extracting
>> infobox data, but parameters of the coordinate template. Maybe a
>> similar approach could be interesting for you:
>>
>>  The coordinate template stuffs all its parameters into an external
>> link (which can easily be obtained from the externallinks table).
>> Creating dummy links containing parameters for some infoboxes could be
>> one way of making the data available for automatic extraction (yes,
>> it's a hack, but I'd prefer better suggestions over flames).
>>
>> The link could actually be made useful, it could point to a query page
>> for the data in these infoboxes.
>
> The template and parameters I'm interested in don't generate any such
> external links and probably couldn't very easily...
>
> But I have just discovered the rvgeneratexml parameter to
> action=query&prop=revisions
> This includes a <part> field for each template parameter with a <name>
> and a <value> for each...
>
> Andrew Dunbar (hippietrail)
>
>> [[User:Dschwen]]
>>
>> _______________________________________________
>> Wikitech-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>
>
>
> --
> http://wiktionarydev.leuksman.com http://linguaphile.sf.net



-- 
-george william herbert
[email protected]
