Re: [Wikitech-l] Static HTML Dumps

Paul Houle Tue, 05 Apr 2011 13:41:48 -0700

On 4/5/2011 4:00 PM, Platonides wrote:
> I think he is better parsing the articles, though.
>
> For a linguistic research you don't need things such as the contents of
> templates, so a simple wikitext stripping would do. And it will be much,
> much, much, much faster than parsing the whole wiki.
>
Could be true, but what's fascinating for me about Wikipedia is
all of the unscrambled eggs that can be found in the middle of otherwise
unstructured text.


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
