https://bugzilla.wikimedia.org/show_bug.cgi?id=62209

--- Comment #2 from Dimitris Kontokostas <jimk...@gmail.com> ---
Thanks,

I already saw the &exintro option, so one question to make sure I understand it.

When I use this call:
http://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro=&explaintext=&titles=Athens

does this extension load the whole page, convert it to HTML, and then return
the first section?
If not, this extension is perfect for our purpose and you can stop reading here :)

If yes, we would like to avoid loading the whole page, as that would slow down
our extraction.
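For reference, the call above can be built like this (a minimal sketch using only Python's standard library; the parameter names are the ones from the URL above — whether the server parses the whole page behind the scenes is exactly the open question):

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"

def build_extract_url(title):
    """Build a TextExtracts query URL for a page's plain-text intro."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "extracts",
        "exintro": "1",      # only the content before the first section heading
        "explaintext": "1",  # plain text instead of HTML
        "format": "json",
        "titles": title,
    })
    return API + "?" + params
```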

What we do so far is take the wiki markup of the page up to the first
section and feed it into the MediaWiki "parse" API call [1], which normally
returns HTML. Then we hack into the MediaWiki core to return cleaned text.
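To illustrate the current workaround: the "parse" action [1] already accepts "text" and "title" parameters and returns rendered HTML (a sketch of building that request; the subsequent HTML-to-text cleanup is the part we currently patch into core):

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"

def build_parse_url(wikitext, title):
    """Build an action=parse URL that renders raw wikitext to HTML."""
    params = urllib.parse.urlencode({
        "action": "parse",
        "format": "json",
        "text": wikitext,  # wikitext to parse instead of fetching a stored page
        "title": title,    # context page, so magic words like {{PAGENAME}} resolve
        "prop": "text",    # we only need the rendered HTML
    })
    return API + "?" + params
```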

So, the request is to add "text" and "title" parameters to your API. When
they are given, instead of parsing the page by title, you would parse the "text"
parameter ("title" is used for magic words like {{PAGENAME}}), get the HTML,
and clean it the same way you do now.
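The requested call might then look like this (purely hypothetical — "text" and "title" do not exist in the extracts API today; this only illustrates the proposal):

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"

def build_proposed_url(wikitext, title):
    """Hypothetical: extracts API parsing supplied wikitext, not a stored page."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "extracts",
        "explaintext": "1",  # clean plain text, as the extension produces now
        "text": wikitext,    # proposed: wikitext to parse instead of a page
        "title": title,      # proposed: context for magic words like {{PAGENAME}}
    })
    return API + "?" + params
```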

Cheers,
Dimitris

[1] https://www.mediawiki.org/wiki/API:Parsing_wikitext#parse

-- 
_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
