[Bug 62209] feature request: Text extraction from custom wiki markup
https://bugzilla.wikimedia.org/show_bug.cgi?id=62209

Max Semenik changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #5 from Max Semenik ---
I don't think that turning TE into yet another wikitext parsing facility is the way we want it to evolve. You can do it trivially for your infrastructure though, using ExtractFormatter class.

--
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 62209] feature request: Text extraction from custom wiki markup
https://bugzilla.wikimedia.org/show_bug.cgi?id=62209

--- Comment #4 from Dimitris Kontokostas ---
Thanks again. Still, is it possible to add these two parameters?

This setting works for us, but it would suit us better if we had the text/title option. This way we only have to load the templates into the database and feed the text to the API. Otherwise we need to load the whole dump.

If you agree to this request, we can work on this addition.
[Bug 62209] feature request: Text extraction from custom wiki markup
https://bugzilla.wikimedia.org/show_bug.cgi?id=62209

--- Comment #3 from Max Semenik ---
(In reply to Dimitris Kontokostas from comment #2)
> does this extension load the whole page, convert it to HTML and then return
> the first section?

Once again,
> 1) If you specify &exintro, only the intro will be parsed.
[Bug 62209] feature request: Text extraction from custom wiki markup
https://bugzilla.wikimedia.org/show_bug.cgi?id=62209

--- Comment #2 from Dimitris Kontokostas ---
Thanks, I already saw the &exintro option, so one question to understand this. When I use this call:

http://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro=&explaintext=&titles=Athens

does this extension load the whole page, convert it to HTML and then return the first section?

If not, this extension is perfect for our purpose and you don't need to read the rest :)
If yes, we would like to avoid loading the whole page, as it would slow down our extraction.

What we do so far is take the wiki markup of the page up to the first section and feed it to the MediaWiki "parse" API call [1], which normally returns HTML. Then we hack into the MediaWiki core to return cleaned text.

So, the request is to add the "text" and "title" parameters to your API. When they are given, instead of parsing the page by title you would parse the "text" parameter ("title" is used for magic words like {{PAGENAME}}), get the HTML and clean it the same way you do now.

Cheers,
Dimitris

[1] https://www.mediawiki.org/wiki/API:Parsing_wikitext#parse
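
For reference, the two requests discussed in this comment can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, `format=json` is added as an assumption (the quoted call does not specify a format), and the second helper targets the existing action=parse module from [1], not the proposed "text"/"title" parameters on prop=extracts, which this bug did not add. Nothing is sent over the network here; the code only builds the URL and the parameter set.

```python
from urllib.parse import urlencode

# Endpoint from the call quoted above (https used here instead of http).
API = "https://en.wikipedia.org/w/api.php"


def extracts_intro_url(title: str) -> str:
    """Build the prop=extracts request for the plain-text intro of an existing page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": "",       # flag parameter: intro only
        "explaintext": "",   # flag parameter: plain text instead of limited HTML
        "titles": title,
        "format": "json",    # assumption, not part of the quoted call
    }
    return f"{API}?{urlencode(params)}"


def parse_text_params(wikitext: str, title: str) -> dict:
    """Parameters for action=parse on raw wikitext (better sent as a POST body).

    `title` only gives the parser a page context, e.g. for {{PAGENAME}}.
    """
    return {
        "action": "parse",
        "text": wikitext,
        "title": title,
        "contentmodel": "wikitext",
        "prop": "text",      # ask for the rendered HTML
        "format": "json",
    }


if __name__ == "__main__":
    print(extracts_intro_url("Athens"))
    print(parse_text_params("'''{{PAGENAME}}''' is a city.", "Athens"))
```

The parse request returns HTML, so the cleaning step described above would still have to happen on the caller's side.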
[Bug 62209] feature request: Text extraction from custom wiki markup
https://bugzilla.wikimedia.org/show_bug.cgi?id=62209

--- Comment #1 from Max Semenik ---
1) If you specify &exintro, only the intro will be parsed.
2) TE operates only over the HTML returned by the parser; doing anything with wikitext directly would essentially be a different extension.

What do you mean by "custom wiki markup"?
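
To illustrate point 2, here is a minimal sketch of what "operating over HTML" can look like: the input is markup that a parser has already rendered, and the code only strips tags and drops some elements. It is not the extension's actual code. The class name, the function name and the set of skipped elements are hypothetical choices made for the example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, ignoring anything inside the skipped elements."""

    SKIP = {"script", "style", "table"}  # hypothetical choice of elements to drop

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    sample = "<p><b>Athens</b> is the capital of <a href='/wiki/Greece'>Greece</a>.</p>"
    print(html_to_text(sample))
```

Nothing in this sketch understands wikitext, which is the distinction drawn in point 2.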