[Bug 62209] feature request: Text extraction from custom wiki markup

2014-03-07 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=62209

--- Comment #4 from Dimitris Kontokostas jimk...@gmail.com ---
Thanks again,

Still, is it possible to add these two parameters?
The current setup works for us, but it would suit us better if we had the
text/title option.

This way we only have to load the templates into the database and feed the text
to the API. Otherwise we need to load the whole dump.

If you agree to this request, we can work on this addition.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 62209] feature request: Text extraction from custom wiki markup

2014-03-07 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=62209

Max Semenik maxsem.w...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #5 from Max Semenik maxsem.w...@gmail.com ---
I don't think that turning TE into yet another wikitext parsing facility is the
way we want it to evolve. You can do it trivially for your infrastructure
though, using the ExtractFormatter class.



[Bug 62209] feature request: Text extraction from custom wiki markup

2014-03-05 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=62209

--- Comment #3 from Max Semenik maxsem.w...@gmail.com ---
(In reply to Dimitris Kontokostas from comment #2)
> does this extension loads the whole page, convert it to html and then return
> the first section?

Once again,
> 1) If you specify exintro only intro will be parsed.



[Bug 62209] feature request: Text extraction from custom wiki markup

2014-03-04 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=62209

--- Comment #1 from Max Semenik maxsem.w...@gmail.com ---
1) If you specify exintro, only the intro will be parsed.
2) TE operates only on the HTML returned by the parser; doing anything with
wikitext directly would essentially be a different extension. What do you mean
by custom wiki markup?
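
To illustrate point 2, here is a minimal Python sketch of reducing parser HTML to plain text. It is not how TE does it internally; the class name TextOnly, the helper html_to_text, and the set of skipped tags are assumptions made for this example only.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, ignoring everything inside selected tags."""

    # Assumption: which elements to drop is a choice made for this sketch.
    SKIP = {"script", "style", "table"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

The point of the sketch is only that the cleaning step starts from HTML, so wikitext has to go through the parser first.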



[Bug 62209] feature request: Text extraction from custom wiki markup

2014-03-04 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=62209

--- Comment #2 from Dimitris Kontokostas jimk...@gmail.com ---
Thanks,

I already saw the exintro option, so one question to understand it.

When I use this call:
http://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro=&explaintext=&titles=Athens

Does this extension load the whole page, convert it to HTML and then return
the first section?
If not, this extension is perfect for our purpose and you can skip the rest :)

If yes, we would like to avoid loading the whole page, as it would slow down our
extraction.
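
For reference, the extracts call can be assembled programmatically. The Python sketch below only builds the URL and sends no request; the function name build_extract_url and the format=json parameter are assumptions added for illustration and are not part of the call quoted in this message.

```python
from urllib.parse import urlencode

# Endpoint as used in this thread.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def build_extract_url(title, intro_only=True, plain_text=True):
    """Return a query URL asking the extracts module for one page title."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "format": "json",  # assumption: not present in the original call
    }
    if intro_only:
        params["exintro"] = ""      # flag parameter, sent with an empty value
    if plain_text:
        params["explaintext"] = ""  # flag parameter, sent with an empty value
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_url("Athens"))
```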

What we do so far is take the wiki markup of the page up to the first
section and feed it to the MediaWiki parse API call [1], which normally returns
HTML. Then we hack into the MediaWiki core to return cleaned text.

So, the request is to add the text and title parameters to your API. When
they are given, instead of parsing the page by title you would parse the text
parameter (title is used for magic words like {{PAGENAME}}), get the HTML,
and clean it the same way you do now.
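
The workflow described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not our actual code: the helper names intro_wikitext and build_parse_params are hypothetical, the heading regex is a simplification, and the contentmodel, prop and format values are choices made for the example. Only action=parse with its text and title parameters comes from [1].

```python
import re

# Simplified match for a wikitext section heading such as "== History ==".
HEADING_RE = re.compile(r"^=+[^=\n][^\n]*=+[ \t]*$", re.MULTILINE)


def intro_wikitext(wikitext):
    """Return the markup that precedes the first section heading."""
    match = HEADING_RE.search(wikitext)
    return wikitext if match is None else wikitext[:match.start()]


def build_parse_params(text, title):
    """Return parameters for a parse call on raw wikitext (sketch only)."""
    return {
        "action": "parse",
        "text": text,                # wikitext to parse instead of a stored page
        "title": title,              # gives context to words like {{PAGENAME}}
        "contentmodel": "wikitext",  # assumption for this example
        "prop": "text",              # assumption: request only the rendered HTML
        "format": "json",            # assumption for this example
    }


if __name__ == "__main__":
    page = "'''Athens''' is the capital of Greece.\n\n== History ==\nOld.\n"
    print(build_parse_params(intro_wikitext(page), "Athens"))
```

Because the text parameter can be long, sending these parameters in a POST body is probably safer than putting them in the URL.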

Cheers,
Dimitris

[1] https://www.mediawiki.org/wiki/API:Parsing_wikitext#parse
