https://bugzilla.wikimedia.org/show_bug.cgi?id=35083
Web browser: ---
Bug #: 35083
Summary: OpenSearchXml first sentences extraction produces bad results
Product: MediaWiki extensions
Version: any
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: OpenSearchXml
AssignedTo: [email protected]
ReportedBy: [email protected]
Classification: Unclassified
Mobile Platform: ---
The current regex, roughly "end capture after the second dot followed by
whitespace", produces wildly inaccurate results for sentences with dots in the
middle, for example when the article title contains dots:
https://en.wikipedia.org/w/api.php?action=opensearch&format=xmlfm&search=.s.p.%20v&limit=10
<Item>
<Text xml:space="preserve">S. P. Venkatesh</Text>
<Description xml:space="preserve">S. P. </Description>
<Url xml:space="preserve">https://en.wikipedia.org/wiki/S._P._Venkatesh</Url>
</Item>
<Item>
<Text xml:space="preserve">S. P. Velumani</Text>
<Description xml:space="preserve">S. P. </Description>
<Url xml:space="preserve">https://en.wikipedia.org/wiki/S._P._Velumani</Url>
</Item>
It should be something like "first dot followed by whitespace after a certain
number of characters".
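
A rough sketch of the difference, in Python rather than the extension's own
code. Both patterns are approximations written for illustration: the first
only models the behaviour described above and is not the actual regex used by
OpenSearchXml, and the function names, the sample text, and the min_chars
threshold of 40 are all made up for this example.

```python
import re


def first_sentences_current(text):
    # Approximate model of the reported behaviour: capture ends after the
    # second dot that is followed by whitespace.
    match = re.match(r"(?:.*?\.\s){2}", text, flags=re.DOTALL)
    return match.group(0) if match else text


def first_sentence_proposed(text, min_chars=40):
    # Suggested behaviour: skip at least min_chars characters, then stop at
    # the first dot followed by whitespace (or by the end of the text).
    # min_chars=40 is an arbitrary illustrative threshold.
    pattern = re.compile(r".{%d,}?\.(?=\s|$)" % min_chars, flags=re.DOTALL)
    match = pattern.match(text)
    # Fall back to the whole text when no suitable dot is found.
    return match.group(0) if match else text


# Placeholder extract, not real article text.
sample = ("S. P. Example is a placeholder subject used for this test. "
          "A second sentence follows.")

print(repr(first_sentences_current(sample)))
print(repr(first_sentence_proposed(sample)))
```

With the placeholder extract, the first function stops after the initials, as
in the Description elements above, while the second returns the whole first
sentence. A fixed threshold is still a heuristic: an abbreviation that falls
past min_chars characters would cut the sentence short in the same way.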