https://bugzilla.wikimedia.org/show_bug.cgi?id=35083

       Web browser: ---
             Bug #: 35083
           Summary: OpenSearchXml first sentences extraction produces bad
                    results
           Product: MediaWiki extensions
           Version: any
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: OpenSearchXml
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified
   Mobile Platform: ---


The current regex, roughly "end capture after the second dot followed by
whitespace", produces wildly inaccurate results for sentences with dots in the
middle, for example if the article title contains dots:

https://en.wikipedia.org/w/api.php?action=opensearch&format=xmlfm&search=.s.p.%20v&limit=10

    <Item>
      <Text xml:space="preserve">S. P. Venkatesh</Text>
      <Description xml:space="preserve">S. P. </Description>
      <Url xml:space="preserve">https://en.wikipedia.org/wiki/S._P._Venkatesh</Url>
    </Item>
    <Item>
      <Text xml:space="preserve">S. P. Velumani</Text>
      <Description xml:space="preserve">S. P. </Description>
      <Url xml:space="preserve">https://en.wikipedia.org/wiki/S._P._Velumani</Url>
    </Item>

It should be something like "first dot followed by whitespace after a certain
number of characters".
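
A rough sketch of the difference, in Python rather than the extension's PHP.
Both patterns are approximations written for illustration, not the
extension's actual regex. The function names, the 40-character threshold and
the sample text are all made up for this example.

```python
import re

# Hypothetical threshold; the report only says "a certain number of characters".
MIN_CHARS = 40


def extract_current(text):
    """Approximate the reported behaviour: stop after the second dot
    that is followed by whitespace."""
    m = re.match(r"(?:.*?\.\s){2}", text, re.S)
    return m.group(0) if m else text


def extract_proposed(text, min_chars=MIN_CHARS):
    """Proposed behaviour: skip at least min_chars characters, then stop at
    the first dot followed by whitespace or end of text."""
    pattern = r".{%d,}?\.(?=\s|$)" % min_chars
    m = re.match(pattern, text, re.S)
    # Fall back to the whole text when no suitable dot is found.
    return m.group(0) if m else text


# Invented sample text, shaped like an article that starts with initials.
sample = ("S. P. Example is a placeholder person used in this sketch. "
          "A second sentence follows.")

print(repr(extract_current(sample)))
print(repr(extract_proposed(sample)))
```

With a threshold like this the initials in the title no longer end the
capture early. The right threshold would still need tuning against real
article text, and abbreviations later in the sentence can still cut it short.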
