Mahir256 added a project: MediaWiki-extensions-WikibaseRepository.
Mahir256 updated the task description.

CHANGES TO TASK DESCRIPTION
Most Indic-language sites and Commons, which use opensearch for search queries, appear to handle characters such as ढ़, য়, ਖ਼, and ଡ଼ correctly when they are present in search queries, returning appropriate suggestions. Note that these are precomposed characters, i.e. //not// already decomposed into a consonant and a nukta. (The bolding of the text within the search suggestions corresponding to what was typed does not appear, but that is a minor issue by comparison.)

Wikidata's search functionality, wbsearchentities, returns the proper JSON response given a search containing the aforementioned characters, but the results are not rendered at all. In particular, the warning "//The value passed for "search" contains invalid or non-normalized data. Textual data should be valid, NFC-normalized Unicode without C0 control characters other than HT (\t), LF (\n), and CR (\r).//" is attached to the results. As a result, either 1) the waiting icon remains indefinitely, if the query is new, or 2) the previous results remain on screen, if the query is a modification of an earlier one.
...
It appears that all of the characters I have mentioned are part of [[ http://www.unicode.org/Public/10.0.0/ucd/CompositionExclusions.txt | this list of characters excluded from composition per the Unicode Standard ]], and that they [[http://www.unicode.org/Public/10.0.0/ucd/DerivedNormalizationProps.txt | cannot ever occur in their respective normalization forms]]: normalizing them, even to NFC, always yields the decomposed consonant-plus-nukta sequence. I mention this in case it is helpful.
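
The composition-exclusion behaviour described above can be checked locally. The following is a minimal sketch using Python's standard `unicodedata` module; the code points are my reading of the four example characters, and the helper name `nfc_report` is made up for illustration.

```python
import unicodedata

# Precomposed nukta letters corresponding to the examples in the report,
# written as escapes so the code points are unambiguous (assumed mapping).
PRECOMPOSED = [
    "\u095d",  # Devanagari, decomposes to U+0922 U+093C
    "\u09df",  # Bengali, decomposes to U+09AF U+09BC
    "\u0a59",  # Gurmukhi, decomposes to U+0A16 U+0A3C
    "\u0b5c",  # Oriya, decomposes to U+0B21 U+0B3C
]


def nfc_report(ch):
    """Return the NFC form of `ch` and whether `ch` was already in NFC."""
    nfc = unicodedata.normalize("NFC", ch)
    return {
        "input": "U+%04X" % ord(ch),
        "nfc": ["U+%04X" % ord(c) for c in nfc],
        "already_nfc": nfc == ch,
    }


reports = [nfc_report(ch) for ch in PRECOMPOSED]

for report in reports:
    print(report)
```

Each of the four characters should come back as two code points with `already_nfc` set to false, which is consistent with the "non-normalized data" warning being triggered by the precomposed forms.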

If wbsearchentities does not already normalize the Unicode data passed to it in the "search" parameter, then this is a real problem, since input methods for languages such as Bengali and Hindi are not guaranteed to output a letter containing a nukta as a separate consonant and nukta.
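
As a possible client-side workaround, the query could be NFC-normalized before it is sent. The sketch below only builds the request URL and makes no network call; `build_search_url` is a hypothetical helper, the default language value is arbitrary, and I have not verified against the live API that this suppresses the warning.

```python
import unicodedata
from urllib.parse import parse_qs, urlencode, urlsplit

# Public Wikidata API endpoint.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_search_url(text, language="hi"):
    """Hypothetical helper: NFC-normalize `text`, then build a
    wbsearchentities request URL for it."""
    normalized = unicodedata.normalize("NFC", text)
    params = {
        "action": "wbsearchentities",
        "search": normalized,
        "language": language,
        "format": "json",
    }
    # urlencode percent-encodes the query values as UTF-8 by default.
    return API_ENDPOINT + "?" + urlencode(params)


# A precomposed Devanagari nukta letter, as an input method might emit it.
url = build_search_url("\u095d")
sent = parse_qs(urlsplit(url).query)["search"][0]
print(url)
print(["U+%04X" % ord(c) for c in sent])
```

The value that ends up in the "search" parameter is the decomposed consonant-plus-nukta sequence, which is the form the warning text asks for. Normalizing on the server side would still be preferable, since it covers every client.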

TASK DETAIL
https://phabricator.wikimedia.org/T170779

To: Mahir256
Cc: PokestarFan, daniel, thiemowmde, Aftabuzzaman, Mahir256, Aklapper, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
