> Update: A Tamil Wikipedian, Mahir, went to the core of the issue that I
> cited in my previous email and identified that the issue in that instance
> was due to the superfluous use of the zero width non-joiner HTML entity.
> We're going to file a bug asking MediaWiki to chomp those entities when
> they occur in inappropriate places.


Qn: What is the definition of "inappropriate places"?
Ans: Wikipedia URLs should be treated as "identifiers" and should follow the
Unicode standard for identifier definition, which is based on Unicode data.
Unicode Standard Annex #31 defines this clearly:
http://unicode.org/reports/tr31/ IMHO, MediaWiki should implement this
standard.

But for Tamil, I am not aware of any pattern where ZWJ or ZWNJ is
valid. I am aware of valid patterns for other Indian languages. So in that
case we should remove all ZWJ and ZWNJ characters from Tamil URLs.
Some time back, the built-in tool in Malayalam wiki used to allow putting any
number of ZWJs in text, and we corrected the script so that users cannot put
more than one ZWJ or ZWNJ in sequence (this is what UAX #31 says too).
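
To make the two clean-ups above concrete, here is a rough sketch in Python. It is
only an illustration, not MediaWiki code and not the actual Malayalam wiki script:
the function names strip_joiners and collapse_joiner_runs are made up for this
mail, and the sketch assumes any HTML entities have already been decoded to the
real characters U+200C (ZWNJ) and U+200D (ZWJ).

```python
import re

# ZERO WIDTH NON-JOINER (U+200C) and ZERO WIDTH JOINER (U+200D)
_ANY_JOINER = re.compile("[\u200c\u200d]")
_JOINER_RUN = re.compile("[\u200c\u200d]{2,}")


def strip_joiners(title: str) -> str:
    """Remove every ZWJ/ZWNJ (hypothetical helper for Tamil titles,
    where no valid use of these characters is known to me)."""
    return _ANY_JOINER.sub("", title)


def collapse_joiner_runs(text: str) -> str:
    """Reduce each run of two or more ZWJ/ZWNJ to its first character
    (hypothetical helper for scripts where a single joiner can be valid)."""
    return _JOINER_RUN.sub(lambda m: m.group(0)[0], text)
```

For example, strip_joiners("\u0ba4\u200c\u0bae") gives "\u0ba4\u0bae", and
collapse_joiner_runs("a\u200d\u200d\u200db") gives "a\u200db". Note that this
only limits runs; it does not check the surrounding context in the way the
full UAX #31 rules do.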

Thanks
Santhosh



_______________________________________________
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
