Thanks, Santhosh, for the excellent resource (http://unicode.org/reports/tr31/).

> But for Tamil, I am not aware of any valid pattern where ZWJ or ZWNJ is
> valid.

Its only valid use is to force the decomposition of ksha (க்ஷ) into k- (க்)
followed by sha (ஷ).
Ksha is an extremely rare, only historically relevant Grantha character, and
we can certainly live without this decomposition in URLs.
In fact, (a) some people have disputed the inclusion of ksha and sha in the
Tamil chart in the first place, and (b) others have argued that, when ksha is
used, the decomposed form should be the default behaviour and a joiner should
be required to force the ligated form.
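To make the ksha case concrete, here is a small Python sketch that lists the
code points involved. It assumes ksha is spelled KA + VIRAMA + SSA and that the
decomposed rendering is requested by placing a ZWNJ right after the virama (the
usual Indic convention); the variable names are illustrative only. It also
shows that NFC normalization leaves the ZWNJ in place, so the two spellings
stay different strings and would produce different titles and URLs.

```python
import unicodedata

# Tamil code points for ksha (க்ஷ): KA + VIRAMA (pulli) + SSA.
KA, VIRAMA, SSA = "\u0B95", "\u0BCD", "\u0BB7"
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

ksha = KA + VIRAMA + SSA
# Assumption: a ZWNJ after the virama requests the decomposed க் + ஷ rendering.
ksha_split = KA + VIRAMA + ZWNJ + SSA

for label, s in (("ligated", ksha), ("decomposed", ksha_split)):
    print(label, ["U+%04X %s" % (ord(c), unicodedata.name(c)) for c in s])

# NFC does not remove ZWNJ, so the two spellings remain distinct strings.
print(unicodedata.normalize("NFC", ksha_split) == ksha)  # False
```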

Having read the linked resource, I think we can safely ask for these
characters to be dropped from titles.
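A minimal Python sketch of the two rules discussed in this thread: stripping
every ZWJ/ZWNJ from Tamil titles, and (following Santhosh's Malayalam fix)
allowing at most one joiner in a row. MediaWiki itself is written in PHP, so
this is an illustration, not a patch. The function names are hypothetical, the
set of HTML entity spellings handled (&zwnj;, &zwj; and their numeric forms)
is an assumption, and keeping the first joiner of a run is also an assumption,
since the thread doesn't say which one should survive.

```python
import re

# Raw ZWNJ (U+200C) / ZWJ (U+200D), plus the HTML entity spellings of them.
_JOINER_RE = re.compile(
    r"[\u200C\u200D]|&(?:zwnj|zwj|#8204|#8205|#[xX]200[cCdD]);"
)
# Two or more raw joiners in a row.
_JOINER_RUN_RE = re.compile(r"[\u200C\u200D]{2,}")


def strip_joiners(title: str) -> str:
    """Drop every ZWJ/ZWNJ, raw or as an HTML entity (proposed Tamil rule)."""
    return _JOINER_RE.sub("", title)


def collapse_joiner_runs(title: str) -> str:
    """Keep only the first raw joiner of any consecutive run (hypothetical
    policy for languages where a single ZWJ/ZWNJ can be meaningful)."""
    return _JOINER_RUN_RE.sub(lambda m: m.group(0)[0], title)


print(strip_joiners("க்&zwnj;ஷ"))                          # க்ஷ
print(repr(collapse_joiner_runs("a\u200C\u200C\u200Db")))  # 'a\u200cb'
```

Note that `collapse_joiner_runs` only looks at raw characters, so entity
spellings would need to be decoded before it is applied.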

- Sundar


 "That language is an instrument of human reason, and not merely a medium for 
the expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture



----- Original Message ----
> From: "santhosh.thottin...@gmail.com" <santhosh.thottin...@gmail.com>
> To: Discussion list on Indian language projects of Wikimedia. <wikimediaindia-l@lists.wikimedia.org>
> Sent: Wed, December 29, 2010 3:23:37 PM
> Subject: Re: [Wikimediaindia-l] Indic languages & unicode issues.
> 
> > Update: A Tamil Wikipedian, Mahir, got to the root of the issue I cited
> > in my previous email and found that, in that instance, it was caused by
> > the superfluous use of the zero-width non-joiner HTML entity. We're going
> > to file a bug asking MediaWiki to strip those entities when they occur in
> > inappropriate places.
> 
> 
> Qn: What counts as an "inappropriate place"?
> Ans: Wikipedia URLs should be treated as identifiers and should follow the
> Unicode definition of identifiers, which is based on Unicode character data.
> Unicode Standard Annex #31 (http://unicode.org/reports/tr31/) defines this
> clearly. IMHO, MediaWiki should implement this standard.
> 
> But for Tamil, I am not aware of any valid pattern where ZWJ or ZWNJ is
> valid. I am aware of valid patterns for other Indian languages. So for Tamil
> we should remove all ZWJ and ZWNJ characters from URLs.
> Some time back, the built-in tool on the Malayalam wiki allowed any number
> of ZWJs to be inserted in text; we corrected the script to prevent users
> from entering more than one ZWJ or ZWNJ in sequence (this is what UAX #31
> says too).
> 
> Thanks
> Santhosh
> 
> 
> 
> _______________________________________________
> Wikimediaindia-l  mailing list
> Wikimediaindia-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
> 
