Dear Anivar:

> There are Four Components

Thanks for the addendum - how important is the rendering engine in the
scheme of things? Is work on that pretty much done or are there issues
there too?

> It is font dependent. A conversion map needs to be prepared for
> each ASCII font to convert data encoded in it to Unicode.
> Swathanthra Malayalam Computing's Payyans is a tool developed for
> converting ASCII to Unicode easily for any Indic language by building
> a font map for each needed font. This tool helped Malayalam
> Wiktionary convert many copyright-expired books in non-standard
> encodings to Unicode.
> A popular Firefox extension named Padma uses similar encoding
> conversion tables to display ASCII news websites in Unicode.

So how do these work? Have they built a map from every single ASCII
encoding/font pair (since this is some ugly hack) to the corresponding
Unicode values? There must be thousands of ASCII encoding/font pairs,
right? Is this even a viable option? Are there alternatives to this?
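
To check that I have the right mental model, here is a toy sketch in
Python of what I imagine a per-font map does. Everything in it is an
assumption on my part: the Latin codes, the map contents, and the names
TOY_FONT_MAP, PREBASE_SIGNS and convert are invented for illustration,
and this is not the actual format or code of the tools you mention. The
one non-obvious step is that legacy fonts store some vowel signs in
visual order (before the consonant), while Unicode wants them after it.

```python
# Hypothetical map for ONE imaginary legacy font: Latin code -> Unicode.
# A real map would be far larger and would differ for every font.
TOY_FONT_MAP = {
    "k": "\u0d15",  # MALAYALAM LETTER KA
    "m": "\u0d2e",  # MALAYALAM LETTER MA
    "l": "\u0d32",  # MALAYALAM LETTER LA
    "A": "\u0d3e",  # MALAYALAM VOWEL SIGN AA
    "e": "\u0d46",  # MALAYALAM VOWEL SIGN E (drawn left of the consonant)
}

# Vowel signs that legacy fonts place before the consonant (E, EE, AI).
PREBASE_SIGNS = {"\u0d46", "\u0d47", "\u0d48"}


def convert(text, font_map):
    """Convert font-encoded text to Unicode using a per-font map."""
    # Try longer codes first so multi-character codes win over prefixes.
    keys = sorted(font_map, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(font_map[key])
                i += len(key)
                break
        else:
            # Unmapped characters (spaces, digits) pass through unchanged.
            out.append(text[i])
            i += 1

    # Simplified reordering: move a pre-base sign after the next item.
    # Real converters also have to handle conjuncts and two-part vowels.
    j = 0
    while j < len(out) - 1:
        if out[j] in PREBASE_SIGNS:
            out[j], out[j + 1] = out[j + 1], out[j]
            j += 2
        else:
            j += 1
    return "".join(out)
```

With this toy map, convert("ek", TOY_FONT_MAP) would give KA followed
by the vowel sign E, i.e. the logical order Unicode expects. If the
real tools work roughly like this, then each new font needs its own
table, which is what makes me wonder about the scale of the effort.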

> I don't think this will happen. There is a long history of lobbying
> CDAC for this from 2001 onwards, and nothing has happened. CDAC made
> (and still makes) enough money by selling ASCII fonts, and they can't
> even think about giving them away under a FOSS license. And at
> frequent intervals they eat more government money for making yet
> another CD to ship with their FOSS project forks (such as Bhaathiya
> OO, IndiFox, etc.) plus these fonts. In the same way, most of the TDIL
> funding to CDAC for Indic language technology research either
> produces no output at all or the output is not released, even after
> TDIL's policy decision to release it under a FOSS license.

I can see the frustration in this. So in your opinion, is this an
effort not worth undertaking? Assuming they were ready to use a FOSS
license, are the fonts good enough to be worth using?

> Searching and sorting algorithms for Indic languages are in
> development and are not bug free. Indic support is not yet available
> in most of the search solutions (including FOSS solutions like Lucene
> or Solr) because of the complex word formation characteristics.

But if I understand correctly, this is *only* possible using Unicode
encoding. Right?

Thank you, Anivar.



Wikimediaindia-l mailing list
