Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread Anivar Aravind
On 2/23/11, Gautam John gau...@prathambooks.org wrote:
 Dear Anivar:

 There are Four Components

 Thanks for the addendum - how important is the rendering engine in the
 scheme of things? Is work on that pretty much done or are there issues
 there too?

If your language shows errors in complex glyph formation, that is a
rendering engine issue.
You can find more here:
http://en.wikipedia.org/wiki/Wikipedia:Enabling_complex_text_support_for_Indic_scripts
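
To make the division of labour concrete, here is a minimal sketch (not
taken from any project mentioned in this thread) using only Python's
standard unicodedata module. It shows that a Devanagari conjunct is stored
as a plain sequence of code points; combining them into one glyph on
screen is left to the rendering engine and the font. The helper name
describe is made up for illustration.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# TA + VIRAMA + RA: three stored characters that a working rendering
# engine is expected to display as a single conjunct.
conjunct = "\u0924\u094D\u0930"

for code_point, name in describe(conjunct):
    print(code_point, name)
```

If the conjunct shows up as three separate shapes, the stored text is
still correct; the fault is in rendering or in the font.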

Rendering engines like Pango evolved through more than 10 years of
patching and correction by language communities. Pango works pretty well
for most Indic languages.
HarfBuzz (http://www.freedesktop.org/wiki/Software/HarfBuzz) is a
relatively new player in the field, taking code from Pango, Qt and ICU.
HarfBuzz-ng is used in the new Firefox 4 as its default rendering engine.
The Uniscribe engine in Windows-based systems has supported Indic
fonts from Windows XP SP2 onwards.

Let me give an example of why the rendering engine is important.
For Latin-script wikis there is a PDF download option and PediaPress
to print them directly, but such options are not available for
non-Latin wikis. Character rendering is the block here. PediaPress's
library fails to render non-Latin content because the library they use
does not make use of rendering engines.

If a teacher goes to an internet cafe to read a Wikipedia entry in an
Indian language, she must ensure the following things before
reading/printing articles:

1. The operating system has Indic support
2. It has a font to display the content correctly
3. The browser renders it well

Only then can she read it or print it in human-readable form. If there
were a PDF export facility with server-side rendering, it would be easy
for her to take it and print it for students.

Some time back Santhosh posted his project pypdflib for testing on this
list. It is a library for rendering PDFs from Indic-language wiki
pages. It uses the functionality of Pango for generating the PDF.
In short, rendering is a major roadblock in getting Wikipedia to the
masses. Projects like Santhosh's effort are very important to fill
this gap.



 It is font dependent. Conversion maps need to be prepared for
 each ASCII font to convert data encoded in them to Unicode.
 Swathanthra Malayalam Computing's Payyans
 (http://wiki.smc.org.in/Payyans) is a tool developed for converting
 ASCII to Unicode easily for any Indic language by building a font map
 for each needed font. This tool helped Malayalam Wiktionary to
 convert many copyright-expired books in non-standard encodings to
 Unicode.
 A popular Firefox extension named Padma uses similar encoding conversion
 tables to display ASCII news websites in Unicode.

 So how do these work? They have built a map for every single ASCII
 encoding/font pair (since this is some ugly hack) and the
 corresponding Unicode value?

Yes. The Payyans wiki page has a how-to for creating font maps.

 There must be thousands of ASCII
 encoding/font pairs right? Is this even a viable option? Are there
 alternatives to this?

This is the only viable option as of now. Most languages have
around 10-20 popular fonts. Creating mapping tables for them is
a big task on its own, but if each language community contributes,
it becomes manageable. The Padma project has already mapped many news
website fonts through the contributions of many people.
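
As a rough illustration of what such a mapping table does, here is a
minimal sketch in Python. It is not the Payyans or Padma code, and the
table entries are invented for the example; a real font map has to be
built by inspecting each legacy font. The sketch does a greedy
longest-match replacement, and the names FONT_MAP and convert are
hypothetical.

```python
# Hypothetical font map: legacy keystroke sequences -> Unicode strings.
# These pairs are made up for illustration and match no real font.
FONT_MAP = {
    "A": "\u0D05",          # MALAYALAM LETTER A
    "I": "\u0D15",          # MALAYALAM LETTER KA
    "Ip": "\u0D15\u0D41",   # KA + VOWEL SIGN U
}


def convert(legacy_text, font_map):
    """Replace legacy sequences with Unicode, trying longer keys first."""
    keys = sorted(font_map, key=len, reverse=True)
    out = []
    i = 0
    while i < len(legacy_text):
        for key in keys:
            if legacy_text.startswith(key, i):
                out.append(font_map[key])
                i += len(key)
                break
        else:
            # No mapping known: pass the character through unchanged.
            out.append(legacy_text[i])
            i += 1
    return "".join(out)


print(convert("AIp ", FONT_MAP))
```

Legacy fonts typically store glyphs in visual order, so a real converter
usually also has to reorder vowel signs that are written before the
consonant. This sketch leaves that step out.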

There is no other free alternative. BTW, document conversion is a big
business, and many corporates are working in this area to provide
solutions for companies and governments.

 I don't think this will happen. There is a long history of lobbying
 CDAC for this from 2001 onwards, and nothing happened. CDAC made enough
 money by selling ASCII fonts (and still does), and they can't even think
 about giving them away under a FOSS license. And at frequent intervals
 they consume more government money to make yet another CD to ship with
 their FOSS project forks (such as Bharateeya OO, IndiFox etc.) plus these
 fonts. In the same way, most of the TDIL funding to CDAC for Indic
 language technology research either produces no output or the output
 is not released, even after TDIL's policy decision to release it
 under a FOSS license.

 I can see the frustration of this - so in your opinion, an effort not
 worth undertaking? Assuming they were ready to use a FOSS license, are
 the fonts good enough to want to use?

In my opinion, efforts on this will be a waste of time and money. I
don't believe in miracles with CDAC.

CDAC Mumbai has a history of GPL-licensing one font series as part
of their Indix project: the Raghu series, by the late Prof. R. K. Joshi,
a famous calligrapher and researcher in typefaces.
http://en.wikipedia.org/wiki/R_K_Joshi
Rebranding his Jana series fonts as the Raghu series and GPLing them was
his long-term effort from inside CDAC. But the font tables need to be
corrected to make them usable.
We did this work for Malayalam, and Raghu-Malayalam is currently
maintained by SMC.
Anyway it is an 

Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread Nikhil Sheth
Great discussion, but I wonder why I didn't see any real, easy, doable,
inexpensive, quickfix solution put forth that every Indian on the internet
can begin using immediately to get around the Unicode Vs custom Fonts issue.

So here's some from me:

1. Quick copy-paste, working with a net connection:
http://www.google.com/transliterate/

2. Put a bookmarklet/favorite in your browser to type in an Indian language
on any site. इधर भी (here too): http://t13n.googlecode.com/svn/trunk/blet/docs/help.html

3. Get these languages installed in 5 mins on your machine so you can use it
in any application from notepad to chat :
http://www.google.com/ime/transliteration/ or sneak out the files for
offline installation in your hometown using this neat hack:
http://visibleblog.blogspot.com/2010/07/google-transliteration-ime-offline.html

(I know our greatest angels won't care about this one because it only works
on Evil Windows!)

4. An Indian-made alternative, both an editor and an input method:
http://www.baraha.com/

Sincere apologies to the purists who might blow up like a volcano at either
going to the Evil Google Lord for help, or Daring to use transliteration
instead of the
so-easy-to-use-and-learn-if-only-you-spend-a-whole-day-on-it-and-get-an-indic-script-keyboard-from-God-knows-where-because-everyone-is-well-off-and-supposed-to-be-living-in-a-well-connected-metro-like-me.


If there is an open-source/cross-platform/creative commons/kumbayaah
solution where we don't have to mug up what to do when we forget what we are
supposed to have mugged up like the key combination for भ or त्र or ण
or ळ instead of just typing bh or tra or na or l and (if needed)
backspacing twice to get a dropdown menu to choose what we truly want and
moving on with our lives, or where we don't have to bend the laws of physics
to get that elusive त्सा or perform computer साल्सा to have that split
letter stuff on our screen then let's have it right here and right now or
let's get our hands dirty and make'em for the love of the Lord instead of
blasting the impure and corrupt Harijans who dare to take shortcuts for the
sake of getting their work done on time.

(Disclaimer : Only little offense meant with the hope to give a kick and
create a demand for real open source solutions that can rival the private
ones)


Cheers,
Nikhil Sheth
+91-966-583-1250
Pune, India
Teach For India http://www.teachforindia.org/ Fellow, 2011-13
www.nikhilsheth.tk
Find me on: Twitter http://twitter.com/nikhiljs |
Facebook http://www.facebook.com/nikjs |
LinkedIn http://in.linkedin.com/in/nikhiljs |
Google http://www.google.com/profiles/nikhil.js |
RangDe http://www.rangde.org/investor/nikhilsheth
Join me on: Pune Documentary Club http://www.facebook.com/group.php?gid=138497769525636 |
Let's Do it Pune http://www.facebook.com/pages/Lets-do-it-Pune/103857326346659 |
Toastmasters in Pune http://www.facebook.com/pages/Toastmasters-in-Pune/148767611833746 |
Wikipedia For Schools project http://education.wikia.com/wiki/Wikipedia_For_Schools_Offline_Edition
___
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l


Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread sankarshan
On Thu, Feb 24, 2011 at 9:15 AM, Nikhil Sheth nikhil...@gmail.com wrote:
 Great discussion, but I wonder why I didn't see any real, easy, doable,
 inexpensive, quickfix solution put forth that every Indian on the internet
 can begin using immediately to get around the Unicode Vs custom Fonts issue.

 So here's some from me:

 1. Quick copy-paste, working with a net connection:
 http://www.google.com/transliterate/

 2. Put a bookmarklet/favorite in your browser to type in an Indian language
 on any site. इधर भी (here too): http://t13n.googlecode.com/svn/trunk/blet/docs/help.html

 3. Get these languages installed in 5 mins on your machine so you can use it
 in any application from notepad to chat :
 http://www.google.com/ime/transliteration/ or sneak out the files for
 offline installation in your hometown using this neat hack:
 http://visibleblog.blogspot.com/2010/07/google-transliteration-ime-offline.html

 (I know our greatest angels won't care about this one because it only works
 on Evil Windows!)

 4. Indian made alternative both editor and input language:
 http://www.baraha.com/

Getting things fixed at the 'plumbing' level is a hard climb but it is
worth it since it would also ensure that offline devices can utilize
what is technically correct (note, that this does not necessarily
imply that the above choices are 'incorrect'). Doing it using web
technologies is one thing; doing it for the desktop, especially the
offline desktop, is the other side of the same coin.

We have come a long way since the days when one needed a recompiled
Pango (the renderer) to even decently render Indic or, when input
methods were flaky. Using standards and developing code pieces that
comply with those standards make it easier for platforms across the
spectrum to do Indic (and, other complex scripts) well.

And, looking at all this discussion, I now wish that I had submitted a
'state of Indic' paper at some conference happening currently ;)

-- 
sankarshan mukhopadhyay
http://sankarshan.randomink.org/blog



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread Shiju Alex

This discussion is not at all about input methods. I do not know why a
sudden comparison between transliteration and InScript came up here.

Looking at all the solutions you provided, let me ask one thing: have you
actively contributed, or are you contributing, to any Indian-language
Wikipedia? A survey of the input methods used by Indian Wikipedians will
give a different answer.



Shiju





Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread Anivar Aravind
On 2/24/11, Nikhil Sheth nikhil...@gmail.com wrote:
 Great discussion, but I wonder why I didn't see any real, easy, doable,
 inexpensive, quickfix solution put forth that every Indian on the internet
 can begin using immediately to get around the Unicode Vs custom Fonts issue.

Hey,
What you are mentioning is just about transliteration input methods.
There are hundreds of such solutions, phonetic keyboards etc.
Transliteration keyboards existed years before Google and most of the
solutions you pointed to.
Take a look at Firefox extensions and m17n-db to get a feel for it.

The discussion here was not only about input methods. It is about
encoding, rendering and fonts, which are the underlying technologies
that enable input methods to work.

Also, just a friendly request to understand the thread first before
knee-jerking with what you know.

Anivar Aravind





-- 
[It is not] possible to distinguish between 'numerical' and
'nonnumerical' algorithms, as if numbers were somehow different from
other kinds of precise information. - Donald Knuth



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread sankarshan
On Thu, Feb 24, 2011 at 9:36 AM, Anivar Aravind
anivar.arav...@gmail.com wrote:

 The discussion here was not only about input methods. It is about
 encoding, rendering and fonts, which are the underlying technologies
 that enable input methods to work.

 Also, just a friendly request to understand the thread first before
 knee-jerking with what you know.

The discussion started off with Unicode (Gautam was the OP if I recall
correctly). And, then of course it has progressed into a discussion
about the various pieces that are complex or, are work-in-progress
towards a solution. Sometimes it isn't easy for everyone to see where
it is going. Doesn't necessarily mean that we cannot be excellent to
each other.


-- 
sankarshan mukhopadhyay
http://sankarshan.randomink.org/blog



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread Gautam John
On 24 February 2011 09:15, Nikhil Sheth nikhil...@gmail.com wrote:

 Great discussion, but I wonder why I didn't see any real, easy, doable,
 inexpensive, quickfix solution put forth that every Indian on the internet
 can begin using immediately to get around the Unicode Vs custom Fonts issue.

Sure - it's great to see that there are multiple input methods, some
local and some on the Web, that allow for Unicode-encoded text, but I
was actually coming at it from a legacy issue: there is tons of
'digital' content that is not accessible, so how do we make it
accessible? There is also great hesitancy in certain verticals to use
Unicode on the basis of the 'lack of fonts' issue. I was trying to
build a case for why Unicode is important and how we could increase
the diversity of available fonts.

Thank you.

Best,

Gautam

http://social.prathambooks.org/



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread Gautam John
On 24 February 2011 09:21, sankarshan foss.mailingli...@gmail.com wrote:

 And, looking at all this discussion I now wish that I submitted a
 'state of Indic' paper at some conference happening currently ;)

Oh but you should! I would learn much from it and I am sure everyone
else will learn something too!

Thank you.

Best,

Gautam

http://social.prathambooks.org/



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread Gautam John
Two things I meant to add:

1. The eGov standards body for India has recently notified Unicode
5.1.0 as the default standard for all eGov applications henceforth.
(Sadly, their website is DoA - http://egovstandards.gov.in/) I am
hopeful that this will be the start of some initiative within
Government and would, hopefully, spread.

A cache of their Approach Paper on Localization is here:

http://webcache.googleusercontent.com/search?q=cache:e28QCFBDI-cJ:egovstandards.gov.in/standards_localisation_app+india+egov+standards+unicode&cd=2&hl=en&ct=clnk&gl=in&source=www.google.co.in

And a cache of their Character Encoding Standard For Indian Languages is here:

http://docs.google.com/viewer?a=v&q=cache:dYxnM6D7IMQJ:egovstandards.gov.in/egscontent.2009-12-29.6248244073/at_download/file+india+egov+standards+unicode&hl=en&gl=in&pid=bl&srcid=ADGEESgxDT6JyHRlgWfR2TKYHKRGeAM5PigxzZAPyo2M1d6rxGnOC3sQ0S5XVDVVvPL_t5ZKmui0ghMMO63q2hZMT_WeJq0WH5FnEFYFioh7EZ_Uzj8XPnvVMatGZ4vO9kv6RXJZM56e&sig=AHIEtbQnDd2Gy29vyy97FnvAw2g4hN3cqQ

2. On input methods - is there anything of a best practice or even a
Government notification about an input standard?

Thank you.

Best,

Gautam

http://social.prathambooks.org/



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread Bala Jeyaraman
 On input methods - is there anything of a best practice or even a
 Government notification about an input standard?

In Tamil Nadu, the govt recommends and endorses the Tamil 99 keyboard
layout.

On Thu, Feb 24, 2011 at 10:12 AM, Shiju Alex shijualexonl...@gmail.com wrote:

 Even though the Central Government has adopted Unicode as the encoding
 standard, the case is not the same with most State Governments. As far as I
 know, only a few state governments (Tamil Nadu, Punjab, Kerala, ...) have
 adopted the Unicode standard. Many are still in the ASCII era.

  On input methods - is there anything of a best practice or even a
  Government notification about an input standard?

 I haven't seen any notification regarding this yet. But InScript is
 officially/unofficially adopted as the default input scheme. That is why it
 is part of the school syllabus in some states.






-- 
Beauty lies in the eyes of the beer holder


Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread Gautam John
On 24 February 2011 10:12, Shiju Alex shijualexonl...@gmail.com wrote:

 Even though the Central Government has adopted Unicode as the encoding
 standard, the case is not the same with most State Governments. As far as I
 know, only a few state governments (Tamil Nadu, Punjab, Kerala, ...) have
 adopted the Unicode standard. Many are still in the ASCII era.

Thank you, Shiju. A question - what are the hesitancies for
Governments to move to Unicode as the encoding standards? Is it the
tools they use? The workflow? A legacy issue - we'll never be able to
open our old files?

I'm trying to map this space out - it's just that I am coming to see
it as being really really important and want to try and do something
here.

Also, the GoI is slowly making some noises about standards and
openness etc. and I am hoping these are small points that can add up.
For example, the TAGUP report:
http://finmin.nic.in/reports/TAGUP_Report.pdf

From the Executive Summary:

Chapter 6 points out some key design considerations for the solution
architecture. The solution architecture should be designed to be
flexible, reusable, extensible by stakeholders, and free of vendor
lock-in. Given that many Government projects touch end-users such as
citizens and firms, the Government should also play an active role in
promoting banking and accessibility for all. This can form the basis
of a platform for delivery of services.
Chapter 7 addresses openness in implementation of Government IT
projects. It describes the relevance of open standards, open data, and
open source. The Government should not only be a consumer, but also
strive to produce and facilitate open standards, open data, and open
source. It also suggests the creation of an open source foundation for
open sourcing software from Government projects.

Give me a little hope.

Best,

Gautam

http://social.prathambooks.org/



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread jayanta nath
In West Bengal there has been no Govt announcement regarding Unicode or
keyboard layout. Our Govt is still in the ASCII era in all departments.
But they adopted Unicode through the *Society for Natural Language
Technology Research* (NLTR) (http://www.nltr.org/), which released
Baishakhi Linux 2.0
(http://www.nltr.org/SNLTR/index.php?option=com_content&task=view&id=118&Itemid=119),
with built-in Unicode support for all Indic languages, like other Linux
distros. The society was seeded by the Govt. of West Bengal (Dept. of
Information Technology) with initial funding and support. NLTR promotes
Bengali computing through Unicode and the Baishakhi keyboard, which is
similar to InScript Bengali.

But my personal experience is not very good: when I go to any govt office
in West Bengal (Writers' Building,
http://en.wikipedia.org/wiki/Writers%27_Building), they use Windows OS
(pirated?) and ASCII Bengali interfaces like i-Leap and Bijoy. I don't
know why they funded Baishakhi Linux 2.0.





-- 
With Warm Regards,
*Jayanta Nath*
Calcutta,West Bengal
+91 9836294438
Facebook :http://www.facebook.com/jayantanth
Wikipedia :http://en.wikipedia.org/wiki/User:Jayantanth
আসুন পাইরেসি মুক্ত ভারত গড়ি, সবাই মুক্ত সফ্‌টওয়ার ব্যবহার করি, অন্যকে ব্যবহারে উৎসাহিত করি।
(Let us build a piracy-free India; let us all use free software and
encourage others to use it.)


Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread Anivar Aravind
Dear sankarshan

The initial license of the Raghu font series was confusing. But later they
changed it to the GNU GPL, at the insistence of R. K. Joshi. The GNU
GPL-licensed fonts were released as part of the Indix project of CDAC Mumbai.

Anivar

On 2/24/11, sankarshan foss.mailingli...@gmail.com wrote:
 On Thu, Feb 24, 2011 at 8:51 AM, Anivar Aravind
 anivar.arav...@gmail.com wrote:

 CDAC Mumbai has a history of GPL-licensing one font series as part
 of their Indix project: the Raghu series, by the late Prof. R. K. Joshi,
 a famous calligrapher and researcher in typefaces.
 http://en.wikipedia.org/wiki/R_K_Joshi
 Rebranding his Jana series fonts as the Raghu series and GPLing them was
 his long-term effort from inside CDAC. But the font tables need to be
 corrected to make them usable.

 The 'GPL' that these fonts had was the 'General Public License',
 wasn't it? And not the GNU General Public License. I may be mistaken
 though.

 I've been known, in the past, to berate and sigh at C-DAC. In recent
 times I've arrived at the conclusion that there's no upside in thinking
 that TDIL/MinIT/C-DAC will eventually figure out that selling services
 around their products makes for a better business case than trying to
 hawk the products themselves. Or that LGPL-licensing their products
 might make it easier to have an application developer network around
 them.

 --
 sankarshan mukhopadhyay
 http://sankarshan.randomink.org/blog



-- 
Sent from my mobile device

[It is not] possible to distinguish between 'numerical' and
'nonnumerical' algorithms, as if numbers were somehow different from
other kinds of precise information. - Donald Knuth



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-23 Thread Gautam John
On 24 February 2011 11:36, Anivar Aravind anivar.arav...@gmail.com wrote:

 Thanks for those links. I am aware of them, but have not had enough
 time to read them yet. Are you sure it specified Unicode 5.1? I am
 curious because the new rupee symbol was encoded only in Unicode 6.0.
 Usually government standards do not specify versions.

Yep. What it states is:

Unicode shall be the storage-encoding standard for all
constitutionally recognised Indian Languages including English and
other global languages as follows:

Unicode 5.1.0 and its future up-gradation as reported by Unicode
consortium from time to time.

Thank you.

Best,

Gautam

http://social.prathambooks.org/



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-22 Thread Santhosh Thottingal
On Thu, Feb 17, 2011 at 11:29 AM, Gautam John gau...@prathambooks.org wrote:
 2. Given that we publish in Indian languages, using Unicode fonts are
 the only way to achieve cross-platform interoperability and is a
 global standard.
 3. Given India's push towards copyright reform for the print impaired,
 it is imperative that Unicode fonts be used in the creation of Indic
 content because it is otherwise a huge barrier to conversion to
 print-friendly formats.
 4. Unicode, being an open global standard guarantees content
 accessibility in the future and ensures no proprietary font and vendor
 lock in.

I think you have some confusion about Unicode and fonts. Let me try to
clarify in simple words.
Unicode is an encoding standard. It says how a 'letter' is represented
by a group of bits or bytes, and it ensures a unique code for each of
the letters across thousands of languages in the world.
Fonts are just clothes for this data: sometimes optimized for the web,
sometimes for print, sometimes fancy... Data can exist without fonts
too. The only thing is that one cannot see the data properly, or you
see it naked (as question marks, squares or raw code points, depending
on your operating system environment).
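
To make the "encoding, not font" point concrete, here is a minimal
Python sketch. It assumes UTF-8 as the byte form (the mail does not
name one), and the helper name describe is made up for illustration.
No font is involved anywhere in it.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, UTF-8 bytes) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))

# The letter "ka" in two different scripts gets two distinct code points,
# so the data stays unambiguous whatever font later displays it.
for ch in ("\u0915", "\u0D15"):  # Devanagari ka, Malayalam ka
    print(describe(ch))
```

The stored value is the code point, and the font only decides how that
code point is drawn on the reader's side.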

So if you say 'using Unicode fonts for Indic content', it does not
make sense: we cannot represent or store data in fonts. And when you
say 'Unicode fonts are the only way to achieve interoperability', it
is wrong, since it is the encoding standard that makes
interoperability possible.

Unicode data has no dependency on the font. The font is the user's
choice, and it is applied on the reader's side.

But I know that many people still use terms like 'data in Unicode
fonts', 'data in xyz font', etc. This usage came into existence
because, before Unicode was popular, most Indian publishers used a
non-standard way of representing our data: storing English (or
Latin/ASCII) data and changing the font's 'face' to Indian glyphs, a
fancy dress hack. The letter 'k' will be shown as Hindi 'ka' with the
help of a font, i.e. the data is still English, but what you see is
Hindi.
Obviously the data cannot be presented to anybody without these
special clothes. If you get this data and don't have the associated
font, what you see will be just some junk Latin characters. Many
publishers created their own fonts with this technique, each in their
own way. So to send some data to your friend, you need to tell him,
'hey, this data is in Sree font', 'this data is in Kathika font', etc.
Even now that Unicode is popular, only a very small percentage of
publishers have moved to Unicode, and the others still continue with
ASCII font-dependent data.

If one uses Unicode, there is no need to mention the font. One can
read it using a good Unicode-compatible font of his/her choice.

So 'data is in Unicode encoding' is correct. 'Data is in Unicode font'
is wrong. 'Data can be viewed using any Unicode-compatible font' is
correct.
I hope it is clear.

 5. The limitation is on the lack of high quality and varied typefaces
 that are both screen and print optimised open type Indic Unicode
 fonts.

This is true. Fonts exist for all scripts, but the variety and
quality of the existing fonts vary. Availability of fonts under a
FOSS-compatible license is also a problem. For a detailed list of
Indic fonts with license info, see
http://indlinux.org/wiki/index.php/IndicFontsList


 6. Given the importance of linguistic diversity to India's cultural
 heritage, it is imperative that greater attention is paid to the
 development of such fonts under licenses that allow for free re-use
 and to fix issues in the fonts that might arise.

You are correct. I would say 'fonts licensed under any FOSS license'
instead of 'free use/reuse'.

 7. The Govt. should fund the open development of at least 5 such fonts
 for each the 21 Constitutionally recognised languages and make these
 available not just for free, but under free license to re-use and
 improve as well.

You got it. But history shows that such funding did not play much of a
role in the development of the fonts listed here:
http://indlinux.org/wiki/index.php/IndicFontsList
In fact, the funds were spent (read: wasted) on the development of
proprietary fonts by government agencies like CDAC. Fonts with
free(dom) licenses were developed and maintained by FOSS developer
communities.


Thanks
Santhosh Thottingal
http://thottingal.in



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-22 Thread Gautam John
On 22 February 2011 22:29, Santhosh Thottingal
santhosh.thottin...@gmail.com wrote:

 I think you have some confusion on Unicode and Fonts. Let me try to
 clarify in simple words.

Yes - I did! And thank you for such a detailed response.

To see if I have understood this - there are three components:

1. Input (Different types of keyboard layouts are used but are
independent of the method of encoding - correct?)
2. Encoding and storing the input (ASCII is the older method - I have
heard of ISCII as well but do not know what that is - but Unicode is
the standard.)
3. Representing, visually for the human user, what has been input
and encoded. (Fonts or typefaces, and these are, to an extent,
independent of the encoding method used.)

 But I know that many people still use terms like 'data in Unicode
 fonts', 'data in xyz font', etc. This usage came into existence
 because, before Unicode was popular, most Indian publishers used a
 non-standard way of representing our data: storing English (or
 Latin/ASCII) data and changing the font's 'face' to Indian glyphs, a
 fancy dress hack. The letter 'k' will be shown as Hindi 'ka' with the
 help of a font, i.e. the data is still English, but what you see is
 Hindi.

So if I understand correctly, not only is the encoding in ASCII but
the representation of that encoding is tied to a particular font (that
was used for representation at entry?) and will only be represented
properly when using that font? However, what I am trying to understand
is whether there is consistency across the ASCII encoding? Will ka in
Hindi be encoded in ASCII only one way or is there a linkage, that I
do not understand, to the font used to represent it as well?

The reason I ask is because if ka in Hindi is always encoded the same
way irrespective of the font used to represent it, then it should not
be hard to build an ASCII to Unicode map of encoding that will only
have to be done once for each language? Though something tells me I am
way off on this assumption.

 This is true. Fonts exist for all scripts, but the variety and
 quality of the existing fonts vary. Availability of fonts under a
 FOSS-compatible license is also a problem. For a detailed list of
 Indic fonts with license info, see
 http://indlinux.org/wiki/index.php/IndicFontsList

Thanks, Santhosh. This is really useful. Also, are these screen or
print ready fonts?

 You are correct. I would say 'fonts licensed under any FOSS license'
 instead of 'free use/reuse'.

Indeed. FOSS license is what I should have said.

 In fact, the funds were spent (read: wasted) on the development of
 proprietary fonts by government agencies like CDAC. Fonts with
 free(dom) licenses were developed and maintained by FOSS developer
 communities.

*sigh* In your opinion, would there be any real benefit if they did
license the ILDC series under a true FOSS license?

 Each Unicode character is multi-byte character while in ASCII, it is
 single byte.

Ah. Okay. I understand now.
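
On the byte counts mentioned above, a small Python check may help. It
assumes the text is stored as UTF-8 or UTF-16, which the thread does
not specify, and the helper name utf8_len is made up for illustration.
It shows that the number of bytes per character depends on the
encoding form and on the character itself.

```python
def utf8_len(text):
    """Number of bytes the text occupies when stored as UTF-8."""
    return len(text.encode("utf-8"))

print(utf8_len("k"))                       # Latin letter: 1 byte
print(utf8_len("\u0915"))                  # Devanagari ka: 3 bytes
print(len("\u0915".encode("utf-16-le")))   # Devanagari ka in UTF-16: 2 bytes
```

So in UTF-8 a Latin letter still takes a single byte, while an Indic
letter takes three.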

 This is not comparable since search is not possible in ascii font way
 of representing data. Since the data is not in Hindi , but we just
 see as Hindi, one cannot do a search or any such data processing on
 that data.

If I understand, it is not possible to search within ASCII encoded
text but this can be done in Unicode encoded text?
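
A rough Python illustration of the search problem follows. The legacy
string is a made-up example of what an ASCII-font page might store; it
is not taken from any real font.

```python
query = "कमल"  # the Hindi word a reader would actually type into a search box

# Unicode page: the stored data really is Devanagari.
unicode_page = f"यह {query} का फूल है"

# Hypothetical legacy page: the stored data is Latin letters that only
# look like Hindi when one particular font is applied.
legacy_page = "yh kml ka PUl hE"

def contains(page, word):
    """Plain substring search over the stored data."""
    return word in page

print(contains(unicode_page, query))  # True
print(contains(legacy_page, query))   # False
```

The legacy page can still be searched for its Latin letters ("kml"),
but a reader has no way of knowing that string without knowing the
font's private mapping.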

Thank you very much Santhosh - I have learned a lot from this.

Best,

Gautam



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-22 Thread Anivar Aravind
On 2/22/11, Gautam John gau...@prathambooks.org wrote:
 On 22 February 2011 22:29, Santhosh Thottingal
 santhosh.thottin...@gmail.com wrote:

 I think you have some confusion on Unicode and Fonts. Let me try to
 clarify in simple words.

 Yes - I did! And thank you for such a detailed response.

 To see if I have understood this - there are three components:

 1. Input (Different types of keyboard layouts are used but are
 independent of the method of encoding - correct?)
 2. Encoding and storing the input (ASCII is the older method - I have
 heard of ISCII as well but do not know what that is - but Unicode is
 the standard.)
 3. Representing, visually for the human user, what has been input
 and encoded. (Fonts or typefaces, and these are, to an extent,
 independent of the encoding method used.)

There are four components:

1. Input methods (the GOI-approved Inscript layout, various popular
layouts, transliteration keyboards, phonetic keyboards)
2. Encoding (Unicode)
3. Font (OpenType fonts, i.e. fonts supporting Unicode)
4. Rendering engines (these do the shaping of complex glyphs using
the OpenType font tables in fonts, e.g. Pango in GNOME, HarfBuzz in
KDE, ICU in OpenOffice and Java-based programs, Uniscribe in Windows,
etc.)
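
To show why the fourth component is needed, here is a small Python
sketch. The helper name code_points is made up for illustration. The
Devanagari conjunct 'ksha' is stored as three separate code points, and
it is the rendering engine, using the font's OpenType tables, that
draws them as one combined glyph.

```python
import unicodedata

def code_points(text):
    """List the Unicode name of every code point stored in the text."""
    return [unicodedata.name(ch) for ch in text]

# ka + virama + ssa: three stored characters, usually displayed as one ligature.
ksha = "\u0915\u094D\u0937"

print(len(ksha))
for name in code_points(ksha):
    print(name)
```

If the rendering engine does not do this shaping, the reader sees the
three pieces side by side even though the data and the font are both
fine.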


 But I know that many people still use terms like 'data in Unicode
 fonts', 'data in xyz font', etc. This usage came into existence
 because, before Unicode was popular, most Indian publishers used a
 non-standard way of representing our data: storing English (or
 Latin/ASCII) data and changing the font's 'face' to Indian glyphs, a
 fancy dress hack. The letter 'k' will be shown as Hindi 'ka' with the
 help of a font, i.e. the data is still English, but what you see is
 Hindi.

 So if I understand correctly, not only is the encoding in ASCII but
 the representation of that encoding is tied to a particular font (that
 was used for representation at entry?) and will only be represented
 properly when using that font? However, what I am trying to understand
 is whether there is consistency across the ASCII encoding? Will ka in
 Hindi be encoded in ASCII only one way or is there a linkage, that I
 do not understand, to the font used to represent it as well?

ASCII is not like Unicode. It covers only basic Latin characters, not
any other script. All over India, legacy, non-standard local language
technologies (ugly hacks) have gained deep roots. Local newspaper
websites as well as publishing houses seem to use their own
non-standard fonts. This means that documents and web sites get tied
to fonts. These fonts may or may not be freely available, and in some
extreme cases, may be no longer available at all. If you lose the
font, you lose the content as well.

'Ka' in Hindi may be mapped to the position of 'A' in one font and to
the position of 'H' in another, at the convenience of the font
developer.
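
As a sketch of that font dependence in Python: FontOne and FontTwo and
their glyph tables below are invented purely for illustration and do
not describe any real legacy font.

```python
# Hypothetical glyph tables: which Devanagari letter each made-up legacy
# font draws at a given Latin position.
LEGACY_GLYPHS = {
    "FontOne": {"A": "\u0915", "H": "\u0917"},  # A shows ka, H shows ga
    "FontTwo": {"A": "\u0917", "H": "\u0915"},  # A shows ga, H shows ka
}

def displayed_as(stored_text, font_name):
    """What a reader would see if stored_text were shown in font_name."""
    table = LEGACY_GLYPHS[font_name]
    return "".join(table.get(ch, ch) for ch in stored_text)

print(displayed_as("A", "FontOne"))
print(displayed_as("A", "FontTwo"))
```

The stored byte is the same 'A' in both cases, and only the font decides
which letter the reader sees. This is why the data is lost in practice
once the font is lost.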




 The reason I ask is because if ka in Hindi is always encoded the same
 way irrespective of the font used to represent it, then it should not
 be hard to build an ASCII to Unicode map of encoding that will only
 have to be done once for each language? Though something tells me I am
 way off on this assumption.

It is font dependent. A conversion map needs to be prepared for each
ASCII font to convert data encoded in it to Unicode.
Swathanthra Malayalam Computing's Payyans
(http://wiki.smc.org.in/Payyans) is a tool developed for converting
ASCII to Unicode easily for any Indic language by building a font map
for each needed font. This tool helped Malayalam Wiktionary convert
many copyright-expired books in non-standard encodings to Unicode.
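
The general idea of a per-font conversion map can be sketched in Python
as below. This is not Payyans' actual code or map format. The map
entries are hypothetical, and longest-match replacement is an assumed
strategy. Real converters typically also have to reorder some signs,
which this sketch ignores.

```python
def build_converter(font_map):
    """Return a function converting legacy text to Unicode using font_map."""
    # Try longer legacy sequences first so "kS" wins over "k".
    keys = sorted(font_map, key=len, reverse=True)

    def convert(legacy_text):
        out = []
        i = 0
        while i < len(legacy_text):
            for key in keys:
                if legacy_text.startswith(key, i):
                    out.append(font_map[key])
                    i += len(key)
                    break
            else:
                # No mapping for this character: keep it unchanged.
                out.append(legacy_text[i])
                i += 1
        return "".join(out)

    return convert

# Hypothetical map for one made-up legacy font.
HYPOTHETICAL_MAP = {
    "k": "\u0915",               # ka
    "m": "\u092E",               # ma
    "l": "\u0932",               # la
    "kS": "\u0915\u094D\u0937",  # conjunct ksha
}

convert = build_converter(HYPOTHETICAL_MAP)
print(convert("kml"))
```

A separate map like HYPOTHETICAL_MAP would be needed for every legacy
font, which is the per-font effort described above.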

A popular Firefox extension named Padma uses similar encoding
conversion tables to display ASCII news websites in Unicode.


 This is true. Fonts exist for all scripts, but the variety and
 quality of the existing fonts vary. Availability of fonts under a
 FOSS-compatible license is also a problem. For a detailed list of
 Indic fonts with license info, see
 http://indlinux.org/wiki/index.php/IndicFontsList

 Thanks, Santhosh. This is really useful. Also, are these screen or
 print ready fonts?

Each language community can best answer this question. In Malayalam
we have both screen and print fonts, including one ornamental font.


 You are correct. I would say 'fonts licensed under any FOSS license'
 instead of 'free use/reuse'.

 Indeed. FOSS license is what I should have said.

 In fact, the funds were spent (read: wasted) on the development of
 proprietary fonts by government agencies like CDAC. Fonts with
 free(dom) licenses were developed and maintained by FOSS developer
 communities.

 *sigh* In your opinion, would there be any real benefit if they did
 license the ILDC series under a true FOSS license?

I don't think this will happen. There has been a long history of
lobbying CDAC for this from 2001 onwards, and nothing happened. CDAC
has made enough money by selling ASCII fonts (and still does), and
they can't even think about giving them away with a FOSS

Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-16 Thread Gautam John
On 17 February 2011 11:29, Gautam John gau...@prathambooks.org wrote:

 I'm trying to bring together some ideas as to why Unicode is
 important, what the upsides and downsides are. My initial thoughts:

A few other points that I read here: http://anandabazar-unicode.appspot.com/

Data usage: Use of Unicode will significantly reduce bandwidth/storage
Search (within a page/web search etc.)

Thank you.

Best,

Gautam

http://social.prathambooks.org/



Re: [Wikimediaindia-l] (OT) On the importance of Unicode

2011-02-16 Thread Gautam John
On 17 February 2011 12:35, BalaSundaraRaman sundarbe...@yahoo.com wrote:

 I have some points to share, but got to go back to work now.
 Can I get back on this later?

Sure, Sundar! No hurry.

Thank you.

Best,

Gautam

http://social.prathambooks.org/
