Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On 2/23/11, Gautam John gau...@prathambooks.org wrote:

Dear Anivar:

There are Four Components

Thanks for the addendum - how important is the rendering engine in the scheme of things? Is work on that pretty much done or are there issues there too?

If your language shows errors in complex glyph formation, that is a rendering engine issue. You can find more here: http://en.wikipedia.org/wiki/Wikipedia:Enabling_complex_text_support_for_Indic_scripts

Rendering engines like Pango have evolved through more than 10 years of patches and corrections from language communities, and they work pretty well for most Indic languages. HarfBuzz (http://www.freedesktop.org/wiki/Software/HarfBuzz) is a relatively new player in the field, built by taking code from Pango, Qt and ICU. HarfBuzz-ng is used in the new Firefox 4 as its default rendering engine. The Uniscribe engine in Windows-based systems has supported Indic fonts from Windows XP SP2 onwards.

Let me give an example of why the rendering engine is important. For Latin-script wikis there is a PDF download option (PediaPress) for printing articles directly, but no such option is available for non-Latin wikis. Character rendering is the blocker here. PediaPress's library fails to render non-Latin content because the library it uses does not make use of a rendering engine.

If a teacher goes to an internet cafe to read a Wikipedia entry in an Indian language, she must ensure the following before reading or printing articles:

1. The operating system has Indic support
2. It has a font that displays the content correctly
3. The browser renders it well

Only then can she read or print it in human-readable form. If there were a PDF export facility with server-side rendering, it would be easy for her to take the article and print it for her students.

Some time back Santhosh posted his project Pypdflib on this list for testing. It is a library for rendering PDFs from Indic-language wiki pages.
It uses Pango's functionality to generate the PDF.

In short, rendering is a major roadblock in taking Wikipedia to the masses. Projects like Santhosh's are very important for filling this gap.

It is Font dependent. There is a need to prepare conversion maps for each ASCII font to convert data encoded in them to Unicode. Swathanthra Malayalam Computing's Payyans (http://wiki.smc.org.in/Payyans) is a tool developed for converting ASCII to Unicode easily for any Indic language by building a font map for each needed font. This tool helped Malayalam Wiktionary convert many copyright-expired books in non-standard encodings to Unicode. The popular Firefox extension Padma uses similar encoding conversion tables to display ASCII news websites in Unicode.

So how do these work? They have built a map for every single ASCII encoding/font pair (since this is some ugly hack) and the corresponding Unicode value?

Yes. The Payyans wiki page has a how-to for creating font maps.

There must be thousands of ASCII encoding/font pairs right? Is this even a viable option? Are there alternatives to this?

This is the only viable option as of now. Most languages have around 10-20 popular fonts. Creating mapping tables for them is a big task in any case, but if each language community contributes, it becomes manageable. The Padma project has already mapped many news website fonts through the contributions of many people. There is no other free alternative. BTW, document conversion is a big business, and many corporations are working in this area to provide solutions for companies and governments.

I don't think this will happen. There is a long history of lobbying CDAC for this from 2001 onwards, and nothing has happened. CDAC made enough money by selling ASCII fonts (and still does), and they can't even think about giving them away under a FOSS license.
And at frequent intervals they consume more government money to make yet another CD to ship with their FOSS project forks (such as Bharateeya OO, IndiFox etc.) plus these fonts. In the same way, most of the TDIL funding to CDAC for Indic language technology research produces no output at all, or the output is not released, even after TDIL's policy decision to release it under a FOSS license.

I can see the frustration of this - so in your opinion, an effort not worth undertaking? Assuming they were ready to use a FOSS license, are the fonts good enough to want to use?

In my opinion, efforts on this will be a waste of time and money. I don't believe in miracles with CDAC. CDAC Mumbai does have a history of GPL-licensing one font series as part of its Indix project: the Raghu series, by the late Prof. R. K. Joshi, the famous calligrapher and typeface researcher (http://en.wikipedia.org/wiki/R_K_Joshi). Rebranding his Jana series fonts as the Raghu series and GPLing them was his long-term effort from inside CDAC. But the font tables need to be corrected to make the fonts usable. We did this work for Malayalam, and Raghu-Malayalam is currently maintained by SMC. Anyway it is an
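The font-map approach discussed in this message (one conversion table per legacy ASCII font, as in Payyans and Padma) can be sketched roughly as below. This is only an illustration: the table is an invented toy map for a hypothetical legacy Devanagari font, not one of the real Payyans or Padma maps, and the function name `convert` is made up here. Real converters need far larger tables and more reordering rules than the single one shown.

```python
# Toy font map for a HYPOTHETICAL legacy "ASCII" Devanagari font.
# Entries are invented for illustration; real maps are built per font.
TOY_FONT_MAP = {
    "k": "\u0915",               # KA
    "K": "\u0916",               # KHA
    "m": "\u092E",               # MA
    "l": "\u0932",               # LA
    "a": "\u093E",               # vowel sign AA
    "i": "\u093F",               # vowel sign I
    "tr": "\u0924\u094D\u0930",  # one legacy glyph -> TA + VIRAMA + RA
}

# Vowel sign I is drawn before its consonant, so legacy fonts store it
# first; Unicode stores it after the consonant.
PRE_BASE_SIGNS = {"\u093F"}


def convert(legacy_text, font_map=TOY_FONT_MAP):
    """Convert legacy-font text to Unicode using longest-match lookup."""
    keys = sorted(font_map, key=len, reverse=True)  # try longer keys first
    out = []
    pending = None  # pre-base sign waiting for the unit that follows it
    i = 0
    while i < len(legacy_text):
        for key in keys:
            if legacy_text.startswith(key, i):
                mapped = font_map[key]
                i += len(key)
                break
        else:
            mapped = legacy_text[i]  # unmapped characters pass through
            i += 1
        if mapped in PRE_BASE_SIGNS:
            pending = mapped
            continue
        out.append(mapped)
        if pending is not None:
            out.append(pending)  # move the sign after the following unit
            pending = None
    if pending is not None:
        out.append(pending)
    return "".join(out)


print(convert("kml"))  # three consonants, one code point each
print(convert("ik"))   # vowel sign reordered to follow KA
```

Building one such table per font is the tedious part, which is why the work only scales when each language community contributes maps for its own popular fonts.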
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
Great discussion, but I wonder why I didn't see any real, easy, doable, inexpensive, quickfix solution put forth that every Indian on the internet can begin using immediately to get around the Unicode Vs custom Fonts issue. So here's some from me:

1. Quick copy-paste, working with a net connection: http://www.google.com/transliterate/
2. Put a bookmarklet/favorite in your browser to type in Indian language in any site. इधर भी ("here too"): http://t13n.googlecode.com/svn/trunk/blet/docs/help.html
3. Get these languages installed in 5 mins on your machine so you can use it in any application from notepad to chat: http://www.google.com/ime/transliteration/ or sneak out the files for offline installation in your hometown using this neat hack: http://visibleblog.blogspot.com/2010/07/google-transliteration-ime-offline.html (I know our greatest angels won't care about this one because it only works on Evil Windows!)
4. Indian made alternative, both editor and input language: http://www.baraha.com/

Sincere apologies to the purists who might blow up like a volcano at either going to the Evil Google Lord for help, or Daring to use transliteration instead of the so-easy-to-use-and-learn-if-only-you-spend-a-whole-day-on-it-and-get-an-indic-script-keyboard-from-God-knows-where-because-everyone-is-well-off-and-supposed-to-be-living-in-a-well-connected-metro-like-me.
If there is an open-source/cross-platform/creative commons/kumbayaah solution where we don't have to mug up what to do when we forget what we are supposed to have mugged up, like the key combination for भ or त्र or ण or ळ, instead of just typing bh or tra or na or l and (if needed) backspacing twice to get a dropdown menu to choose what we truly want and moving on with our lives, or where we don't have to bend the laws of physics to get that elusive त्सा or perform computer साल्सा to have that split letter stuff on our screen, then let's have it right here and right now, or let's get our hands dirty and make'em for the love of the Lord instead of blasting the impure and corrupt Harijans who dare to take shortcuts for the sake of getting their work done on time.

(Disclaimer: Only little offense meant, with the hope to give a kick and create a demand for real open source solutions that can rival the private ones)

Cheers,
Nikhil Sheth
Pune, India

___
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
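For reference, the conjuncts mentioned in the message above are not single characters in Unicode. A small Python sketch (standard library only; the helper name `describe` is made up for this example) lists the code point sequence that any input method, whether a transliteration tool or an InScript-style layout, has to produce for them. Turning that sequence into the joined glyph is then the job of the font and the rendering engine.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


samples = {
    "bh": "\u092D",                     # a single letter
    "tra": "\u0924\u094D\u0930",        # conjunct: TA + VIRAMA + RA
    "tsaa": "\u0924\u094D\u0938\u093E", # TA + VIRAMA + SA + vowel sign AA
}

for typed, text in samples.items():
    print(typed, "->", text)
    for code_point, name in describe(text):
        print("   ", code_point, name)
```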
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On Thu, Feb 24, 2011 at 9:15 AM, Nikhil Sheth nikhil...@gmail.com wrote:

Great discussion, but I wonder why I didn't see any real, easy, doable, inexpensive, quickfix solution put forth that every Indian on the internet can begin using immediately to get around the Unicode Vs custom Fonts issue. So here's some from me: [...]

Getting things fixed at the 'plumbing' level is a hard climb, but it is worth it since it would also ensure that offline devices can utilize what is technically correct (note that this does not necessarily imply that the above choices are 'incorrect'). Doing it using web technologies is one thing; doing it for the desktop, especially the offline desktop, is the other side of the same coin. We have come a long way since the days when one needed a recompiled Pango (the renderer) to even decently render Indic, or when input methods were flaky. Using standards and developing code pieces that comply with those standards makes it easier for platforms across the spectrum to do Indic (and other complex scripts) well.

And, looking at all this discussion, I now wish that I had submitted a 'state of Indic' paper at some conference happening currently ;)

--
sankarshan mukhopadhyay
http://sankarshan.randomink.org/blog
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
This discussion is not at all about input methods. I do not know why a sudden comparison between transliteration and InScript came up here.

Looking at all the solutions you provided, let me ask one thing: have you actively contributed to any Indian-language Wikipedia? A survey of the input methods used by Indian Wikipedians will give a different answer.

Shiju

On Thu, Feb 24, 2011 at 9:15 AM, Nikhil Sheth nikhil...@gmail.com wrote:

Great discussion, but I wonder why I didn't see any real, easy, doable, inexpensive, quickfix solution put forth that every Indian on the internet can begin using immediately to get around the Unicode Vs custom Fonts issue. So here's some from me: [...]
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On 2/24/11, Nikhil Sheth nikhil...@gmail.com wrote:

Great discussion, but I wonder why I didn't see any real, easy, doable, inexpensive, quickfix solution put forth that every Indian on the internet can begin using immediately to get around the Unicode Vs custom Fonts issue.

Hey,

What you are mentioning is just transliteration input methods, and there are hundreds of such solutions, phonetic keyboards etc. Transliteration keyboards existed years before Google and most of the solutions you pointed to. Take a look at the Firefox extensions and m17n-db to get a feel for it.

The discussion here was not only about input methods. It is about encoding, rendering and fonts, the underlying technology that enables input methods to work.

Also, just a friendly request to understand the thread first before knee-jerking with what you know.

Anivar Aravind
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On Thu, Feb 24, 2011 at 9:36 AM, Anivar Aravind anivar.arav...@gmail.com wrote:

The discussion here was not only about input methods. It is about encoding, rendering and fonts, the underlying technology that enables input methods to work. Also, just a friendly request to understand the thread first before knee-jerking with what you know.

The discussion started off with Unicode (Gautam was the OP if I recall correctly). And then, of course, it progressed into a discussion about the various pieces that are complex or are work-in-progress towards a solution. Sometimes it isn't easy for everyone to see where it is going. That doesn't necessarily mean that we cannot be excellent to each other.

--
sankarshan mukhopadhyay
http://sankarshan.randomink.org/blog
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On 24 February 2011 09:15, Nikhil Sheth nikhil...@gmail.com wrote:

Great discussion, but I wonder why I didn't see any real, easy, doable, inexpensive, quickfix solution put forth that every Indian on the internet can begin using immediately to get around the Unicode Vs custom Fonts issue.

Sure - it's great to see that there are multiple input methods, some local and some on the Web, that allow for Unicode-encoded text. But I was actually coming at it from a legacy angle: there is tons of 'digital' content that is not accessible - how do we make it accessible? There is also a great hesitancy in certain verticals to use Unicode on the basis of the 'lack of fonts' issue. I was trying to build a case as to why Unicode is important and how we could increase the diversity of available fonts.

Thank you.

Best,
Gautam
http://social.prathambooks.org/
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On 24 February 2011 09:21, sankarshan foss.mailingli...@gmail.com wrote: And, looking at all this discussion I now wish that I submitted a 'state of Indic' paper at some conference happening currently ;) Oh but you should! I would learn much from it and I am sure everyone else will learn something too! Thank you. Best, Gautam http://social.prathambooks.org/
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
Two things I meant to add:

1. The eGov standards body for India has recently notified Unicode 5.1.0 as the default standard for all eGov applications henceforth. (Sadly, their website is DoA - http://egovstandards.gov.in/) I am hopeful that this will be the start of some initiative within Government and would, hopefully, spread. A cache of their Approach Paper on Localization is here: http://webcache.googleusercontent.com/search?q=cache:e28QCFBDI-cJ:egovstandards.gov.in/standards_localisation_app+india+egov+standards+unicodecd=2hl=enct=clnkgl=insource=www.google.co.in And a cache of their Character Encoding Standard For Indian Languages is here: http://docs.google.com/viewer?a=vq=cache:dYxnM6D7IMQJ:egovstandards.gov.in/egscontent.2009-12-29.6248244073/at_download/file+india+egov+standards+unicodehl=engl=inpid=blsrcid=ADGEESgxDT6JyHRlgWfR2TKYHKRGeAM5PigxzZAPyo2M1d6rxGnOC3sQ0S5XVDVVvPL_t5ZKmui0ghMMO63q2hZMT_WeJq0WH5FnEFYFioh7EZ_Uzj8XPnvVMatGZ4vO9kv6RXJZM56esig=AHIEtbQnDd2Gy29vyy97FnvAw2g4hN3cqQ

2. On input methods - is there anything of a best practice or even a Government notification about an input standard?

Thank you.

Best,
Gautam
http://social.prathambooks.org/
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On input methods - is there anything of a best practice or even a Government notification about an input standard?

In Tamil Nadu, the govt recommends and endorses the Tamil 99 keyboard layout.

On Thu, Feb 24, 2011 at 10:12 AM, Shiju Alex shijualexonl...@gmail.com wrote:

Even though the Central Government has adopted Unicode as the encoding standard, the case is not the same with most State Governments. As far as I know, only a few state governments (Tamil Nadu, Punjab, Kerala,...) have adopted the Unicode standard. Many are still in the ASCII era.

On input methods - is there anything of a best practice or even a Government notification about an input standard?

I haven't seen any notification regarding this yet. But InScript is officially/unofficially adopted as the default input scheme. That is why it is part of the school syllabus in some states.
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On 24 February 2011 10:12, Shiju Alex shijualexonl...@gmail.com wrote:

Even though the Central Government has adopted Unicode as the encoding standard, the case is not the same with most State Governments. As far as I know, only a few state governments (Tamil Nadu, Punjab, Kerala,...) have adopted the Unicode standard. Many are still in the ASCII era.

Thank you, Shiju. A question - what makes Governments hesitant to move to Unicode as the encoding standard? Is it the tools they use? The workflow? A legacy issue - we'll never be able to open our old files?

I'm trying to map this space out - it's just that I am coming to see it as being really, really important and want to try and do something here. Also, the GoI is slowly making some noises about standards and openness etc., and I am hoping these are small points that can add up. For example, the TAGUP report: http://finmin.nic.in/reports/TAGUP_Report.pdf

From the Executive Summary:

Chapter 6 points out some key design considerations for the solution architecture. The solution architecture should be designed to be flexible, reusable, extensible by stakeholders, and free of vendor lock-in. Given that many Government projects touch end-users such as citizens and firms, the Government should also play an active role in promoting banking and accessibility for all. This can form the basis of a platform for delivery of services.

Chapter 7 addresses openness in implementation of Government IT projects. It describes the relevance of open standards, open data, and open source. The Government should not only be a consumer, but also strive to produce and facilitate open standards, open data, and open source. It also suggests the creation of an open source foundation for open sourcing software from Government projects.

Give me a little hope.

Best,
Gautam
http://social.prathambooks.org/
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
In West Bengal there is no Govt announcement regarding Unicode and keyboard layout. Our Govt is still in the ASCII era in all departments. But they have adopted Unicode through the *Society for Natural Language Technology Research* (NLTR) (http://www.nltr.org/), which released Baishakhi Linux 2.0 (http://www.nltr.org/SNLTR/index.php?option=com_contenttask=viewid=118Itemid=119), with built-in Unicode support for all Indic languages like other Linux distros. The society has been seeded by the Govt. of West Bengal (Dept. of Information Technology) with initial funding and support. NLTR promotes Bengali computing through Unicode and the Baishakhi keyboard, which is quite similar to InScript Bengali.

But my personal experience is not very good: when I go to any govt office in West Bengal (Writers' Building, http://en.wikipedia.org/wiki/Writers%27_Building), they use Windows (pirated?) and ASCII Bengali interfaces like i-leap and Bijoy. I don't know why they funded Baishakhi Linux 2.0.

On Thu, Feb 24, 2011 at 10:18 AM, Gautam John gau...@prathambooks.org wrote:

A question - what are the hesitancies for Governments to move to Unicode as the encoding standards? Is it the tools they use? The workflow? A legacy issue - we'll never be able to open our old files? [...]

--
With Warm Regards,
*Jayanta Nath*
Calcutta, West Bengal
Wikipedia: http://en.wikipedia.org/wiki/User:Jayantanth
Let us build a piracy-free India: let us all use free software and encourage others to use it.
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
Dear sankarshan,

The initial license of the Raghu font series was confusing. But later they changed it to the GNU GPL, at the insistence of R. K. Joshi. The GNU GPL licensed fonts were released as a part of the Indix project of CDAC Mumbai.

Anivar

On 2/24/11, sankarshan foss.mailingli...@gmail.com wrote:

On Thu, Feb 24, 2011 at 8:51 AM, Anivar Aravind anivar.arav...@gmail.com wrote:

CDACMumbai have a history of GPL Licensing one font series as a part of their indix project, Raghu Series, by Late. Prof. R.K.Joshi, Famous Calligrapher and Researcher in Type faces. http://en.wikipedia.org/wiki/R_K_Joshi Rebranding his Jana Series fonts to raghu series GPLing them was his long term effort from inside CDAC. But its font tables need to be corrected to make them usable.

The 'GPL' that these fonts had was the 'General Public License', wasn't it? And not the GNU General Public License. I may be mistaken though.

I've been, in the past, known to berate and sigh at C-DAC. In recent times I've arrived at the conclusion that there's no upside in thinking that TDIL/MinIT/C-DAC will eventually figure out that selling services around their products makes for a better business case than trying to hawk the products themselves. Or that LGPL licensing their products might make it easier to have an application developer network around them.

--
sankarshan mukhopadhyay
http://sankarshan.randomink.org/blog
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On 24 February 2011 11:36, Anivar Aravind anivar.arav...@gmail.com wrote:
> Thanks for those links. I am aware of the document, but have not had enough time to read it yet. Are you sure it specifies Unicode 5.1? I am curious because the new rupee symbol was encoded only in Unicode 6.0. Usually government standards do not specify versions.

Yep. What it states is:

"Unicode shall be the storage-encoding standard for all constitutionally recognised Indian Languages including English and other global languages as follows: Unicode 5.1.0 and its future up-gradation as reported by Unicode consortium from time to time."

Thank you.

Best, Gautam
http://social.prathambooks.org/
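As a small aside on the version question, here is a minimal sketch (not part of the original thread) of how one might check whether a given runtime's Unicode tables already know the rupee sign, U+20B9, a code point newer than Unicode 5.1. It uses only Python's standard `unicodedata` module; the helper name `rupee_name` is made up for illustration.

```python
import unicodedata

RUPEE = "\u20b9"  # the Indian rupee sign discussed above


def rupee_name():
    """Return the character's Unicode name, or None if the runtime's
    Unicode tables are too old to know this code point."""
    return unicodedata.name(RUPEE, None)


# Version of the Unicode Character Database bundled with this Python build.
print("Unicode tables in this runtime:", unicodedata.unidata_version)
print("U+20B9 is known as:", rupee_name())
```

The point of the sketch is only that "which Unicode version" is a property of the software's character tables; data stored as code points stays valid as the standard is upgraded.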
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On Thu, Feb 17, 2011 at 11:29 AM, Gautam John gau...@prathambooks.org wrote:
> 2. Given that we publish in Indian languages, using Unicode fonts is the only way to achieve cross-platform interoperability, and Unicode is a global standard.
> 3. Given India's push towards copyright reform for the print impaired, it is imperative that Unicode fonts be used in the creation of Indic content because it is otherwise a huge barrier to conversion to print-friendly formats.
> 4. Unicode, being an open global standard, guarantees content accessibility in the future and ensures no proprietary font and vendor lock-in.

I think you have some confusion about Unicode and fonts. Let me try to clarify in simple words.

Unicode is an encoding standard. It says how a 'letter' is represented by a group of bits or bytes, and it ensures that each letter is unique across thousands of languages in the world. Fonts are just clothes for this data: sometimes optimized for the web, sometimes for print, sometimes fancy... Data can exist without fonts too. The only thing is that one cannot see the data properly, or one sees it naked (as question marks, squares, or raw code points, depending on the operating system environment).

So if you say 'using Unicode fonts for Indic content', it does not make sense: we cannot represent or store data in fonts. And when you say 'Unicode fonts are the only way to achieve interoperability', it is wrong, since it is the encoding standard that makes interoperability possible. Unicode data has no dependency on the font. The font is the user's choice, and it lives on the reader's side.

But I know that many people still use the terms 'data in Unicode fonts', 'data in xyz font', etc. This usage came into existence just because, before Unicode was popular, most Indian publishers used a non-standard way of representing our data: using English (Latin, ASCII) data and changing the font's 'face' to Indian glyphs. A fancy-dress hack. The letter k will be shown as Hindi ka with the help of a font, i.e. the data is still English, but what you see is Hindi. Obviously the data cannot be presented to anybody without these special clothes. If you get this data and don't have the associated font, what you see will be just some junk Latin characters.

Many publishers created their own fonts with this technique, each in their own way. So to send some data to your friend, you need to tell him: hey, this data is in Sree font, this data is in Kathika font, etc. Even now that Unicode is popular, only a very small percentage of publishers have moved to Unicode; the others still continue with ASCII, font-dependent data. If one uses Unicode, there is no need to mention the font. One can read the data using a good Unicode-compatible font of his or her choice.

So:
- 'data is in Unicode encoding' is correct;
- 'data is in Unicode font' is wrong;
- 'data can be viewed using any Unicode-compatible font' is correct.

I hope it is clear.

> 5. The limitation is on the lack of high quality and varied typefaces that are both screen and print optimised open type Indic Unicode fonts.

This is true. Fonts exist for all scripts, but the variety and quality of the existing fonts vary. The availability of fonts under a FOSS-compatible license is also a problem. For a detailed list of Indic fonts with license info, see http://indlinux.org/wiki/index.php/IndicFontsList

> 6. Given the importance of linguistic diversity to India's cultural heritage, it is imperative that greater attention is paid to the development of such fonts under licenses that allow for free re-use and to fix issues in the fonts that might arise.

You are correct. I would say 'fonts licensed under any FOSS license' instead of 'free use/reuse'.

> 7. The Govt. should fund the open development of at least 5 such fonts for each the 21 Constitutionally recognised languages and make these available not just for free, but under free license to re-use and improve as well.

You got it. But history shows that such funding did not play much of a role in the development of the fonts listed here: http://indlinux.org/wiki/index.php/IndicFontsList

In fact, the funds were spent (read: wasted) on the development of proprietary fonts by government agencies like CDAC. Fonts with free(dom) licenses were developed and maintained by FOSS developer communities.

Thanks,
Santhosh Thottingal
http://thottingal.in
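The distinction between encoding and font drawn in this message can be made concrete with a minimal sketch (not from the original thread). It asks Python's standard `unicodedata` module what a piece of stored data actually is, independent of any font. The helper name `describe` is invented for illustration, and the "legacy" string assumes a hypothetical font that draws Hindi ka in the slot of the Latin letter k, as in the example above.

```python
import unicodedata


def describe(text):
    """List (code point, official Unicode name) for each stored character.
    No font is involved: this inspects the data itself."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


unicode_ka = "\u0915"  # Hindi ka stored as Unicode
legacy_ka = "k"        # same letter under a hypothetical font-hack encoding

print(describe(unicode_ka))  # the data says: Devanagari ka
print(describe(legacy_ka))   # the data says: Latin k, whatever the font draws
```

Whatever glyph a legacy font paints on screen, the second string remains a Latin letter to every program that reads it, which is why such data is only meaningful together with its font.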
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On 22 February 2011 22:29, Santhosh Thottingal santhosh.thottin...@gmail.com wrote:
> I think you have some confusion on Unicode and Fonts. Let me try to clarify in simple words.

Yes - I did! And thank you for such a detailed response. To see if I have understood this, there are three components:

1. Input. (Different types of keyboard layouts are used, but they are independent of the method of encoding - correct?)
2. Encoding and storing the input. (ASCII is the older method. I have heard of ISCII as well but do not know what that is. Unicode is the standard.)
3. Representing, visually for the human user, what has been input and encoded. (Fonts or typefaces, and these are, to an extent, independent of the encoding method used.)

> But I know that many people still use the term data in unicode fonts, data in xyz font etc. This usage came into existence just because, before unicode was popular, most of the Indian publishers used a non-standard way of representing our data - using English (or latin - ascii) data and change the font's 'face' to Indian glyph. a fancy dress hack. The letter k will be shown as hindi ka with the help of a font. ie the data is still english, but what you see is Hindi.

So if I understand correctly, not only is the encoding in ASCII, but the representation of that encoding is tied to a particular font (the one used for representation at entry?) and will only be represented properly when using that font?

However, what I am trying to understand is whether there is consistency across the ASCII encoding. Will ka in Hindi be encoded in ASCII in only one way, or is there a linkage, which I do not understand, to the font used to represent it as well?

The reason I ask is that if ka in Hindi is always encoded the same way irrespective of the font used to represent it, then it should not be hard to build an ASCII-to-Unicode encoding map that will only have to be done once for each language. Though something tells me I am way off on this assumption.

> This is true. Fonts exist for all scripts, but the variety, or quality of the existing fonts varies. Availability of fonts licensed in foss compatible license is also a problem. For a detailed list of Indic fonts with license info, see http://indlinux.org/wiki/index.php/IndicFontsList

Thanks, Santhosh. This is really useful. Also, are these screen or print ready fonts?

> You are correct. I would say fonts licensed under any FOSS license instead of free use/reuse.

Indeed. FOSS license is what I should have said.

> In fact, the funds were spent (read wasted) for the development of Proprietary fonts by government agencies like CDAC. Fonts with free(dom) licenses were developed and maintained by FOSS developer communities.

*sigh* In your opinion, would there be any real benefit if they did license the ILDC series under a true FOSS license?

> Each Unicode character is multi-byte character while in ASCII, it is single byte.

Ah. Okay. I understand now.

> This is not comparable since search is not possible in ascii font way of representing data. Since the data is not in Hindi, but we just see as Hindi, one cannot do a search or any such data processing on that data.

If I understand correctly, it is not possible to search within ASCII-encoded text, but this can be done in Unicode-encoded text?

Thank you very much, Santhosh - I have learned a lot from this.

Best, Gautam
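The search question at the end of this message can be illustrated with a rough sketch (not from the original thread). The two "legacy" strings below are invented: they assume two hypothetical ASCII fonts that each place Hindi glyphs in different Latin slots, so the same phrase is stored as two unrelated Latin strings. The byte counts at the end use Python's built-in `str.encode`.

```python
# The same phrase, "कमल का फूल", stored three ways.
page_unicode = "कमल का फूल"   # real Devanagari code points
page_font_a = "dey dk Qwy"    # hypothetical legacy font A (made-up mapping)
page_font_b = "kml kA PUl"    # hypothetical legacy font B (made-up mapping)

query = "कमल"  # what a reader would actually type into a search box


def contains(page, needle):
    """Plain substring search over the stored data."""
    return needle in page


print(contains(page_unicode, query))  # the Unicode page can be searched
print(contains(page_font_a, query))   # the stored data is Latin letters
print(contains(page_font_b, query))   # same problem, different junk

# Storage size per letter: ka in UTF-8 versus a Latin letter in ASCII.
utf8_len = len("क".encode("utf-8"))
ascii_len = len("k".encode("ascii"))
print(utf8_len, ascii_len)
```

Strictly speaking, the legacy pages can still be searched byte for byte, but only with a query typed in the same font's Latin code, and a query that works for font A finds nothing in font B. That is the practical sense in which such data cannot be searched or processed as Hindi.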
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On 2/22/11, Gautam John gau...@prathambooks.org wrote:
> On 22 February 2011 22:29, Santhosh Thottingal santhosh.thottin...@gmail.com wrote:
> > I think you have some confusion on Unicode and Fonts. Let me try to clarify in simple words.
>
> Yes - I did! And thank you for such a detailed response. To see if I have understood this - there are three components: 1. Input (Different types of keyboard layouts are used but are independent of the method of encoding - correct?) 2. Encoding and storing the input (ASCII is the older method - have heard of ISCII as well but do not know what that is but Unicode is the standard. 3. Representing, visually for the human user, what has been inputed and encoded. (Font or type faces and these are, to an extent, independent of the encoding method used.)

There are four components:

1. Input methods (the GOI-approved Inscript layout, various popular layouts, transliteration keyboards, phonetic keyboards).
2. Encoding (Unicode).
3. Font (OpenType fonts, i.e. fonts supporting Unicode).
4. Rendering engines. These do the shaping of complex glyphs using the OpenType tables in the fonts, e.g. Pango in GNOME, HarfBuzz in KDE, ICU in OpenOffice and Java-based programs, Uniscribe in Windows, etc.

> > But I know that many people still use the term data in unicode fonts, data in xyz font etc. This usage came into existence just because, before unicode was popular, most of the Indian publishers used a non-standard way of representing our data - using English (or latin - ascii) data and change the font's 'face' to Indian glyph. a fancy dress hack. The letter k will be shown as hindi ka with the help of a font. ie the data is still english, but what you see is Hindi.
>
> So if I understand correctly, not only is the encoding in ASCII but the representation of that encoding is tied to a particular font (that was used for representation at entry?) and will only be represented properly when using that font? However, what I am trying to understand is whether there is consistency across the ASCII encoding? Will ka in Hindi be encoded in ASCII only one way or is there a linkage, that I do not understand, to the font used to represent it as well?

ASCII is not like Unicode. It only covers basic Latin characters, not any other script. All over India, legacy, non-standard local-language technologies (ugly hacks) have taken deep root. Local newspaper websites as well as publishing houses tend to use their own non-standard fonts. This means that documents and web sites get tied to fonts. These fonts may or may not be freely available, and in some extreme cases may no longer be available at all. If you lose the font, you lose the content as well.

Ka in Hindi may be mapped to the position of A in one font and to the position of H in another, at the convenience of the font developer.

> The reason I ask is because if ka in Hindi is always encoded the same way irrespective of the font used to represent it, then it should not be hard to build an ASCII to Unicode map of encoding that will only have to be done once for each language? Though something tells me I am way off on this assumption.

It is font dependent. A conversion map has to be prepared for each ASCII font in order to convert data encoded in it to Unicode. Swathanthra Malayalam Computing's Payyans (http://wiki.smc.org.in/Payyans) is a tool developed for converting ASCII to Unicode easily for any Indic language by building a font map for each needed font. This tool helped Malayalam Wiktionary to convert many copyright-expired books in non-standard encodings to Unicode. The popular Firefox extension Padma uses similar encoding conversion tables to display ASCII news websites in Unicode.

> > This is true. Fonts exist for all scripts, but the variety, or quality of the existing fonts varies. Availability of fonts licensed in foss compatible license is also a problem. For a detailed list of Indic fonts with license info, see http://indlinux.org/wiki/index.php/IndicFontsList
>
> Thanks, Santosh. This is a really useful. Also, are these screen or print ready fonts?

Each language community can answer this question best. In Malayalam we have both screen and print fonts, including one ornamental font.

> > You are correct. I would say fonts licensed under any FOSS license instead of free use/reuse.
>
> Indeed. FOSS license is what I should have said.
>
> > In fact, the funds were spent (read wasted) for the development of Proprietary fonts by government agencies like CDAC. Fonts with free(dom) licenses were developed and maintained by FOSS developer communities.
>
> *sigh* In your opinion, would they be any real benefit if they did license the ILDC series under a true FOSS license?

I don't think this will happen. There is a long history of lobbying CDAC for this, from 2001 onwards, and nothing happened. CDAC made enough money by selling ASCII fonts (and still does), and they can't even think about giving them away with a FOSS
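The per-font conversion maps described in this message can be sketched roughly as follows. This is an illustration under stated assumptions, not how Payyans or Padma are actually implemented: the font names `FontA` and `FontB`, their tables, and the function `legacy_to_unicode` are all hypothetical, chosen to mirror the example of ka sitting at A in one font and at H in another.

```python
# Hypothetical font maps: one table per legacy ASCII font.
# Each maps the Latin code stored in the file to the Unicode character
# that the font was drawing in that slot.
FONT_MAPS = {
    "FontA": {"A": "\u0915", "B": "\u0916"},  # ka at 'A', kha at 'B'
    "FontB": {"H": "\u0915", "J": "\u0916"},  # ka at 'H', kha at 'J'
}


def legacy_to_unicode(text, font_name):
    """Convert font-encoded text to Unicode using that font's own table.
    Characters missing from the table are passed through unchanged."""
    table = FONT_MAPS[font_name]
    # Try longer legacy sequences first, in case a table has
    # multi-character entries.
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(legacy_to_unicode("A", "FontA"))  # ka, via FontA's table
print(legacy_to_unicode("H", "FontB"))  # ka again, via FontB's table
print(legacy_to_unicode("A", "FontB"))  # wrong table: stays Latin 'A'
```

The last call shows why one map per language is not enough: the same stored byte means different letters under different fonts. A real converter very likely needs more than a lookup table (for example, reordering glyphs that legacy fonts store in visual order), which this sketch ignores.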
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On 17 February 2011 11:29, Gautam John gau...@prathambooks.org wrote:
> I'm trying to bring together some ideas as to why Unicode is important, what the upsides and downsides are. My initial thoughts:

A few other points that I read here: http://anandabazar-unicode.appspot.com/

- Data usage: use of Unicode will significantly reduce bandwidth/storage.
- Search (within a page, web search, etc.).

Thank you.

Best, Gautam
http://social.prathambooks.org/
Re: [Wikimediaindia-l] (OT) On the importance of Unicode
On 17 February 2011 12:35, BalaSundaraRaman sundarbe...@yahoo.com wrote:
> I have some points to share, but got to go back to work now. Can I get back on this later?

Sure, Sundar! No hurry.

Thank you.

Best, Gautam
http://social.prathambooks.org/