[PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?
This is somewhat [OT], but I wanted to ask: how often do you come across web content that is in UTF-8 and is available under an appropriate license that allows sharing and distribution? If not enough local-language content is available (in the region) in UTF-8, do you have any thoughts on why that would be so, especially given the large amount of published literature that one sees? Additionally, do you know of any sources of children's rhymes in the local language, so that they could be proposed for inclusion in OLPC and similar projects? ~sankarshan -- You see things; and you say 'Why?'; But I dream things that never were; and I say 'Why not?' - George Bernard Shaw -- __ Pune GNU/Linux Users Group Mailing List: (plug-mail@plug.org.in) List Information: http://plug.org.in/cgi-bin/mailman/listinfo/plug-mail Send 'help' to plug-mail-requ...@plug.org.in for mailing instructions.
Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?
e.g. Rather than entering ramayan in Google, can I enter रामायण and still get relevant results? Sorry to reply to my own post, but it already works with Google. I'm amazed. Try the following if you are curious: http://www.google.co.in/search?q=%E0%A4%B0%E0%A4%BE%E0%A4%AE%E0%A4%BE%E0%A4%AF%E0%A4%A3&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official -मंदार
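The `q=` value in the URL above is simply the UTF-8 bytes of रामायण, each written as a percent-encoded hex pair. A minimal sketch of how that string can be reproduced with Python's standard `urllib.parse` module follows; the variable names are only illustrative, and the `ie`/`oe` parameters are copied from the URL in the post rather than from any documented API.

```python
from urllib.parse import quote, unquote, urlencode

# The search term typed in Devanagari.
query = "रामायण"

# quote() encodes the text as UTF-8 by default and percent-encodes each byte.
encoded = quote(query)
print(encoded)
# -> %E0%A4%B0%E0%A4%BE%E0%A4%AE%E0%A4%BE%E0%A4%AF%E0%A4%A3

# Decoding the percent-encoded form gives back the original text.
assert unquote(encoded) == query

# Building a query string; the parameter names mirror the URL in the post.
params = urlencode({"q": query, "ie": "utf-8", "oe": "utf-8"})
search_url = "http://www.google.co.in/search?" + params
print(search_url)
```

Each Devanagari character here takes three bytes in UTF-8, which is why the six-character word expands to eighteen `%XX` groups.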
Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?
On 12/12/2008 2:35 PM, Sankarshan Mukhopadhyay wrote: This is somewhat [OT] but I wanted to ask - how often do you come across web content that is in UTF-8 and is available under an appropriate license that allows sharing and distribution? On a somewhat related topic, is there a way to search the internet in a non-English language (Indic, to be more specific) using popular (or even less popular) search engines? e.g. Rather than entering ramayan in Google, can I enter रामायण and still get relevant results? For this to work, search engines will have to index UTF-8 data, and the user will have to query using UTF-8. What other problems does one see? --- Mandar D Vaze http://mandarvaze.wordpress.com http://twitter.com/mandarvaze http://www.linkedin.com/in/mandarvaze
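The point that both the indexed data and the query need to be in UTF-8 can be illustrated with a small sketch. It is not how any real search engine works; the `matches` helper and the sample page text are hypothetical, and the only assumption is that the "index" stores raw bytes that must decode as UTF-8 before a Unicode query can be compared against them.

```python
import unicodedata

def matches(query: str, document_bytes: bytes) -> bool:
    """Return True if the query occurs in a document stored as UTF-8 bytes."""
    try:
        document = document_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Content in some other encoding cannot be searched this way.
        return False
    # Normalize both sides so equivalent Unicode sequences compare equal.
    return unicodedata.normalize("NFC", query) in unicodedata.normalize("NFC", document)

query = "रामायण"
page_text = "वाल्मीकि रामायण"  # hypothetical page content

utf8_page = page_text.encode("utf-8")
utf16_page = page_text.encode("utf-16")  # stands in for any non-UTF-8 encoding

print(matches(query, utf8_page))   # True
print(matches(query, utf16_page))  # False
```

The same text stored in a different encoding is invisible to the query, which is one reason content published in legacy or font-specific encodings is hard to find.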
Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?
On Fri, Dec 12, 2008 at 4:06 PM, Devendra Laulkar devendralaul...@yahoo.com wrote: Does your question imply that why don't we have something like project gutenberg for local languages? Anyways, I always wondered about this question. There has been some effort by IIT's and other institutions, but it seemed to be scanned images of text, instead of the actual literature in UTF-8. In a way, yes; you put it across much more nicely than I could. I don't see enough content that is shared via Project Gutenberg or a similar system. And I don't come across enough content in UTF-8. There are umpteen magazines, so content must be created regularly; what I'd really like to know is what stops the authors from publishing it in UTF-8. I ask partly because the Indic content creation experience has improved significantly since the early days of IndLinux. I am not really interested in the scanned images. They are good efforts, but they don't end up addressing the issue. -- You see things; and you say 'Why?'; But I dream things that never were; and I say 'Why not?' - George Bernard Shaw
Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?
Hi, This is somewhat [OT] but I wanted to ask - how often do you come across web content that is in UTF-8 and is available under an appropriate license that allows sharing and distribution? If not enough local language content is available (in the region) in UTF-8, do you have any thoughts on why that would be so? Especially given the prolific amount of published literature that one sees. Are you asking why we don't have something like Project Gutenberg for local languages? Anyway, I have always wondered about this. There has been some effort by the IITs and other institutions, but it seems to have produced scanned images of text instead of the actual literature in UTF-8. -Devendra.
Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?
On 12/12/2008 4:14 PM, Sankarshan Mukhopadhyay wrote: system. And, I don't come across enough content in UTF-8. There are umpteen magazines and thus content must be created regularly, what I'd I saw some entries on Wikipedia. See my other post about searching using an Indic string. I am not really interested in the scanned images. They are good efforts, but don't end up addressing the issue. Scanned images do not help in search, so they are less useful. --- Mandar D Vaze http://mandarvaze.wordpress.com http://twitter.com/mandarvaze http://www.linkedin.com/in/mandarvaze
Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?
Mandar Vaze wrote: e.g. Rather than entering ramayan in Google, can I enter रामायण and still get relevant results? If there is content that matches your search pattern/string, yes, you will. I do that on and off for various Indic languages. http://santhoshtr.livejournal.com/15068.html is a recent blog post, for example. ~s