[PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?

2008-12-12 Thread Sankarshan Mukhopadhyay
This is somewhat [OT], but I wanted to ask: how often do you come
across web content that is in UTF-8 and is available under an
appropriate license that allows sharing and distribution?

If not enough local-language content from the region is available in
UTF-8, do you have any thoughts on why that would be so, especially
given the large volume of published literature that one sees?
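For any single page, the first question can be checked roughly like this: decode the raw bytes strictly as UTF-8 and measure how much of the text falls in the Devanagari Unicode block (U+0900 to U+097F). This is a minimal Python sketch under stated assumptions. The helper names are hypothetical, only Devanagari is counted, and a pure-ASCII page also passes the UTF-8 test, so treat it as a heuristic, not a classifier.

```python
def is_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict decoding rejects invalid byte sequences
    except UnicodeDecodeError:
        return False
    return True


def devanagari_ratio(data: bytes) -> float:
    """Fraction of non-whitespace characters in the Devanagari block.

    Assumes `data` is valid UTF-8; raises UnicodeDecodeError otherwise.
    """
    text = data.decode("utf-8")
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    deva = sum(1 for ch in chars if "\u0900" <= ch <= "\u097f")
    return deva / len(chars)


sample = "रामायण is an epic".encode("utf-8")
print(is_utf8(sample))                     # True
print(is_utf8(b"caf\xe9"))                 # False: Latin-1 bytes, not UTF-8
print(round(devanagari_ratio(sample), 2))  # 0.43 (6 of 14 characters)
```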

Additionally, do you know of any sources from which children's rhymes in
the local language could be obtained, so that we could propose including
them in OLPC and similar projects?

~sankarshan

-- 
You see things; and you say 'Why?';
But I dream things that never were;
and I say 'Why not?' - George Bernard Shaw

--
__
Pune GNU/Linux Users Group Mailing List:  (plug-mail@plug.org.in)
List Information:  http://plug.org.in/cgi-bin/mailman/listinfo/plug-mail
Send 'help' to plug-mail-requ...@plug.org.in for mailing instructions.


Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?

2008-12-12 Thread Mandar Vaze
 e.g Rather than entering ramayan in google can I enter रामायण , and still
 get relevant results ?

Sorry to reply to my own post, but it already works with Google. I'm
amazed. Try the following if you are curious:

http://www.google.co.in/search?q=%E0%A4%B0%E0%A4%BE%E0%A4%AE%E0%A4%BE%E0%A4%AF%E0%A4%A3&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official
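The long q= value in that link is simply रामायण: each Devanagari code point becomes three UTF-8 bytes, and the browser percent-encodes each byte. A quick check with Python's standard urllib.parse module (a sketch; only the query value is taken from the link, the rest is illustrative):

```python
from urllib.parse import quote, unquote

# The q= value copied from the Google search URL.
encoded = "%E0%A4%B0%E0%A4%BE%E0%A4%AE%E0%A4%BE%E0%A4%AF%E0%A4%A3"

word = unquote(encoded, encoding="utf-8")  # percent-decode, then UTF-8 decode
print(word)                          # रामायण
print(len(word))                     # 6 code points
print(len(word.encode("utf-8")))     # 18 bytes, 3 per Devanagari code point
print(word.encode("utf-8")[:3])      # b'\xe0\xa4\xb0', the bytes for र (U+0930)
print(quote(word) == encoded)        # True: encoding back gives the same string
```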

-मंदार

Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?

2008-12-12 Thread Mandar Vaze
On 12/12/2008 2:35 PM, Sankarshan Mukhopadhyay wrote:
 This is somewhat [OT] but I wanted to ask - how often do you come
 across web content that is in UTF-8 and is available under an
 appropriate license that allows sharing and distribution ?

On a somewhat related topic, is there a way to search the internet in a
non-English (Indic, to be more specific) language using popular (or even
less popular) search engines?
For example, rather than entering ramayan in Google, can I enter रामायण
and still get relevant results?

For the above to work, search engines will have to index UTF-8 data, and
users will have to query using UTF-8.
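To make the indexing/querying point concrete, here is a toy inverted index in Python. It is an illustrative sketch, not how any real search engine works: the function names are hypothetical and words are split on whitespace only. It also normalizes documents and queries to Unicode NFC, an assumption added here because the same Devanagari letter can be stored in more than one canonically equivalent form (for example क़ as U+0958, or as क plus nukta, U+0915 U+093C).

```python
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    # NFC makes canonically equivalent spellings compare equal.
    return unicodedata.normalize("NFC", text)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each whitespace-separated word to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for w in normalize(text).split():
            index[w].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every word of the query."""
    words = normalize(query).split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result


docs = {
    "a": "रामायण वाल्मीकि",
    "b": "ramayan summary",
    "c": "\u0958\u0932\u092e",  # क़लम ("pen"), stored with precomposed क़ (U+0958)
}
index = build_index(docs)
print(search(index, "रामायण"))                    # {'a'}
print(search(index, "\u0915\u093c\u0932\u092e"))  # {'c'}: क + nukta matches after NFC
```

Note that the Latin-script document "ramayan summary" is not returned for the Devanagari query; matching across scripts would need transliteration, which this sketch does not attempt.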

What other problems does one see?

---

Mandar D Vaze
http://mandarvaze.wordpress.com
http://twitter.com/mandarvaze
http://www.linkedin.com/in/mandarvaze


Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?

2008-12-12 Thread Sankarshan Mukhopadhyay
On Fri, Dec 12, 2008 at 4:06 PM, Devendra Laulkar
devendralaul...@yahoo.com wrote:

 Is your question, in effect, why we don't have something like Project
 Gutenberg for local languages? Anyway, I have always wondered about this.
 There has been some effort by the IITs and other institutions, but it
 seemed to produce scanned images of text instead of the actual literature
 in UTF-8.

In a way, yes; you put it across much more nicely than I could. I don't
see enough content that is shared via Project Gutenberg or a similar
system, and I don't come across enough content in UTF-8. There are
umpteen magazines, so content must be created regularly. What I'd
really like to know is what stops the authors from publishing it in
UTF-8, especially since the experience of creating Indic content has
improved significantly since the early days of IndLinux.

I am not really interested in the scanned images. They are good
efforts, but don't end up addressing the issue.





Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?

2008-12-12 Thread Devendra Laulkar
Hi,
 This is somewhat [OT] but I wanted to ask - how often do you come
 across web content that is in UTF-8 and is available under an
 appropriate license that allows sharing and distribution ?
 
 If not enough local language content is available (in the region) in
 UTF-8, do you have any thoughts on why that would be so ? Especially
 given the prolific amount of published literature that one sees.

Is your question, in effect, why we don't have something like Project
Gutenberg for local languages? Anyway, I have always wondered about this.
There has been some effort by the IITs and other institutions, but it seemed
to produce scanned images of text instead of the actual literature in UTF-8.

-Devendra.



Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?

2008-12-12 Thread Mandar Vaze
On 12/12/2008 4:14 PM, Sankarshan Mukhopadhyay wrote:
 system. And, I don't come across enough content in UTF-8. There are
 umpteen magazines and thus content must be created regularly, what I'd

I have seen some entries on Wikipedia. See my other post about searching
using an Indic string.

 I am not really interested in the scanned images. They are good
 efforts, but don't end up addressing the issue.

Scanned images do not help with search, so they are less useful.



Re: [PLUG] [OT] Content in UTF-8 encoding : how often do you come across it ?

2008-12-12 Thread sankarshan . mukhopadhyay
Mandar Vaze wrote:

 e.g Rather than entering ramayan in google can I enter रामायण , and 
 still get relevant results ?

If there is content that matches your search pattern/string, yes, you
will. I do that on and off for various Indic languages.
http://santhoshtr.livejournal.com/15068.html is a recent blog post, for example.

~s
