Yes, we need to get it under a suitable license. If the technical issue related
to OCR is resolved, we can talk to them about releasing the content into public
"That language is an instrument of human reason, and not merely a medium for
the expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture
>From: Shiju Alex <shijualexonl...@gmail.com>
>Sent: Mon, November 15, 2010 12:02:42 PM
>Subject: Re: [Wikimediaindia-l] Tamil Encyclopedia merge into Wikipedia.
>I have a query.
>What is the license of Tamil Kalaikalanjiam? Has the Tamil Nadu government or
>the Virtual University officially announced that this encyclopedia is released
>into the public domain or under some Creative Commons license so that we can
>reuse the content? If yes, we can very well reuse the content. Otherwise it
>will be a copyright violation. So kindly verify this.
>Let us not assume that since it is published by the Government it will be in
>the public domain. In India that is not the case.
>In December 2008, the Kerala Government officially announced that it was
>changing the license of a similar encyclopedic project in Malayalam
>(sarvavijanakosam) to the Free Documentation License so that the Malayalam wiki
>community can reuse its content to develop Malayalam Wikipedia. The Government
>officially announced it. The Kerala Government has also set up its own wiki (to
>us) for Sarvavijanakosam, and they are slowly digitizing the content and posting
>it on that wiki (http://mal.sarva.gov.in). They have completed some 2,900
>articles now. We are reusing this content to enhance many of the existing
>articles. But we are not copy-pasting the entire content, for various
>reasons. The main reason is that the content needs to be rewritten as per the style
>I really doubt the efficiency of current OCR software for Indian
>languages. It is still under development. The existing solutions are not
>I am not sure about Tamil OCR software.
>On Mon, Nov 15, 2010 at 11:33 AM, Murali Kumar <pthoo...@hotmail.com> wrote:
>>Dear Wikimedia India,
>>As you are probably aware, the Govt. of India, immediately post Independence
>>multiple Indian language encyclopedia projects to stream in Science and
>>Technology. The Tamil language encyclopedia was completed
>>I'm pleased to report Tamil Virtual University has scanned in the Tamil
>>Kalaikalanjiam / Tamil Encyclopedia [Please see Reference 1 below].
>>I was able to download the material via the wonderful wget command and the
>>'convert' tool (from ImageMagick) in GNU/Linux. However, each of the 10
>>is close to 700 MB without compression.
>>I would imagine the people behind this mammoth task (pre-internet era) would
>>have liked it to be merged into a wiki-type format, which would make it a
>>living document in sync with the times.
>>I do not have any experience with 1) Tamil OCR software and 2) Automated
>>Can anyone take the lead on this project? It will help boost the number of
>>quality articles in Indian languages. The Children's encyclopedia is being
>>scanned and has a lot of great visual content.
>>I have uploaded a sample (10 MB) PDF file at
>> if you are interested to give it a spin.
>>1. http://www.tamilvu.org/library/libindex.htm and click on Kalaikalanjiam /
>>Wikimediaindia-l mailing list