Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-21 Thread sankarshan
On Wed, Aug 21, 2013 at 11:23 AM, Tejaswini Niranjana t...@cscs.res.in wrote: Colleagues working in Bangla say that in their experience it is faster, cheaper, and less error-prone to create digital texts by typing them in. "Cheaper" is an interesting word to use in this context. Are we still

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-21 Thread Pavanaja U B
Colleagues working in Bangla say that in their experience it is faster, cheaper, and less error-prone to create digital texts by typing them in. Once there is a larger body of digitised texts

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-21 Thread Dhaval S. Vyas
Colleagues working in Bangla say that in their experience it is faster, cheaper, and less error-prone to create digital texts by typing them in. Once there is a larger

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-21 Thread Vishnu T
From Tejaswini Niranjana, sent 21 August 2013 11:24 to the Wikimedia India Community list: Colleagues working in Bangla say

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-21 Thread Jayanta Nath
@Sumana Harihareswara: Please look at the Bengali OCR project at https://code.google.com/p/banglaocr/; it needs further development. On Mon, Aug 19, 2013 at 10:12 PM, Sumana Harihareswara suma...@wikimedia.org wrote: On 08/19/2013 02:52 AM, L. Shyamal wrote: Re-posting a now outdated query from meta

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-20 Thread Aarti K. Dwivedi
Hi everyone, in my opinion it is always better to OCR the documents. I agree that OCR is error-prone, but there is a Google Summer of Code project being done by AnkurIndia that aims to improve OCR quality for Indian scripts.

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-20 Thread Tejaswini Niranjana
Colleagues working in Bangla say that in their experience it is faster, cheaper, and less error-prone to create digital texts by typing them in. Once there is a larger body of digitised texts, and OCR technology for Indian languages also improves, OCR could become the preferred option. Tejaswini

[Wikimediaindia-l] Indic print material digitization workshop query

2013-08-19 Thread L. Shyamal
Re-posting a now-outdated query from Meta, http://meta.wikimedia.org/wiki/Talk:India_Access_To_Knowledge/Events/Bangalore/Digitization_workshop_18August2013, now that the workshop has already been conducted. I think those who attended the workshop could comment on whether it covered Indic language

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-19 Thread sankarshan
On Mon, Aug 19, 2013 at 12:22 PM, L. Shyamal lshya...@gmail.com wrote: Re-posting a now outdated query from meta http://meta.wikimedia.org/wiki/Talk:India_Access_To_Knowledge/Events/Bangalore/Digitization_workshop_18August2013 The phrase creating text based documents which forms the basis of

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-19 Thread L. Shyamal
Thank you, Subhashish, for the response at http://meta.wikimedia.org/wiki/Talk:India_Access_To_Knowledge/Events/Bangalore/Digitization_workshop_18August2013 Dear Shyamal, this workshop was meant to demonstrate to participants how to create a home-made setup to scan books, edit the scanned images

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-19 Thread Sumana Harihareswara
On 08/19/2013 02:52 AM, L. Shyamal wrote: Re-posting a now-outdated query from Meta, http://meta.wikimedia.org/wiki/Talk:India_Access_To_Knowledge/Events/Bangalore/Digitization_workshop_18August2013, now that the workshop has already been conducted. I think those who attended the workshop

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-19 Thread Ashwin Baindur
Whether to OCR or not to OCR is a significant issue! When we OCR a page of text, the result is often error-prone and loses its formatting, and fixing it requires crowd-sourced correction. Many of us know about Project Gutenberg. The site provides plain-vanilla e-texts. But what most people do not

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-19 Thread Vishnu T
A brief account of the workshop from a participant. The workshop was meant to demonstrate DIY digitization of books and documents using a simple digital camera, without having to invest in a scanner. The following were covered during the workshop by Viswaprabha, who mainly

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-19 Thread വിശ്വപ്രഭ (Viswaprabha)
I am still traveling, in Bangalore and elsewhere, seeking out sources and opportunities for old texts and other media for Wikimedia Commons from various locations in India. I think different people look at the issue through different layers and perspectives. This might call for a detailed write-up on

Re: [Wikimediaindia-l] Indic print material digitization workshop query

2013-08-19 Thread sankarshan
On Mon, Aug 19, 2013 at 10:12 PM, Sumana Harihareswara suma...@wikimedia.org wrote: Is there a central list of the problems that OCR software (especially open source OCR software) has with text written in Indic languages? If so, I could help encourage people to fix those problems, as