@Sumana Harihareswara Please take a look at the Bengali OCR project (https://code.google.com/p/banglaocr/), which needs further development.
On Mon, Aug 19, 2013 at 10:12 PM, Sumana Harihareswara <suma...@wikimedia.org> wrote:

> On 08/19/2013 02:52 AM, L. Shyamal wrote:
> > Re-posting a now outdated query from meta
> >
> > http://meta.wikimedia.org/wiki/Talk:India_Access_To_Knowledge/Events/Bangalore/Digitization_workshop_18August2013
> >
> > now that the workshop has already been conducted I think those that have
> > attended the workshop could comment if this cover Indic language OCR-ing -
> > if it did it would be worthwhile if the OCR software used can be documented
> > on the meta pages or elsewhere such as Wikisource. Most of the more
> > experienced editors here will be fairly familiar with the use of scanners
> > for creating PDF documents and uploading them to places like the Internet
> > Archive but the experience or knowledge of OCRs and their success rates is
> > a bit wanting for Indic languages (fonts).
> >
> > best wishes
> > Shyamal
> > en:User:Shyamal
>
> I looked at the talk page on Meta - thank you, Shyamal!
>
> For those who do not know: OCR means Optical Character Recognition.
> When we want to get archival documents onto the web, it's nice to have
> photos of them, but it's even better to OCR them so that people can
> clearly read, copy, excerpt, translate, and remix the text.
>
> Is there a central list of the problems that OCR software (especially
> open source OCR software) has with text written in Indic languages? If
> so, I could help encourage people to fix those problems, as volunteers,
> via a Google Summer of Code/Outreach Program for Women internship, via a
> grant-funded project (such as https://meta.wikimedia.org/wiki/Grants:IEG
> ), or via some other method.
>
> People who would like to make Wikisource more easily useful for Indic
> languages might want to contribute to the Wikisource vision development
> project that's going on right now:
>
> https://wikisource.org/wiki/Wikisource_vision_development
>
> The ProofreadPage extension (part of the Wikisource technology stack) is
> being worked on right now in Aarti K. Dwivedi's Google Summer of Code
> internship. http://aartindi.blogspot.in/ She might be interested in
> knowing about these issues, so I am cc'ing her.
>
> Also - just because people on this list might be interested! - if you
> have an old historical map that you'd like to vectorize to get it onto
> OpenStreetMap, try out the new "Map polygon and feature extractor" tool:
> https://github.com/NYPL/map-vectorizer
>
> --
> Sumana Harihareswara
> Engineering Community Manager
> Wikimedia Foundation
>
> _______________________________________________
> Wikimediaindia-l mailing list
> Wikimediaindiafirstname.lastname@example.org
> To unsubscribe from the list / change mailing preferences visit
> https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l