[Wikimediaindia-l] Need help for Wiki Loves Monuments 2013 in India
Hello guys, greetings. The Wiki Loves Monuments contest is no more than two weeks away, and the Indian core team needs more help. I would like to thank Pradeep, Konarak, Arnav and Srikanth for volunteering on this, and Wikimedia India for supporting it. The team needs more help with the following:

* Designing flyers
* Helping us with the website: www.wikilovesmonuments.in
* Spreading the word through media and blogs
* Organising photowalks
* Finding sponsors and partners

If you can help us with things not mentioned here, feel free to ping me or someone from the Wiki Loves Monuments team.

Regards, Karthik Nadar. Secretary, Wikimedia India Chapter. http://wiki.wikimedia.in
___ Wikimediaindia-l mailing list Wikimediaindia-l@lists.wikimedia.org To unsubscribe from the list / change mailing preferences visit https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
Re: [Wikimediaindia-l] Fwd: Wikimedia India Board(EC) Elections-2013 - Results
Congratulations to Moksh and Jayanta, and kudos to Sudhanwa Jogalekar, Anirudh Bhati and Bala Jeyaraman for their priceless contributions. Arun, yes, we are planning another meeting soon in Mumbai, where we shall be present in person while others join by call, so as to define each other's roles at the earliest.

Regards, Karthik Nadar. Secretary, Wikimedia India Chapter. http://wiki.wikimedia.in

On Sun, Aug 18, 2013 at 3:32 PM, Arun Ramarathnam arunra...@gmail.com wrote:

Firstly, kudos to Sudhanwa Jogalekar, Anirudh Bhati and Bala Jeyaraman for their voluntary time, contributions and effort as EC members. I was hoping to see Bala stay on in the EC.

Congratulations to Jayanta and Moksh for getting elected to the EC of Wikimedia India! Great to see a new EC member from the East.

Great effort and stewardship by Arjuna, Tinu and Radhakrishna in ensuring that the elections were conducted professionally. Hats off to you folks.

Best wishes to the new team for the upcoming activities and for your leadership in steering them. When do we get to know who the new office bearers will be? Is there a planned timeline for that?

regards, Arun

On Thu, Aug 15, 2013 at 9:31 PM, Arjuna Rao Chavala arjunar...@gmail.com wrote:

-- Forwarded message --
From: Arjuna Rao Chavala arjunar...@gmail.com
Date: 2013/8/15
Subject: Wikimedia India Board(EC) Elections-2013 - Results
To: wmin-members wmin-memb...@googlegroups.com, Wikimedia India EC wikimedia-in-e...@lists.wikimedia.org
Cc: masked

Hi, We completed the election process for the Wikimedia India Chapter Executive Committee (EC) Elections for the year 2013 in today's Annual General Body Meeting. We are happy to announce the results of the Election.
Total eligible voters: 66 (excluding 3 members of the Election Committee, who can't vote as per the Election rules)
Votes polled: 27 (26 by post and 1 in person), 40.9%
Valid votes: 26
Invalid votes: 1

Candidate - Votes polled
Jayanta Nath - 17
Moksh Juneja - 17
Nikhil Kawale - 3
Santosh Shingare - 6
Yohann Thomas - 7

Hence, Jayanta Nath and Moksh Juneja are declared elected to the EC for the two vacant positions for the term 2013-2015. We congratulate Jayanta and Moksh. We thank the EC, all the candidates and the members for their active participation in the Election process. We thank the outgoing Executive Committee led by Sudhanwa for all their contributions. We wish the new EC all the best for growing the movement and the chapter in future.

Sincerely, Arjuna Rao Chavala, Tinu Cherian, Radhakrishna A
Election Committee for EC Elections-2013, Wikimedia India Chapter
[Wikimediaindia-l] Fwd: [Wikimedia-l] Looking for Wikipedians to add already translated articles
Forwarding to India list to cover Indic languages. See below. --Kul

-- Forwarded message --
From: James Heilman jmh...@gmail.com
Date: Tue, Aug 20, 2013 at 4:12 PM
Subject: [Wikimedia-l] Looking for Wikipedians to add already translated articles
To: wikimedi...@lists.wikimedia.org, Kul Wadhwa ktwad...@gmail.com

We at WikiProject Medicine are working on a collaborative effort with Translators Without Borders (TWB), a group which includes 2,000 or so volunteer translators. We are working to translate key medical articles into as many other languages as possible. Currently we have translated content into 50 or so languages, amounting to 2.3 million words of text.

The process involves first bringing articles to either GA or FA status in English. They are then delivered, with MediaWiki markup in place, to the TWB website, where the text is sent out to the translators. Once an article is translated, we at Wikipedia are notified via orange links on this page: http://en.wikipedia.org/wiki/Wikipedia:MED/Progress

The issue currently is that we are missing Wikipedians in some languages to add / combine the translated content into the respective Wikipedia. Some of the articles created through this process have reached featured article status, including the Hungarian translations of anaphylaxis and hypertension.

We currently have translated content in the following languages waiting to be integrated: Hindi, Chinese, Persian, Tagalog, Indonesian, Macedonian, Greek, Bulgarian, Danish, Polish, Swedish, Arabic, Ukrainian, Dutch, Czech, Serbian, Slovenian, Spanish, Telugu, Tamil, Punjabi, Turkish, Kurdish, Thai, Swahili, Yoruba, Kinyarwanda.

An overview of the efforts can be found here: http://en.wikipedia.org/wiki/Wikipedia:TTF

If you are interested in getting involved in adding translated articles, instructions are here: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Medicine/Translation_task_force/Adding_content

If you have further questions or comments, I would welcome the feedback.
James Heilman MD, CCFP(EM), Wikipedian
WikiProject Medicine
The Wikipedia Open Textbook of Medicine www.opentextbookofmedicine.com
___ Wikimedia-l mailing list wikimedi...@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe

-- Kul Wadhwa, Head of Mobile, Wikimedia Foundation
Re: [Wikimediaindia-l] Fwd: Wikimedia India Board(EC) Elections-2013 - Results
Hi,

2013/8/20 Karthik Nadar karthik...@wikimedia.in wrote:

Congratulations to Moksh and Jayanta, and kudos to Sudhanwa Jogalekar, Anirudh Bhati and Bala Jeyaraman for their priceless contributions. Arun, yes, we are planning another meeting soon in Mumbai, where we shall be present in person while others join by call, so as to define each other's roles at the earliest.

It is crucial that the first meeting of the new EC (along with the outgoing EC) be held face to face, for the following reasons:

a) Reflection on the year gone by and capturing lessons learnt
b) Team building workshop to learn the approaches, strengths and expectations of one another
c) Election of office bearers and other roles of the EC
d) Strategy discussion and outline of priority programs and champions till the next year
e) Finalisation of the budget for the current financial year
f) Strategy, programs and budget for the next financial year (required for the FDC funding proposal due by 1 Oct 2013). The newly elected EC members' input will be critical, as they will be continuing for the whole of the next financial year.

Additional information on the election is available in the Elections and Appointment Rules for Wikimedia Chapter Official positions: http://wiki.wikimedia.in/File:WMINElectionAndAppointmentRules.pdf

Expenses should not be a constraint in conducting a face-to-face meeting. A face-to-face meeting of the joint EC was held last year, in which the outgoing EC participated. I hope the Executive Committee will consider the above and finalise their meeting plans.

Cheers, Arjuna
Re: [Wikimediaindia-l] Fwd: [Wikimedia-l] Looking for Wikipedians to add already translated articles
Noticed that a vast majority of Indian languages are listed. How about we kickstart a translation rally of sorts for not only WP:Medicine, but other articles as well?

-- Srikanth Ramakrishnan, Treasurer, Wikimedia Chapter [India]
Donate to the Wikimedia India Chapter today http://wiki.wikimedia.in/Donations

On Wed, Aug 21, 2013 at 5:51 AM, Kul Wadhwa kwad...@wikimedia.org wrote:

Forwarding to India list to cover Indic languages. See below. --Kul
Re: [Wikimediaindia-l] Fwd: [Wikimedia-l] Looking for Wikipedians to add already translated articles
Ideally it would be awesome to use translatewiki.net to host localization for each language for TWB. -Alolita

On Tue, Aug 20, 2013 at 9:15 PM, Srikanth Ramakrishnan srik.r...@wikimedia.in wrote:

Noticed that a vast majority of Indian languages are listed. How about we kickstart a translation rally of sorts for not only WP:Medicine, but other articles as well?
[Wikimediaindia-l] Translations! Join us at the Wikimedia Asia Cultural Exchange
Hi all, a couple of like-minded Asian chapters came up with an idea at Wikimania to coordinate translation efforts for various articles relating to Asia. The premise was to ensure that articles about a country or its culture that are available in one language can be made available in other Asian languages. Since we in India have a lot of languages and different cultures, each with its unique set of articles, I see this as a massive opportunity to have articles translated. Have a look, and add your name/home wiki if participating: http://meta.wikimedia.org/wiki/Wikimedia_Asia_Project/Cultural_Content_Exchange

-- Srikanth Ramakrishnan, Treasurer, Wikimedia Chapter [India]
Donate to the Wikimedia India Chapter today http://wiki.wikimedia.in/Donations
Re: [Wikimediaindia-l] Fwd: [Wikimedia-l] Looking for Wikipedians to add already translated articles
Hi Alolita, I think the objective here is to translate the articles themselves, which I think doesn't require TranslateWiki. From my experience, TranslateWiki is primarily for translating code, interfaces and messages, not articles. Correct me if I'm wrong.

-- Srikanth Ramakrishnan, Treasurer, Wikimedia Chapter [India]
Donate to the Wikimedia India Chapter today http://wiki.wikimedia.in/Donations

On Wed, Aug 21, 2013 at 9:52 AM, Alolita Sharma alolita.sha...@gmail.com wrote:

Ideally it would be awesome to use translatewiki.net to host localization for each language for TWB. -Alolita
Re: [Wikimediaindia-l] Fwd: Wikimedia India Board(EC) Elections-2013 - Results
Congrats, Moksh and Jayanta.

On Fri, Aug 16, 2013 at 9:44 AM, Bishakha Datta bishakhada...@gmail.com wrote:

Congratulations, Moksh and Jayanta. Nice to see chapter board members from different parts of the country. Best, Bishakha

On Fri, Aug 16, 2013 at 9:41 AM, Omshivaprakash | Wikimedia omshivaprak...@wikimedia.in wrote:

Congratulations, Moksh and Jayanta.

-- Omshivaprakash H L | ಓಂಶಿವಪ್ರಕಾಶ್ ಎಚ್.ಎಲ್ | ॐ शिवप्रकाश् एच्. एल्
Kannada SIG, Wikimedia India Chapter http://wiki.wikimedia.in/

On Fri, Aug 16, 2013 at 6:09 AM, Ashwin Baindur ashwin.bain...@gmail.com wrote:

Heartiest congrats to Moksh and Jayanta. Thank you Arjuna, RK and Tinu for your hard work. :) Ashwin
[Wikimediaindia-l] Fwd: Indic print material digitization workshop query
Hi Everyone,

In my opinion, it is always better to OCR the documents. I agree that it's error-prone, but there is a Google Summer of Code project being done by AnkurIndia whose aim is to improve the quality of OCR for Indian scripts. https://www.google-melange.com/gsoc/project/google/gsoc2013/knoxxs/5001 So, maybe not immediately but in a short time, OCR is worth it. I am not aware whether any Wikisource in an Indian language is as vast as the French, English or Italian Wikisource. But we should have one, because we have quite a lot of text.

Thank You, Aarti

On Mon, Aug 19, 2013 at 10:28 PM, Ashwin Baindur ashwin.bain...@gmail.com wrote:

Whether to OCR or not to OCR is a significant issue! When we OCR a page of text, the result is often error-prone and loses formatting, and fixing it requires crowd-sourced correction. Many of us know about Project Gutenberg. The site provides plain vanilla etexts. But what most people do not know is that one of the very first crowd-sourcing initiatives, Distributed Proofreaders, provides a huge volunteer community correcting OCR pages of text submitted to Project Gutenberg. In fact, I was a Distributed Proofreader before coming to Wikipedia, and that was my first crowd-sourced experience. http://www.pgdp.net/c/

I've also done digitisation in a government archive for five years. We took a conscious decision to OCR the text and allow the uncorrected layer to exist rather than take the pains to correct it. The material was used so infrequently that it made good sense for the end-user to proof-read it himself should he desire to do so. So the real challenge in digitisation is not OCR, or rather not just OCR, but the creation of an error-free proof-read text layer behind the pdf/other formatted archive document.

Ashwin Baindur

On Mon, Aug 19, 2013 at 10:12 PM, Sumana Harihareswara suma...@wikimedia.org wrote:

On 08/19/2013 02:52 AM, L. Shyamal wrote:

Re-posting a now outdated query from meta http://meta.wikimedia.org/wiki/Talk:India_Access_To_Knowledge/Events/Bangalore/Digitization_workshop_18August2013 now that the workshop has already been conducted. I think those who attended the workshop could comment on whether it covered Indic language OCR-ing. If it did, it would be worthwhile to document the OCR software used on the meta pages or elsewhere, such as Wikisource. Most of the more experienced editors here will be fairly familiar with the use of scanners for creating PDF documents and uploading them to places like the Internet Archive, but experience with or knowledge of OCR and its success rates is a bit wanting for Indic languages (fonts).

best wishes, Shyamal en:User:Shyamal

I looked at the talk page on Meta - thank you, Shyamal!

For those who do not know: OCR means Optical Character Recognition. When we want to get archival documents onto the web, it's nice to have photos of them, but it's even better to OCR them so that people can clearly read, copy, excerpt, translate, and remix the text.

Is there a central list of the problems that OCR software (especially open source OCR software) has with text written in Indic languages? If so, I could help encourage people to fix those problems, as volunteers, via a Google Summer of Code/Outreach Program for Women internship, via a grant-funded project (such as https://meta.wikimedia.org/wiki/Grants:IEG ), or via some other method.

People who would like to make Wikisource more easily useful for Indic languages might want to contribute to the Wikisource vision development project that's going on right now: https://wikisource.org/wiki/Wikisource_vision_development

The ProofreadPage extension (part of the Wikisource technology stack) is being worked on right now in Aarti K. Dwivedi's Google Summer of Code internship. http://aartindi.blogspot.in/ She might be interested in knowing about these issues, so I am cc'ing her.

Also - just because people on this list might be interested! - if you have an old historical map that you'd like to vectorize to get it onto OpenStreetMap, try out the new Map polygon and feature extractor tool: https://github.com/NYPL/map-vectorizer

-- Sumana Harihareswara, Engineering Community Manager, Wikimedia Foundation

-- Warm regards, Ashwin Baindur

-- Aarti K. Dwivedi
Re: [Wikimediaindia-l] Indic print material digitization workshop query
Colleagues working in Bangla say that in their experience it is faster, cheaper, and less error-prone to create digital texts by typing them in. Once there is a larger body of digitised texts, and OCR technology for Indian languages also improves, OCR could become the preferred option. Tejaswini On 19 August 2013 22:38, Aarti K. Dwivedi ellydwivedi2...@gmail.com wrote: Hi Everyone, In my opinion, it is always better to OCR the documents. I agree that it's error prone but there is a Google Summer of Code project being done by AnkurIndia whose aim is to improve the quality of OCRs for Indian scripts. https://www.google-melange.com/gsoc/project/google/gsoc2013/knoxxs/5001 So, maybe not immediately but in short time, OCR is worth it. I am not aware if any Wikisource in Indian languages is as vast as French, English or Italian Wikisource. But we should have it because we have quite a lot of text. Thank You, Aarti On Mon, Aug 19, 2013 at 10:28 PM, Ashwin Baindur ashwin.bain...@gmail.com wrote: Whether to OCR or not to OCR is a significant issue! When we OCR a page of text, the resultant is often error-prone, lost formatting, and the correction requires crowd-sourced correction. Many of us know about Project Gutenberg. The site provides plain vanilla etexts. But what most people do not know that one of the very first crowd-sourcing initiatives - Distributed Proof-readers provides a huge volunteer community correcting OCR pages of text submitted to Project Gutenberg. In fact, I was a Distributed Proofreader before coming to Wikipedia and that was my first crowd-sourced experience. http://www.pgdp.net/c/ I've also done digitisation in a government archive for five years. We took a conscious decision to OCR the text and allow the uncorrected layer to exist rather than take the pains to correct it. The material was used so infrequently, it made good sense for the end-user to proof-read himself should he desire to do so. 
So the real challenge in digitisation is not OCR, or rather, not just OCR, but the creation of an error-free, proofread text layer behind the PDF or other formatted archive document. Ashwin Baindur On Mon, Aug 19, 2013 at 10:12 PM, Sumana Harihareswara suma...@wikimedia.org wrote: On 08/19/2013 02:52 AM, L. Shyamal wrote: Re-posting a now outdated query from meta: http://meta.wikimedia.org/wiki/Talk:India_Access_To_Knowledge/Events/Bangalore/Digitization_workshop_18August2013 Now that the workshop has already been conducted, I think those who attended could comment on whether it covered Indic language OCR-ing. If it did, it would be worthwhile to document the OCR software used on the meta pages or elsewhere, such as Wikisource. Most of the more experienced editors here will be fairly familiar with the use of scanners for creating PDF documents and uploading them to places like the Internet Archive, but experience with and knowledge of OCR and its success rates is a bit wanting for Indic languages (fonts). best wishes Shyamal en:User:Shyamal I looked at the talk page on Meta - thank you, Shyamal! For those who do not know: OCR means Optical Character Recognition. When we want to get archival documents onto the web, it's nice to have photos of them, but it's even better to OCR them so that people can clearly read, copy, excerpt, translate, and remix the text. Is there a central list of the problems that OCR software (especially open source OCR software) has with text written in Indic languages? If so, I could help encourage people to fix those problems, as volunteers, via a Google Summer of Code/Outreach Program for Women internship, via a grant-funded project (such as https://meta.wikimedia.org/wiki/Grants:IEG ), or via some other method. 
People who would like to make Wikisource more easily useful for Indic languages might want to contribute to the Wikisource vision development project that's going on right now: https://wikisource.org/wiki/Wikisource_vision_development The ProofreadPage extension (part of the Wikisource technology stack) is being worked on right now in Aarti K. Dwivedi's Google Summer of Code internship. http://aartindi.blogspot.in/ She might be interested in knowing about these issues, so I am cc'ing her. Also - just because people on this list might be interested! - if you have an old historical map that you'd like to vectorize to get it onto OpenStreetMap, try out the new Map polygon and feature extractor tool: https://github.com/NYPL/map-vectorizer -- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation -- Warm regards, Ashwin Baindur