Re: [INDOLOGY] OCR

2025-06-16 Thread Tyler Neill via INDOLOGY
Dear List members, The new drag-and-drop interface to Google Vision OCR that I mentioned last month is now ready for use on Skrutable. Go straight to the new subpage skrutable.info/ocr , or look for the small link on the main page, lower-left. The FAQs should answer mos

Re: [INDOLOGY] OCR

2025-05-12 Thread Tyler Neill via INDOLOGY
Hi all, Regarding Patrick’s question about easy OCR, I suspect he’s particularly looking for a tool that can handle multi-page PDFs in one go, which could be especially helpful for digitization projects like UTA’s Resource Library for Dharmaśāstra Studies

Re: [INDOLOGY] OCR

2025-05-11 Thread Rolf Heinrich Koch via INDOLOGY
Yes, Patrick, this is possible, though with some errors. Here is an example: 1. I uploaded this image to ChatGPT (a verse from AK): 2. ChatGPT produced this transcription (searchable): धूमग्निस्तरणिर्मित्रशत्रुभानुगभस्तिरोचनः । विभावसुसुप्रीहपतिस्तेपवांस्तपतिर्यपातिः ॥ ३० ॥ 3. a translitera

Re: [INDOLOGY] OCR

2025-05-09 Thread Harry Spier via INDOLOGY
Hi Patrick, Have you tried SanskritCR (here are the instructions)? https://sri.auroville.org/projects/sanskrit-ocr/ It's just a matter of pasting in an image. But I'm not sure whether it only converts one page at a time, or whether multi-page conversion is a paid service. Does anyone know? Harry Spier On Fri, May 9, 2

Re: [INDOLOGY] OCR with diacritics

2025-04-05 Thread Natālija Burišina via INDOLOGY
Dear Paras, You can try using this tool, which supports many languages including Sanskrit and also has many other features alongside OCR: https://www.pdf24.org/en/ Best regards, Natālija Burišina MA ___ INDOLOGY mailing list INDOLOGY@list.indology

Re: [INDOLOGY] OCR with diacritics

2025-03-31 Thread Paras Mehta via INDOLOGY
Dear Natālija, Thanks a lot for your help. I used the PDF OCR tool on the website. It makes the PDF file searchable. Is there any option to also extract the text from the PDF file? Thanks. Best wishes, Paras On Sun, Mar 30, 2025 at 12:53 PM Natālija Burišina via INDOLOGY < indology@list.indolog

Re: [INDOLOGY] OCR with diacritics

2025-03-31 Thread Paras Mehta via INDOLOGY
Dear Sir, Thanks a lot for your reply and the links. This tool seems to make a PDF file searchable. Am I right? Does this tool also extract the text from the PDF file? Best wishes, Paras On Sat, Mar 29, 2025 at 7:49 AM Dominik Wujastyk wrote: > Well, what do you know? > >- https://github.c

Re: [INDOLOGY] OCR with diacritics

2025-03-30 Thread Dominik Wujastyk via INDOLOGY
That was new to me and it's very good! Thanks, Natālija! -- Dominik Wujastyk, On Sun, 30 Mar 2025 at 01:23, Natālija Burišina via INDOLOGY < indology@list.indology.info> wrote: > Dear Paras, > > You can try using this tool that supports many languages including > Sanskrit, and also has many ot

Re: [INDOLOGY] OCR with diacritics

2025-03-28 Thread Dominik Wujastyk via INDOLOGY
Well, what do you know? - https://github.com/Shreeshrii/tesstrain-Sanskrit-IAST

Re: [INDOLOGY] OCR with diacritics

2025-03-28 Thread Dominik Wujastyk via INDOLOGY
It's not the only player, Paras, but some people have good results with - https://ocrmypdf.readthedocs.io/en/latest/ It supports many languages, and with some ingenuity it might be possible to make a profile specifically optimized for Indic transliteration. Maybe someone already has? Best, D
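
On Paras's question about extracting the text as well as producing a searchable PDF: OCRmyPDF can write a plain-text "sidecar" file next to the output PDF. Below is a minimal Python sketch using OCRmyPDF's `ocrmypdf.ocr()` function, assuming OCRmyPDF and Tesseract are installed together with Tesseract's Sanskrit (`san`) and English (`eng`) language data. The function name `ocr_pdf_with_text` and the file names are hypothetical.

```python
from pathlib import Path


def ocr_pdf_with_text(src, dst, sidecar, languages=("san", "eng")):
    """OCR a (possibly multi-page) PDF and return the recognized plain text.

    Writes a searchable PDF to `dst` and a plain-text copy to `sidecar`.
    Assumes OCRmyPDF and Tesseract are installed, with traineddata for every
    code in `languages` ("san" is Tesseract's Sanskrit/Devanagari model).
    """
    import ocrmypdf  # imported lazily so this module loads without OCRmyPDF

    # `language` takes a list of Tesseract codes; `sidecar` is the text file path
    ocrmypdf.ocr(src, dst, language=list(languages), sidecar=sidecar)
    return Path(sidecar).read_text(encoding="utf-8")
```

For example, `ocr_pdf_with_text("scan.pdf", "scan_ocr.pdf", "scan.txt")` would return the text of all pages at once. A PDF that already has a text layer (say, one made searchable with another tool) is refused by default; OCRmyPDF's `skip_text` or `force_ocr` options change that behaviour. A custom model, such as one trained with the tesstrain-Sanskrit-IAST repository, would be passed by its traineddata name in `languages`; that name is not given here.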

Re: [INDOLOGY] OCR and diacritics

2024-01-08 Thread Michaël Meyer via INDOLOGY
Dear Patricia, Dear all, The most convenient tool I have found so far is OCRmyPDF (https://ocrmypdf.readthedocs.io). It is a wrapper around Tesseract (the open-source OCR engine long developed at Google). Best, Michaël Meyer On Mon, 8 Jan 2024 at 09:33, Patricia SAUTHOFF wrote: > Dear all, > > In the past I have found Abbyy FineRead

Re: [INDOLOGY] OCR for sanskrit transliteration

2021-12-01 Thread Harry Spier via INDOLOGY
I tried SanskritCR, but making global replacements was problematic, especially with nasals, since it sometimes converted different characters with diacritics to the same letter. Sent from mobile phone. On Wed, Dec 1, 2021, 10:47 Timothy Cahill wrote: > Dear Harry, >The O - Sanskrit CR (p

Re: [INDOLOGY] OCR for sanskrit transliteration

2021-12-01 Thread Timothy Cahill via INDOLOGY
ven rendering with cedillas) to be useful. My bad luck? > Tim Lubin > From: INDOLOGY on behalf of INDOLOGY > Reply-To: Tim Cahill > Date: Wednesday, December 1, 2021 at 10:47 AM > To: Harry Spier > Cc: INDOLOGY > Subject:

Re: [INDOLOGY] OCR for sanskrit transliteration

2021-12-01 Thread Lubin, Tim
: [INDOLOGY] OCR for sanskrit transliteration Dear Harry, The O - Sanskrit CR (pronounced "Seer") also does a tolerably good job --so you can clean up the output by replacing its umlauts with macrons, its cedillas with dots, etc., as global replacements. (Just as Diego suggested.)

Re: [INDOLOGY] OCR for sanskrit transliteration

2021-12-01 Thread Timothy Cahill via INDOLOGY
Dear Harry, The O - Sanskrit CR (pronounced "Seer") also does a tolerably good job --so you can clean up the output by replacing its umlauts with macrons, its cedillas with dots, etc., as global replacements. (Just as Diego suggested.) BTW, S. Veṅkaṭarāma Śāstri's English translation of the
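
The global replacements described here can be scripted rather than done one at a time in a word processor. A minimal Python sketch, assuming SanskritCR's stand-ins are the umlaut and cedilla letters in the hypothetical table below (the comma-below variants are an extra assumption); check the table against real output before relying on it. As Harry Spier notes, where the OCR has already merged distinct letters (e.g. nasals), no one-to-one table can restore the distinction, so those spots still need proofreading.

```python
import unicodedata

# Hypothetical look-alike table: which characters SanskritCR actually emits
# is an assumption; extend it to match the output you see.
_LOWER = {
    # umlauts standing in for macrons (long vowels)
    "\u00e4": "\u0101",  # ä -> ā
    "\u00ef": "\u012b",  # ï -> ī
    "\u00fc": "\u016b",  # ü -> ū
    # cedillas (and comma-below look-alikes) standing in for underdots
    "\u015f": "\u1e63",  # ş -> ṣ
    "\u0219": "\u1e63",  # ș -> ṣ
    "\u0163": "\u1e6d",  # ţ -> ṭ
    "\u021b": "\u1e6d",  # ț -> ṭ
    "\u1e11": "\u1e0d",  # ḑ -> ḍ
    "\u0146": "\u1e47",  # ņ -> ṇ
    "\u0157": "\u1e5b",  # ŗ -> ṛ
    "\u1e29": "\u1e25",  # ḩ -> ḥ
    "\u013c": "\u1e37",  # ļ -> ḷ
}
# Add the capital forms (Ä -> Ā, Ş -> Ṣ, ...) automatically
_TABLE = {**_LOWER, **{k.upper(): v.upper() for k, v in _LOWER.items()}}
SANSKRITCR_TO_IAST = str.maketrans(_TABLE)


def sanskritcr_to_iast(text: str) -> str:
    """Apply all look-alike replacements in a single pass."""
    # NFC first, so "s" + combining cedilla (U+0327) becomes the single letter ş
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(SANSKRITCR_TO_IAST)
```

Doing everything in one `str.translate` pass avoids the ordering problems that chained "replace all" operations can introduce.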

Re: [INDOLOGY] OCR for sanskrit transliteration

2021-12-01 Thread DIEGO LOUKOTA SANCLEMENTE
Dear Harry, Something I do (not perfect) is to take the digital Devanagari text and input it into Google Translate set to Nepali. The phonetic transcription yields tolerable IAST text with some quirks that can easily be fixed with the "replace all" feature in Microsoft Word vel sim. (e.g. ē -> e)
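
Where digital Devanagari is already available, a direct rule-based conversion is an alternative to the Google Translate route; this is a swapped-in technique, not Diego's method. A minimal Python sketch, assuming standard IAST values and covering only the basic Sanskrit letters, anusvāra, visarga, candrabindu, avagraha, daṇḍas and digits (no nukta letters or Vedic accents). All names are hypothetical; maintained libraries such as `indic_transliteration` handle far more cases.

```python
# Minimal rule-based Devanagari -> IAST transliterator (sketch).
# Tables are assumptions covering basic Sanskrit letters only.

VIRAMA = "\u094d"  # ् suppresses the inherent "a"

VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
          "ऋ": "ṛ", "ॠ": "ṝ", "ऌ": "ḷ", "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au"}

# Dependent vowel signs (mātrās) replace the inherent "a" of the preceding consonant
MATRAS = {"\u093e": "ā", "\u093f": "i", "\u0940": "ī", "\u0941": "u",
          "\u0942": "ū", "\u0943": "ṛ", "\u0944": "ṝ", "\u0962": "ḷ",
          "\u0947": "e", "\u0948": "ai", "\u094b": "o", "\u094c": "au"}

CONSONANTS = {"क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "ṅ",
              "च": "c", "छ": "ch", "ज": "j", "झ": "jh", "ञ": "ñ",
              "ट": "ṭ", "ठ": "ṭh", "ड": "ḍ", "ढ": "ḍh", "ण": "ṇ",
              "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
              "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
              "य": "y", "र": "r", "ल": "l", "व": "v",
              "श": "ś", "ष": "ṣ", "स": "s", "ह": "h"}

# anusvāra, visarga, candrabindu, avagraha, daṇḍa, double daṇḍa
OTHER = {"\u0902": "ṃ", "\u0903": "ḥ", "\u0901": "m\u0310",
         "\u093d": "'", "\u0964": "|", "\u0965": "||"}
OTHER.update({chr(0x0966 + d): str(d) for d in range(10)})  # digits ० to ९


def devanagari_to_iast(text: str) -> str:
    out = []
    pending_a = False  # last letter was a consonant still carrying its inherent "a"
    for ch in text:
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")  # previous consonant kept its inherent vowel
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in MATRAS:
            out.append(MATRAS[ch])  # vowel sign replaces the inherent "a"
            pending_a = False
        elif ch == VIRAMA:
            pending_a = False  # bare consonant, e.g. first half of a conjunct
        else:
            if pending_a:
                out.append("a")
                pending_a = False
            # independent vowels and signs are mapped; anything else passes through
            out.append(VOWELS.get(ch, OTHER.get(ch, ch)))
    if pending_a:
        out.append("a")
    return "".join(out)
```

For instance, `devanagari_to_iast("धर्म")` gives `dharma`: the virama cancels the inherent vowel of र, and the final म keeps its own.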

Re: [INDOLOGY] OCR for sanskrit transliteration

2021-12-01 Thread DIEGO LOUKOTA SANCLEMENTE
"phonetic", not "phinetic"... On Wed, Dec 1, 2021, 7:18 AM DIEGO LOUKOTA SANCLEMENTE < diegolouk...@ucla.edu> wrote: > > Dear Harry, > > Something I do (not perfect) is taking the digital Devanagari text and > input it to Google Translate set to Nepali. The phinetic transcription > yields tol