Re: [tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra
Thank you again. I think I'll stay with plain txt; pdf output looks too difficult to achieve.

Now, the next problem: everything worked fine with my 1-page test pdf, so I tried the same with a 30 MB, 500-page pdf. I ran

    convert -density 300 test.pdf -depth 8 -strip -background white -alpha off test.tiff

and after 2 hours the screen suddenly went black and the machine stopped responding. I guess my Mac is too weak to handle this, so splitting the pdf into several parts is the only option left?

With pdftk I used "pdftk test.pdf burst" to split the pdf into single pages. I then put around 50 pages into a new folder and used "pdftk *.pdf cat output test.pdf" to combine them. Is there a faster way to do this? I do not know which command would split the 500 pages automatically into bundles of 50.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To post to this group, send email to tesseract-ocr@googlegroups.com. Visit this group at https://groups.google.com/group/tesseract-ocr. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/02a36337-bc70-46c0-8844-5e114e77db55%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
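One possible way to avoid the burst-and-recombine round trip is to let pdftk's "cat" operation pick page ranges directly. The following is only a sketch under assumptions: the helper names page_ranges and split_pdf, the part_NN.pdf output names, and the page count of 500 are illustrative and not from the thread.

```shell
#!/bin/sh
# Print one "first-last" page range per line for a document of $1 pages
# split into bundles of $2 pages (the last bundle may be shorter).
page_ranges() {
    total=$1
    size=$2
    first=1
    while [ "$first" -le "$total" ]; do
        last=$((first + size - 1))
        if [ "$last" -gt "$total" ]; then
            last=$total
        fi
        echo "$first-$last"
        first=$((last + 1))
    done
}

# Write each bundle to its own file using pdftk's "cat" operation.
# Arguments: input pdf, total page count, pages per bundle.
split_pdf() {
    input=$1
    n=1
    for range in $(page_ranges "$2" "$3"); do
        pdftk "$input" cat "$range" output "$(printf 'part_%02d.pdf' "$n")"
        n=$((n + 1))
    done
}

# Hypothetical usage for the 500-page file from the message:
# split_pdf test.pdf 500 50
```

The page count is passed in by hand here; "pdftk test.pdf dump_data" reports it in its NumberOfPages line if it is not known. Each part_NN.pdf can then be converted and OCRed on its own, which keeps the intermediate tiff much smaller.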
[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra
It works! I am so relieved. Thank you all for the help. I still have a couple of questions, since each tutorial I have read uses different commands:

1. To convert my Fraktur pdf files to tiff I use ImageMagick. Is this the right command?

    convert -density 300 test.pdf -depth 8 -strip -background white -alpha off test.tiff

2. For tesseract I then use the command:

    tesseract test.tiff outtest -l deu_frak

   With this I get a txt version of the tiff.

3. Not that it matters too much (I'm over the moon that it works like this), but instead of a txt file, can I get the original pdf as output, just with search and copy functions added?
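On question 3, a minimal sketch: Tesseract accepts config names after the options, and "pdf" is one of them, producing a PDF of the page images with an invisible text layer. Note that this is a new PDF built from the tiff, not the original file with a layer added. The wrapper name ocr_to_pdf is hypothetical; the file and language names are the ones from the message.

```shell
#!/bin/sh
# Run Tesseract with the "pdf" config so it writes <outputbase>.pdf
# (page image plus searchable text layer) instead of <outputbase>.txt.
# Arguments: input image, output base name, language code.
ocr_to_pdf() {
    tesseract "$1" "$2" -l "$3" pdf
}

# Hypothetical usage with the files from the message:
# ocr_to_pdf test.tiff outtest deu_frak    # should write outtest.pdf
```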
[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra
Nothing is printed when I type

    echo $TESSDATA_PREFIX

I thought about installing tesseract 4.0beta; is there a step-by-step guide on how to do this? With "brew install tesseract" I cannot choose the version, i.e. it's 3.05.01.

On Tuesday, 10 April 2018 15:07:18 UTC+2, Fanatico wrote:
> Did you install it using brew or compile it yourself?
>
> Try typing this in the terminal and post the result here:
>
> echo $TESSDATA_PREFIX
Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra
Thank you for your reply. I took the command from this guide: https://www.youtube.com/watch?v=QhJiOCwz-_I. If it's wrong, I will not follow that guide anymore.

Yes, I have Fraktur.traineddata in /usr/local/share/tessdata.

I do not know how to change "the TESSDATA_PREFIX environment variable".
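For reference, an environment variable is set in the terminal with "export". The sketch below assumes the traineddata files are in /usr/local/share/tessdata and follows the wording of the 3.05 error message, which asks for the parent of the "tessdata" directory; other Tesseract versions may expect a different value, so the error text of the installed version is the thing to check.

```shell
#!/bin/sh
# Assumption: the .traineddata files are in /usr/local/share/tessdata,
# so the parent directory is /usr/local/share.
export TESSDATA_PREFIX=/usr/local/share

# Confirm the value is set for this terminal session.
echo "$TESSDATA_PREFIX"

# With Tesseract installed, this lists the languages it can now find:
# tesseract --list-langs
```

The setting only lasts for the current terminal session; adding the export line to the shell's startup file (for example ~/.bash_profile) would make it permanent.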
[tesseract-ocr] Error opening traineddata files on Mac High Sierra
I downloaded deu_frak.traineddata, Fraktur.traineddata and frk.traineddata to /usr/local/share/tessdata. But when I run

    $ tesseract file.tiff -l Fraktur Fraktur

I get the error message

    Error opening data file ./tessdata/Fraktur.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
    Failed loading language 'Fraktur'
    Tesseract couldn't load any languages!
    Could not initialize tesseract.