Re: [tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-11 Thread Firlefanz

Thank you again. I think I'll stay with plain txt -- pdf looks too 
difficult to achieve.

Now, the next problem: everything worked fine with my 1-page test PDF. I then
tried the same with a 30 MB, 500-page PDF. After I started

$ convert -density 300 test.pdf -depth 8 -strip -background white -alpha off test.tiff

it ran for 2 hours, then suddenly everything went black and I could not do
anything. I guess my Mac is too weak to handle this. Is splitting the PDF
into many parts the only option left?

With pdftk I used the command "pdftk test.pdf burst" to split the PDF into
single pages. I then put around 50 pages in a new folder and used
"pdftk *.pdf cat output test.pdf" to combine them. Is there a faster way to do
this? I do not know which command would split the 500 pages automatically
into bundles of 50.
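
One way to avoid sorting pages into folders by hand might be to let the shell compute the page ranges and hand each range to pdftk's cat operation. This is only a sketch, assuming a Bourne-compatible shell such as bash. The helper names page_ranges and split_pdf and the output names part_01.pdf, part_02.pdf, ... are made up for illustration, and the page count has to be passed in by hand:

```shell
#!/bin/sh
# Print one "first-last" page range per line, e.g. 1-50, 51-100, ...
# page_ranges is a hypothetical helper, not part of pdftk.
page_ranges() {
    total=$1
    size=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        # The last bundle may be shorter than the others.
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "$start-$end"
        start=$((end + 1))
    done
}

# Write each bundle of pages to its own file: part_01.pdf, part_02.pdf, ...
# (hypothetical output names). Needs pdftk to be installed.
split_pdf() {
    input=$1
    pages=$2
    bundle=$3
    n=1
    for range in $(page_ranges "$pages" "$bundle"); do
        pdftk "$input" cat "$range" output "$(printf 'part_%02d.pdf' "$n")"
        n=$((n + 1))
    done
}

# Example call for the 500-page file from this message:
# split_pdf test.pdf 500 50
```

Each part_NN.pdf can then be converted and OCRed on its own, so no single run has to hold all 500 pages at once.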

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/02a36337-bc70-46c0-8844-5e114e77db55%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-11 Thread Firlefanz

It works! I am so relieved. Thank you all for the help.

I still have a couple of questions, since I've read several tutorials and
each uses different commands:

1. To convert my Fraktur PDF files to TIFF I use ImageMagick. Is this the
right command?

$ convert -density 300 test.pdf -depth 8 -strip -background white -alpha off test.tiff

2. For Tesseract I then use this command, which gives me a txt version of the
TIFF:

$ tesseract test.tiff outtest -l deu_frak

3. Not that it matters too much (I'm over the moon that it works like
this), but can I get as output, instead of a txt file, the original PDF with
searchable and copyable text?
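
Regarding question 3, Tesseract ships a "pdf" output config that writes a PDF containing the page image plus an invisible text layer. Note that this is a new PDF built from the TIFF, not the original file. A minimal sketch, assuming the installed build includes that config; the helper names output_base and ocr_to_pdf are made up for illustration:

```shell
#!/bin/sh
# Derive the output base name from the image name: test.tiff -> test.
# The directory part is dropped, so the result lands in the current directory.
# output_base is a hypothetical helper, not part of tesseract.
output_base() {
    name=$(basename "$1")
    echo "${name%.*}"
}

# Run tesseract with the "pdf" config, which writes <base>.pdf
# instead of <base>.txt. Needs tesseract and the language data installed.
ocr_to_pdf() {
    image=$1
    language=$2
    tesseract "$image" "$(output_base "$image")" -l "$language" pdf
}

# Example call with the file and language from this message:
# ocr_to_pdf test.tiff deu_frak
```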




[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Firlefanz
Nothing is printed when I type echo $TESSDATA_PREFIX

I thought about installing Tesseract 4.0 beta. Is there a step-by-step guide
for this? With brew install tesseract I cannot choose the version; it
installs 3.05.01.


On Tuesday, 10 April 2018 15:07:18 UTC+2, Fanatico wrote:
>
> Did you install it using brew, or did you compile it yourself?
>
> Try typing this in the terminal and post the result here:
>
> echo $TESSDATA_PREFIX
>



Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Firlefanz

Thank you for your reply. I used the command following this guide 
https://www.youtube.com/watch?v=QhJiOCwz-_I -- if it's wrong, then I will 
not follow this guide anymore.

Yes, I have Fraktur.traineddata in /usr/local/share/tessdata.

I do not know how to change the TESSDATA_PREFIX environment variable.
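
For reference, an environment variable can be set for the current terminal session with export. The sketch below follows the wording of the error message, which asks for the parent directory of the "tessdata" directory. It assumes the directory meant above is /usr/local/share/tessdata; the helper name tessdata_parent is made up for illustration:

```shell
#!/bin/sh
# Return the parent directory of a tessdata directory.
# tessdata_parent is a hypothetical helper, not part of tesseract.
tessdata_parent() {
    dirname "$1"
}

# Assumed location of the traineddata files; adjust if yours differs.
TESSDATA_DIR=/usr/local/share/tessdata

# Point TESSDATA_PREFIX at the parent of the tessdata directory,
# as the error message asks, and show the result.
export TESSDATA_PREFIX="$(tessdata_parent "$TESSDATA_DIR")"
echo "$TESSDATA_PREFIX"
```

The setting only lasts until the terminal window is closed. Adding the export line to ~/.bash_profile would make it apply to every new bash session.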



[tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Firlefanz
I downloaded deu_frak.traineddata, Fraktur.traineddata and frk.traineddata
to /usr/local/share/tessdata. But when using

$ tesseract file.tiff -l Fraktur Fraktur

I get the error message

Error opening data file ./tessdata/Fraktur.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the
parent directory of your "tessdata" directory.
Failed loading language 'Fraktur'
Tesseract couldn't load any languages!
Could not initialize tesseract.

