If you did not install the osd[1] data file, is it really a config bug???

[1] https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.01.osd.tar.gz
Zdenko

On Fri, Feb 28, 2014 at 5:09 PM, Bernard Polarski <[email protected]> wrote:

> Thanks for the tip!
>
> I see a file 'pdf' in tessdata/configs with 2 values in it:
>
> tessedit_create_pdf 1
> tessedit_pageseg_mode 1
>
> Sounds like this 'tessedit_pageseg_mode 1' parameter tells tesseract to
> include the hocr. (I produced one with the base name of the output file.)
> In all cases it worked.
>
> I had an issue with Tesseract complaining about a file named
> osd.traineddata. I copied the eng.tesseract onto this name and it was ok.
> Sounds like a config bug; I have no idea where it comes from.
>
> On Friday, February 28, 2014 15:53:07 UTC+1, Quan Nguyen wrote:
>
>> I use:
>>
>> tesseract.exe imagefile outfile pdf
>>
>> On Friday, February 28, 2014 4:57:57 AM UTC-6, Bernard Polarski wrote:
>>>
>>> Indeed, and I am currently exploring this. I compiled 3.03 in Cygwin
>>> (I had to remove the -std=c++11 flag from CXXFLAGS in configure and
>>> configure.ac).
>>> I ended up with a set of binaries in /usr/local/bin:
>>>
>>> -rwxr-xr-x 1 P0957 Domain Users   68047 Feb 26 12:46 convertfilestopdf.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   65424 Feb 26 12:46 convertfilestops.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   69965 Feb 26 12:46 convertformat.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   70510 Feb 26 12:46 convertsegfilestopdf.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   66500 Feb 26 12:46 convertsegfilestops.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   63798 Feb 26 12:46 converttopdf.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   65555 Feb 26 12:46 converttops.exe
>>> -rwxr-xr-x 1 P0957 Domain Users 6585300 Feb 26 12:46 cyglept-4.dll
>>> -rwxr-xr-x 1 P0957 Domain Users   76194 Feb 26 12:46 fileinfo.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   69640 Feb 26 12:46 printimage.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   73276 Feb 26 12:46 printsplitimage.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   63738 Feb 26 12:46 printtiff.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   69765 Feb 26 12:46 splitimage2pdf.exe
>>> -rwxr-xr-x 1 P0957 Domain Users 3208652 Feb 27 10:00 tesseract.exe
>>> -rwxr-xr-x 1 P0957 Domain Users   76794 Feb 26 12:46 xtractprotos.exe
>>>
>>> I have not found any documentation on these yet. As a last resort, I will
>>> have to review the C code of each in order to figure out the usage and
>>> discrepancies.
>>> My first experiments with 'convertsegfilestopdf.exe' were not successful
>>> in integrating the hOCR into the PDF.
>>> I only succeeded in producing a standalone PDF. 'fileinfo' is definitely
>>> welcome.
>>>
>>> On Friday, February 28, 2014 01:15:41 UTC+1, Quan Nguyen wrote:
>>>
>>>> Beginning with 3.03, Tesseract includes support for searchable PDF output.
>>>>
>>>> On Thursday, February 27, 2014 8:17:15 AM UTC-6, Bernard Polarski
>>>> wrote:
>>>>>
>>>>> I cannot find the binaries for hocr2pdf from exact-image for Windows
>>>>> (even for Cygwin).
>>>>> There are quite a few Python scripts, but I could not get any of
>>>>> them to work.
>>>>> There is always a library missing, and many of them include parts of
>>>>> exact-image.
>>>>>
>>>>> When it comes to hocr2pdf.net, there is no binary either. It seems to
>>>>> be only a library.
>>>>>
>>>>> Does anyone know a tool, still available, to transform the hocr output
>>>>> from tesseract into a PDF?
>>>>>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
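
For anyone who wants to script the command quoted in this thread (tesseract.exe imagefile outfile pdf), here is a minimal Python sketch. It assumes Tesseract 3.03 or later is on PATH and that the 'pdf' config file is present in tessdata/configs. The function names build_pdf_command and make_searchable_pdf are hypothetical helpers, not part of Tesseract.

```python
import shutil
import subprocess


def build_pdf_command(image_path, output_base, executable="tesseract"):
    """Return the argument list for: tesseract imagefile outfile pdf."""
    # The trailing "pdf" names the config file in tessdata/configs that
    # sets tessedit_create_pdf 1, as quoted in the thread.
    return [executable, image_path, output_base, "pdf"]


def make_searchable_pdf(image_path, output_base, executable="tesseract"):
    """Run Tesseract and return the path where the PDF is expected."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error,
    # for example when a required traineddata file is missing.
    subprocess.run(
        build_pdf_command(image_path, output_base, executable), check=True
    )
    # Assumption: Tesseract appends ".pdf" to the output base name.
    return output_base + ".pdf"


if __name__ == "__main__":
    # Only prints the command; call make_searchable_pdf() to actually run it.
    print(" ".join(build_pdf_command("imagefile.tif", "outfile")))
```

A note on the osd.traineddata complaint: in Tesseract, page segmentation mode 1 is automatic page segmentation with orientation and script detection (OSD), so the 'tessedit_pageseg_mode 1' line in the 'pdf' config is the likely reason the osd data file is requested.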

