Hi,
Is there a way to turn the hOCR that tesseract generates into a good HTML5
page, complete with the text's positioning and font style?
Or is it best to just read the bbox coordinates as-is and write them out to an HTML5 page?
<div class='ocr_carea' id='block_2_8' title="bbox 1165 1335 1644 1358">
p
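For what it's worth, reading the bbox values directly is not much work. Below is a minimal Python sketch, not a definitive converter: it assumes the hOCR marks words as elements with class `ocrx_word` and carries `bbox x0 y0 x1 y1` in the `title` attribute, as in the `ocr_carea` line quoted above. The names `HocrWords` and `hocr_to_html` are made up for this example, and using the box height as the font size is only a rough guess, since the bbox says nothing reliable about the actual font.

```python
import re
from html import escape
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (bbox, text) for every element whose class includes 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []     # list of ((x0, y0, x1, y1), text)
        self._bbox = None   # bbox of the word currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        m = BBOX_RE.search(a.get("title") or "")
        if "ocrx_word" in classes and m:
            self._bbox = tuple(int(v) for v in m.groups())
            self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumption: word elements are <span>s and do not contain nested spans.
        if tag == "span" and self._bbox is not None:
            self.words.append((self._bbox, "".join(self._text).strip()))
            self._bbox = None


def hocr_to_html(hocr):
    """Return an HTML5 page with each word absolutely positioned at its bbox."""
    parser = HocrWords()
    parser.feed(hocr)
    parser.close()
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8"><title>hOCR layout</title></head>',
        '<body style="position:relative">',
    ]
    for (x0, y0, x1, y1), text in parser.words:
        # Box height as font size is a crude approximation, not real font data.
        style = (
            f"position:absolute;left:{x0}px;top:{y0}px;"
            f"width:{x1 - x0}px;font-size:{y1 - y0}px;white-space:nowrap"
        )
        parts.append(f'<span style="{style}">{escape(text)}</span>')
    parts.append("</body></html>")
    return "\n".join(parts)
```

To try it, read the hOCR file tesseract wrote into a string, pass it to `hocr_to_html`, and save the result as an `.html` file. The coordinates are in pixels of the scanned image, so the page will be as large as the scan unless you scale the values.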
Which version of Ubuntu is tesseract installed on? Please also indicate the
version of tesseract-ocr, since I want to install it on Ubuntu 15.04.
On Sun, Aug 30, 2015 at 10:20 AM, fsbo.cons...@gmail.com wrote:
are there different types of installations of which I have chosen the
wrong one? The
Tesseract:
tesseract-ocr: Installed: 3.03.02-3
Ubuntu:
Ubuntu 14.04.3 LTS
Also, just to make sure I'm not missing something, is there a distinction
between tesseract-ocr and tesseract?
On Sunday, August 30, 2015 at 1:59:00 AM UTC-7, sriranga(82yrsold) wrote:
which version of ubuntu on
Hello everyone,
I have a digital copy of a book I own that was delivered to me in what
might be the most inconvenient of formats - one PDF per page, with all
non-image data on the page - text included - converted to vector shapes.
While I can re-combine the pages together, add bookmarks/page
The links you gave me are great. I created the tiff/box pair on a mac as
follows:
training/text2image --text=yor.training_text
--outputbase=yor.VerdanaMedium.exp0 --font='Verdana Medium'
--fonts_dir=/Library/Fonts
Then I ran training as follows:
tesseract yor.VerdanaMedium.exp0.tif