GitHub user epugh opened a pull request:
https://github.com/apache/tika/pull/133
add hOCR output format to TesseractParser TIKA-2093
Small change to Tesseract OCR code to add the hOCR outputType. In the
future we can add `pdf` and `tsv` as output types as well.
First
GitHub user epugh opened a pull request:
https://github.com/apache/tika/pull/136
TIKA-2106. Need to lowercase the output file to match the format passed to
tesseract cmd line.
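
To illustrate the kind of mismatch this fix addresses, here is a minimal sketch, not the code from the pull request. It assumes the output type is held in an enum with uppercase constants, and that tesseract writes its result to the output base plus the lowercase format name. The names `OutputType`, `TesseractOutputName`, `formatArg`, and `outputFile` are hypothetical and chosen only for this example.

```java
import java.util.Locale;

// Hypothetical stand-in for the parser's output-type setting; not Tika's actual class.
enum OutputType { TXT, HOCR }

final class TesseractOutputName {
    private TesseractOutputName() {}

    // Format name as it would be passed on the tesseract command line, e.g. "hocr".
    // Locale.ROOT keeps the result independent of the JVM's default locale.
    static String formatArg(OutputType type) {
        return type.name().toLowerCase(Locale.ROOT);
    }

    // File name assumed to be written by tesseract for the given output base.
    // Reusing formatArg() keeps the extension and the command-line argument in sync.
    static String outputFile(String outputBase, OutputType type) {
        return outputBase + "." + formatArg(type);
    }
}
```

With these assumptions, `TesseractOutputName.outputFile("page", OutputType.HOCR)` yields `page.hocr`, whereas building the name from the enum constant directly would look for `page.HOCR` and fail on case-sensitive file systems.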
You can merge this pull request into a Git repository by running:
$ git pull https