[tesseract-ocr] Re: pdf -> searchable PDF

2017-01-15 Thread wikinaut
Andreas,

we are now tracking your issue as a new GitHub issue:
https://github.com/tesseract-ocr/tesseract/issues/660. Please be sure
to follow the discussion there.

It looks as if the main developers are really interested in finding and
implementing a solution (which I am also very interested in).

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/11144ee7-f049-47f9-a286-a5f329136a63%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Re: pdf -> searchable PDF

2017-01-13 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/issues/83 and the other
PDF-related issues in the GitHub repo for similar discussions.

- excuse the brevity, sent from mobile

On 13-Jan-2017 10:15 PM, "James R Barlow" wrote:

> Tesseract cannot rasterize PDFs. It is fairly straightforward to write a
> PDF like Tesseract does, but very complex to rasterize one.
>
> Programs like OCRmyPDF (which I develop) use Ghostscript, Tesseract and
> other tools to handle PDF to searchable PDF conversion.
>
>
> On Tuesday, January 10, 2017 at 9:34:57 PM UTC-8, Andreas Steibl wrote:
>>
>> Hello
>>
>> I have a scanned PDF, and I am making a searchable PDF from it.
>> First I generate a black/white multipage TIFF, and with Tesseract I can
>> make a searchable PDF.
>>
>> But is it somehow possible to integrate the original PDF images?
>> The generated TIFF does not have the same quality as the original
>> (the scanned image may be in color).
>>



[tesseract-ocr] Re: pdf -> searchable PDF

2017-01-13 Thread James R Barlow
Tesseract cannot rasterize PDFs. It is fairly straightforward to write a
PDF like Tesseract does, but very complex to rasterize one.

Programs like OCRmyPDF (which I develop) use Ghostscript, Tesseract and 
other tools to handle PDF to searchable PDF conversion.
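
The Ghostscript + Tesseract combination described above can be sketched in Python as a two-step pipeline. This is a minimal sketch under stated assumptions, not how OCRmyPDF is actually implemented: it assumes the `gs` and `tesseract` executables are installed and on PATH, and the helper names (`ghostscript_cmd`, `tesseract_cmd`, `pdf_to_searchable`), the 300 dpi default and the `page-%03d.png` file naming are hypothetical choices for illustration. Pages are rendered in 24-bit color (`png16m`) rather than black/white, because Tesseract's `pdf` output places the input image under an invisible text layer; re-rendering may still not match the quality of the images embedded in the original PDF.

```python
"""Sketch: scanned PDF -> searchable PDF via Ghostscript + Tesseract.

Assumption: `gs` (Ghostscript) and `tesseract` are installed and on PATH.
All helper names here are hypothetical, chosen for illustration.
"""
import subprocess
from pathlib import Path


def ghostscript_cmd(pdf: str, out_dir: str, dpi: int = 300) -> list[str]:
    """Build a Ghostscript command that renders every page to a PNG."""
    # png16m = 24-bit RGB output, so color scans stay in color.
    # %03d is replaced by Ghostscript with the page number (001, 002, ...).
    pattern = str(Path(out_dir) / "page-%03d.png")
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m", f"-r{dpi}",
        f"-sOutputFile={pattern}", pdf,
    ]


def tesseract_cmd(image_list: str, out_base: str, lang: str = "eng") -> list[str]:
    """Build a Tesseract command that OCRs a list of images into one PDF."""
    # image_list is a text file with one image path per line.
    # The trailing "pdf" config selects the PDF renderer, which writes
    # out_base + ".pdf".
    return ["tesseract", image_list, out_base, "-l", lang, "pdf"]


def pdf_to_searchable(pdf: str, work_dir: str, out_base: str) -> Path:
    """Rasterize `pdf`, OCR the pages and return the output PDF path."""
    work = Path(work_dir)
    # Assumes work_dir is empty or new, so no stale page images get picked up.
    work.mkdir(parents=True, exist_ok=True)

    # Step 1: rasterize every page of the input PDF.
    subprocess.run(ghostscript_cmd(pdf, str(work)), check=True)

    # Step 2: list the rendered pages in order. Zero-padded names keep
    # lexicographic order equal to page order for up to 999 pages.
    pages = sorted(work.glob("page-*.png"))
    image_list = work / "pages.txt"
    image_list.write_text("\n".join(str(p) for p in pages) + "\n")

    # Step 3: OCR all pages into a single PDF with a text layer.
    subprocess.run(tesseract_cmd(str(image_list), out_base), check=True)
    return Path(out_base + ".pdf")
```

For example, `pdf_to_searchable("scan.pdf", "work", "scan-ocr")` should write `scan-ocr.pdf`. Passing a list file instead of single images lets one Tesseract run produce a multipage PDF, so no separate merge step is needed.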


On Tuesday, January 10, 2017 at 9:34:57 PM UTC-8, Andreas Steibl wrote:
>
> Hello
>
> I have a scanned PDF, and I am making a searchable PDF from it.
> First I generate a black/white multipage TIFF, and with Tesseract I can
> make a searchable PDF.
>
> But is it somehow possible to integrate the original PDF images?
> The generated TIFF does not have the same quality as the original
> (the scanned image may be in color).
>
>
