Re: [tesseract-ocr] Re: Generating a PDF with Tesseract C++API (4.1Version)

Zdenko Podobny Sat, 26 Oct 2019 04:11:29 -0700

Why do you think there is a problem in Tesseract?

output.pdf opens without problems in Acrobat Reader, Chrome/Chromium, and
SumatraPDF.
output.pdf passes validation as PDF 1.5 without errors on
https://www.pdf-online.com/osa/validate.aspx,
https://www.datalogics.com/products/pdftools/pdf-checker/ and
https://www.pdfen.com/pdf-a-validator.


You should understand what you are doing. For example, setting the variable
tessedit_create_pdf is useless.

Zdenko


On Sat, 26 Oct 2019 at 5:40, Ivica Anic <[email protected]> wrote:

> Zdenko,
> when I try your sample, I get the following error:
> The document cannot be opened.
> An error occurred while opening the document from the file:
> C:\Users\ocr\output.pdf.
> Error [PXCLib]: Required value not found.
>
> =====================================================
> When I add two lines to your sample and try
> api100->SetVariable("tessedit_create_pdf", "T");
> api100->SetPageSegMode(tesseract::PSM_AUTO_ONLY);
> I get an error when trying to open the output PDF file:
> The following problems were found in the document:
> - One or more XREF streams were not found (XREF data are missing)
>
> On Friday, 25 October 2019 16:35:14 UTC+2, Ivica Anic wrote:
>>
>> Hi,
>> I am testing the Tesseract C++ API (version 4.1).
>> Here is my code:
>>
>>
>> #include <cstdio>
>> #include <cstdlib>
>> #include <string>
>> #include <leptonica/allheaders.h>
>> #include <tesseract/baseapi.h>
>> #include <tesseract/renderer.h>
>>
>> using std::string;
>>
>> int main() {
>>   const char *datapath = "C:\\Temp\\tessdata-master";
>>   string language_ = "deu";
>>   string inputImage = "./input.png";
>>   tesseract::TessBaseAPI *api100 = new tesseract::TessBaseAPI();
>>   if (api100->Init(datapath, language_.c_str(), tesseract::OEM_LSTM_ONLY)) {
>>     fprintf(stderr, "Could not initialize tesseract.\n");
>>     exit(1);
>>   }
>>
>>   api100->SetVariable("tessedit_create_pdf", "T");
>>   // The PNG file is the input file
>>   PIX *sourceImg100 = pixRead(inputImage.c_str());
>>   api100->SetImage(sourceImg100);
>>   api100->Recognize(0);
>>
>>   api100->SetPageSegMode(tesseract::PSM_AUTO_ONLY);
>>   api100->SetInputName(inputImage.c_str());
>>   tesseract::TessResultRenderer *renderer100 =
>>       new tesseract::TessPDFRenderer("output_base", api100->GetDatapath(), false);
>>
>>   renderer100->BeginDocument("test");
>>   renderer100->AddImage(api100);
>>   api100->ProcessPage(sourceImg100, 0, inputImage.c_str(), NULL, 5000,
>>                       renderer100);
>>   renderer100->EndDocument();
>>   delete renderer100;  // the destructor closes output_base.pdf
>>   api100->End();
>>   delete api100;
>>   pixDestroy(&sourceImg100);
>>   return 0;
>> }
>>
>> How can I get a searchable PDF file as output and save it on my
>> computer? I mean exactly like the command line: tesseract test.tif output
>> pdf
>>
>> Zdenko:
>> in my test an output PDF file is created, but it is not readable.
>> When I try to open the PDF file, I get the error "XREF data in the
>> PDF file are missing".
>>
>> Thanks a lot
>>
>
