Zdenko:
I have the following use case for the Tesseract 4.1 C++ API:
I would like to read a multi-page, non-searchable PDF file as input
into a PIX or PIXA, and as output I would like to create a searchable PDF file.
My question to you:
which Tesseract C++ API function can I call
to read the multi-page, non-searchable PDF file into a PIX or PIXA?
Do you have a small C++ example for this?
I mean, exactly like the command line: tesseract test.pdf output pdf
(test.pdf is the multi-page PDF input file)
On Friday, 25 October 2019 16:35:14 UTC+2, Ivica Anic wrote:
>
> Hi,
> I am testing the Tesseract C++ API (version 4.1).
> Here is my code:
>
>
> const char *datapath = "C:\\Temp\\tessdata-master";
> std::string language_ = "deu";
> std::string inputImage = "./input.png";
> tesseract::TessBaseAPI *api100 = new tesseract::TessBaseAPI();
> if (api100->Init(datapath, "deu", tesseract::OEM_LSTM_ONLY)) {
> fprintf(stderr, "Could not initialize tesseract.\n");
> exit(1);
> }
>
>
> api100->SetVariable("tessedit_create_pdf", "T");
> // The PNG file is the input file
> PIX *sourceImg100 = pixRead(inputImage.c_str());
>
> api100->SetImage(sourceImg100);
>
>
> api100->Recognize(0);
>
> api100->SetPageSegMode(tesseract::PSM_AUTO_ONLY);
> api100->SetInputName(inputImage.c_str());
> tesseract::TessResultRenderer *renderer100 =
>     new tesseract::TessPDFRenderer("output_base", api100->GetDatapath(), false);
>
> renderer100->BeginDocument("test");
> renderer100->AddImage(api100);
> api100->ProcessPage(sourceImg100, 0, inputImage.c_str(), NULL, 5000,
> renderer100);
> renderer100->EndDocument();
> api100->End();
> pixDestroy(&sourceImg100);
>
> How can I get a searchable PDF file as output and save it on my
> computer?
> I mean, exactly like the command line: tesseract test.tif output pdf
>
> Zdenko:
> In my test, an output PDF file is created, but it is not readable.
> When I try to open it, I get an error saying that the XREF data in the
> PDF file is missing.
>
>
>
>
> Thanks a lot
>
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/f9fbb2d9-7224-4925-bad2-fa267f6cb96e%40googlegroups.com.