Re: [tesseract-ocr] The text is not recognized from png

Zdenko Podobny Tue, 07 Apr 2020 03:58:27 -0700

You can start by reading the docs and then searching the issue tracker and
forum for "table".
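
For table-like pages, one setting the Tesseract documentation discusses is the page segmentation mode (PSM). Below is a minimal R sketch. It assumes the `tesseract` R package is installed and that its `tesseract()` engine passes engine variables through the `options` list. The helper name `ocr_table_page` is hypothetical, and mode "6" ("assume a single uniform block of text") is only a starting guess. The best mode for this layout has to be found by experiment.

```r
# Hypothetical helper: OCR one rendered page with an explicit page
# segmentation mode (PSM). Assumes the 'tesseract' R package is installed;
# calls are namespaced so the function can be defined without attaching it.
ocr_table_page <- function(png_file, psm = "6", language = "eng") {
  # tessedit_pageseg_mode is the Tesseract variable behind --psm;
  # "6" assumes a single uniform block of text (a guess for this layout)
  engine <- tesseract::tesseract(
    language = language,
    options  = list(tessedit_pageseg_mode = psm)
  )
  # Recognise the page with the configured engine and return plain text
  tesseract::ocr(png_file, engine = engine)
}
```

For example, `cat(ocr_table_page("page1.png", psm = "4"))` lets you compare the output of several modes on the same image.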


Zdenko


On Tue, 7 Apr 2020 at 7:38, amrapalli karan <amrapallika...@gmail.com> wrote:

> I have a PDF file that I can read only partially. I am using R
> to extract the data from the PDF, in which the content is stored
> as an image.
>
> The expected output is:
>
> CONTINUOUS CAST COPPER WIRE ROD 11 MM 44*1*567 *CATHODE FULL* *434122*
> CONTINUOUS CAST COPPER WIRE ROD NS 439678
> CONTINUOUS CAST COPPER WIRE ROD 16 MM 443056...etc
>
> The actual output which I am getting:
>
> CONTINUOUS CAST COPPER WIRE ROD 11 MM 44567
> CONTINUOUS CAST COPPER WIRE ROD NS 439678
> CONTINUOUS CAST COPPER WIRE ROD 16 MM 443056...etc.
>
> The highlighted parts of the text are missing from the extracted data.
> The relevant part of the R code I am using is:
>
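> library(pdftools)   # provides pdf_convert()
> library(tesseract)  # provides ocr()
>
> # Render page 1 of the PDF to a PNG image at 850 dpi
> pdf_convert(event_url,
>             pages = 1,
>             dpi = 850,
>             filenames = "page1.png")
> # Run OCR on the rendered page and print what the data looks like
> text <- ocr("page1.png")
> cat(text)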
>
> What changes should I make to extract the complete data? Thanks in
> advance.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bd4e9b31-6264-4ba7-81ec-b7960b626a5e%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/bd4e9b31-6264-4ba7-81ec-b7960b626a5e%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
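To see which words the OCR step drops or is unsure about, the `tesseract` package's `ocr_data()` returns one row per recognised word, with its confidence. The R sketch below assumes the `pdftools` and `tesseract` packages. The helper name `ocr_page_words` and the confidence threshold of 60 are hypothetical choices. It renders at 300 dpi, which the Tesseract image-quality docs give as a minimum. Whether 300 dpi works better than 850 dpi for this particular file is an assumption you would need to test.

```r
# Hypothetical helper: render one PDF page and return word-level OCR
# results (word, confidence, bbox) so dropped or low-confidence words
# can be inspected. Assumes the 'pdftools' and 'tesseract' packages.
ocr_page_words <- function(pdf_path, page = 1, dpi = 300,
                           png_file = "page1.png", min_conf = 60) {
  # Render the requested page to a PNG at the given resolution
  pdftools::pdf_convert(pdf_path,
                        format    = "png",
                        pages     = page,
                        dpi       = dpi,
                        filenames = png_file)
  # ocr_data() returns one row per recognised word
  words <- tesseract::ocr_data(png_file)
  # Flag words whose confidence falls below the (arbitrary) threshold
  words$low_confidence <- words$confidence < min_conf
  words
}
```

Calling `ocr_page_words(event_url)` and looking at the rows near "CATHODE" would show whether those words were recognised with low confidence or not detected at all.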
