You can start by reading the docs and then searching the issue tracker and the forum for "table".
Zdenko

On Tue, 7 Apr 2020 at 7:38, amrapalli karan <amrapallika...@gmail.com> wrote:

> I have this .pdf file which I am able to read only partially. I am using R
> to fetch the data from the pdf file, which is uploaded in the form of an
> image.
>
> The expected output is:
>
> CONTINUOUS CAST COPPER WIRE ROD 11 MM 44*1*567*CATHODE FULL **434122*
> CONTINUOUS CAST COPPER WIRE ROD NS 439678
> CONTINUOUS CAST COPPER WIRE ROD 16 MM 443056...etc
>
> The actual output which I am getting:
>
> CONTINUOUS CAST COPPER WIRE ROD 11 MM 44567
> CONTINUOUS CAST COPPER WIRE ROD NS 439678
> CONTINUOUS CAST COPPER WIRE ROD 16 MM 443056...etc.
>
> The highlighted part of the text is missing when I am extracting the data.
> A part of the code that I am using in R is:
>
> library(pdftools)   # provides pdf_convert()
> library(tesseract)  # provides ocr()
>
> # render page 1 of the PDF to a PNG image
> pdf_convert(event_url,
>             pages = 1,
>             dpi = 850,
>             filenames = "page1.png")
> # what does the data look like
> text <- ocr("page1.png")
> cat(text)
>
> What changes should I make that would help me fetch the complete data?
> Thanks in advance
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bd4e9b31-6264-4ba7-81ec-b7960b626a5e%40googlegroups.com.