Hello Tom,

I'm sorry, I forgot to put the version in my post! I'm using version 3.0.2.0, which is the latest version that has a .NET wrapper.

On Thursday, March 10, 2016 at 13:03:17 UTC, Edson Luis Moretti wrote:
>
> Hello everyone,
>
> I'm using Tesseract in VB.Net with
> hOcr2Pdf.NET <https://hocrtopdf.codeplex.com/>
> to write an underlay text layer from the OCR data and build a searchable PDF.
>
> Tesseract is recognizing the text well. My problem is that the underlay
> text is in the wrong position, as you can see in the attached image.
>
> Has anyone already had this problem?
>
> I'm passing the HTML generated by Tesseract.GetHOCRText to the
> hDocument of hOcr2Pdf.NET, but the positions and sizes seem to be wrong.
>
> My code to create the PDF:
>
>     With tesseract.Process(currentPageImage)
>         OCRParser.ParseHOCR(hdoc, .GetHOCRText(0, True), True)
>         pdfCreator.AddPage(hdoc.Pages(hdoc.Pages.Count - 1), currentPageImage)
>         hdoc.Pages.RemoveAt(hdoc.Pages.Count - 1)
>         .Dispose()
>     End With
>     pdfCreator.SaveAndClose()
>
> The OCRParser class is a copy of the Parser class of hOcr2Pdf.NET; the
> original is in a private namespace, so I can't access it.
> I did this because adding a new HTML page to hDocument requires the path
> of an HTML file, and I don't want to save the Tesseract output to disk
> just to pass it as an argument.
> I changed my copy of the Parser class to build the HTML object from text
> instead of from a file, so now I can pass the HTML text directly.
>
> Can my problem be related to Tesseract training? Is it recognizing the
> wrong font size or something like that?
>
> I'm using the default English trained data. If I made my own trained data
> from my samples, would the underlay text be created with the right
> size and position?
>
> Many thanks!
> Edson Luis Moretti.
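
One thing that may be worth checking for the misplaced underlay text, offered as an assumption rather than a confirmed diagnosis, is the unit conversion between hOCR and PDF. hOCR `bbox` values are pixel coordinates measured from the top-left corner of the image, while PDF user space is measured in points (1/72 inch) from the bottom-left corner by default. If the PDF writer assumes a different image resolution than the one Tesseract processed, every box is scaled and shifted. The sketch below is plain Python used only to illustrate the arithmetic; it is not the hOcr2Pdf.NET or Tesseract wrapper API, and the function name, DPI, and page height are hypothetical.

```python
import re


def hocr_bbox_to_pdf_rect(title, dpi, page_height_px):
    """Convert an hOCR 'bbox x0 y0 x1 y1' (pixels, top-left origin)
    into (x, y, width, height) in PDF points (bottom-left origin)."""
    match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
    if match is None:
        raise ValueError("no bbox found in hOCR title attribute")
    x0, y0, x1, y1 = (int(v) for v in match.groups())

    def to_points(pixels):
        # 72 PDF points per inch, `dpi` pixels per inch.
        return pixels * 72.0 / dpi

    return (
        to_points(x0),                   # left edge
        to_points(page_height_px - y1),  # bottom edge, after flipping the y axis
        to_points(x1 - x0),              # width
        to_points(y1 - y0),              # height
    )


# Hypothetical word box on a 300 dpi scan that is 3300 px tall (11 inches).
rect = hocr_bbox_to_pdf_rect("bbox 300 600 900 660; x_wconf 95", 300, 3300)
print(rect)
```

Running the same box with `dpi=96` instead of `dpi=300` gives a rectangle roughly three times larger and further from the origin, which is the kind of offset a resolution mismatch would produce. If the boxes are off in this way, retraining Tesseract is unlikely to be the fix, since training affects which characters are recognized rather than how coordinates are converted.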

