[tesseract-ocr] Re: text close to lines

2017-07-10 Thread Quan Nguyen
You can call Lept4J's LeptUtils.removeLines(Pix pixs).

http://tess4j.sourceforge.net/docs/index.html
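
To illustrate the idea behind line removal before OCR, here is a minimal, dependency-free sketch. It is not Leptonica's algorithm and not what LeptUtils.removeLines does internally; it simply whitens any horizontal or vertical run of dark pixels that is much longer than a text stroke. The class name, method names, and the minRun and threshold parameters are all hypothetical and chosen for illustration only.

```java
import java.awt.image.BufferedImage;

class LineRemovalSketch {
    /** Treats a pixel as dark when its luminance (0-255) falls below the threshold. */
    static boolean isDark(BufferedImage img, int x, int y, int threshold) {
        int rgb = img.getRGB(x, y);
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (r * 299 + g * 587 + b * 114) / 1000 < threshold;
    }

    /**
     * Returns a copy of src in which every horizontal or vertical run of dark
     * pixels at least minRun long is painted white. Shorter runs, such as
     * character strokes, are copied unchanged.
     */
    static BufferedImage removeLongRuns(BufferedImage src, int minRun, int threshold) {
        int w = src.getWidth(), h = src.getHeight();
        boolean[][] erase = new boolean[h][w];

        // Horizontal pass: mark dark runs that are at least minRun pixels wide.
        for (int y = 0; y < h; y++) {
            int start = -1;
            for (int x = 0; x <= w; x++) {
                boolean dark = x < w && isDark(src, x, y, threshold);
                if (dark && start < 0) start = x;
                if (!dark && start >= 0) {
                    if (x - start >= minRun) for (int i = start; i < x; i++) erase[y][i] = true;
                    start = -1;
                }
            }
        }

        // Vertical pass: same idea, column by column.
        for (int x = 0; x < w; x++) {
            int start = -1;
            for (int y = 0; y <= h; y++) {
                boolean dark = y < h && isDark(src, x, y, threshold);
                if (dark && start < 0) start = y;
                if (!dark && start >= 0) {
                    if (y - start >= minRun) for (int i = start; i < y; i++) erase[i][x] = true;
                    start = -1;
                }
            }
        }

        // Build the output: white where a long run was found, original pixel elsewhere.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out.setRGB(x, y, erase[y][x] ? 0xFFFFFF : src.getRGB(x, y));
        return out;
    }
}
```

The result is an ordinary BufferedImage, so it could be passed to the doOCR overload that accepts one. A run-length filter like this also erases the pixels where a character touches a line and does nothing for skewed lines, so for real scans the library routine is likely the better choice.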


On Monday, July 10, 2017 at 3:24:12 AM UTC-5, GuillaumeQ wrote:

> I have a document with some text written in a table. The lines of the table 
> are very close to the text. When I call doOCR, I don't get the text between the 
> lines. Is there any way to improve this and read text that sits 
> close to lines? The image is attached.
>
> my code:
>
> import net.sourceforge.tess4j.*
>
> def ocrToStream() {
>     def imageFile = new File("path\\to.PNG")
>     ITesseract instance = new Tesseract1() // JNA Direct Mapping
>     // Set this to the path of the parent directory of tessdata.
>     instance.setDatapath("")
>     instance.setLanguage("fra")
>
>     try {
>         def result = instance.doOCR(imageFile)
>         System.out.println(result)
>     } catch (TesseractException e) {
>         System.err.println(e.getMessage())
>     } catch (IOException e) {
>         System.err.println(e.getMessage())
>     }
> }
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/d040f7a8-9cd6-4830-b29c-7175e3be58e7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: text close to lines

2017-07-10 Thread THintz
Charles Weld's Tesseract .Net implements Leptonica's RemoveLines for 
grayscale in Pix.cs.



[tesseract-ocr] Re: Not getting any result with Tesseract-ocr v3.05.01 on Windows

2017-07-10 Thread Ruben Gaspar
Just to add that the PDFs I am trying to process are mostly handwritten 
documents. Can tesseract-ocr handle handwritten documents?

Thank you,
Ruben

On Friday, 7 July 2017 20:44:11 UTC+2, Ruben Gaspar wrote:
>
> Hello, 
>
> While trying to use Tesseract v3.05.01 on my documents I get:
>
> "c:\Program Files (x86)\Tesseract-OCR\tesseract.exe" G:\\n_12_ocr11.png 
> n_12_ocr11 -l eng
> Tesseract Open Source OCR Engine v3.05.01 with Leptonica
> Error in pixCreateNoInit: pix_malloc fail for data
> Error in pixCreateTemplateNoInit: pixd not made
> Error in pixCreateTemplate: pixd not made
> Error in pixCopy: pixd not made
> Error in pixGetDepth: pix not defined
> Error in pixGetWpl: pix not defined
> Error in pixGetYRes: pix not defined
> Error in pixClone: pixs not defined
> Please call SetImage before attempting recognition.Error during processing.
>
>
> Is this normal? Can my documents not be processed by Tesseract? 
> I am running the software on a Windows Server 2012 R2 machine.
>
> Thank you,
> Ruben
>
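
The thread does not say why pix_malloc failed. One possible cause, assumed here and not confirmed by the poster, is that the decoded image is too large for the process to allocate; the install path under "Program Files (x86)" suggests a 32-bit build with limited address space. Leptonica stores each row as 32-bit words padded to a word boundary, so the raster size can be estimated from the image dimensions. The class and method names below are hypothetical, and the 10000 x 14000 page size is an invented example, not the poster's file.

```java
class PixMemoryEstimate {
    /**
     * Bytes needed for a raster whose rows are stored as 32-bit words,
     * each row padded up to a whole number of words.
     */
    static long rasterBytes(long width, long height, int bitsPerPixel) {
        long wordsPerLine = (width * bitsPerPixel + 31) / 32;
        return 4L * wordsPerLine * height;
    }

    public static void main(String[] args) {
        // Hypothetical scan dimensions, used only to show the arithmetic.
        long width = 10000, height = 14000;
        System.out.println("8 bpp:  " + rasterBytes(width, height, 8) + " bytes");
        System.out.println("32 bpp: " + rasterBytes(width, height, 32) + " bytes");
    }
}
```

The failure in the log occurs inside pixCopy, so the process needs room for at least one more raster of this size on top of the one already loaded. If the estimate is large, downscaling the image or reducing its bit depth before running tesseract may be worth trying.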



[tesseract-ocr] New Fonts Training

2017-07-10 Thread joshyjosh . martinez
Hi everyone,

All the tutorials I have been through just mention "creating a new font 
file and then training Tesseract", so how do I create font files for 
training?

For example, if I have a couple of price labels whose characters and 
numbers I want to OCR, how do I put together a font file for them in 
order to train Tesseract?

Thanks in advance,

