[tesseract-ocr] Re: Use Tesseract as a library at Windows with Qt

2018-01-03 Thread John Grossman

Hey there.  I wanted to do pretty much the same thing as you and after an 
hour or so of messing around I managed to get something which looks like it 
will work for me, so I thought I would post and let you know what I ended 
up doing (since there do not seem to be any responses yet).

FWIW - I was building for x86, not x64.  I started with the standard 
build-from-git instructions, but made sure to check out the 3.05 branch 
before building (I just reset my local master branch to origin/3.05).

   1. git clone ...
   2. git reset --hard origin/3.05
   3. cppan
   4. mkdir build && cd build
   5. cmake ..

Once this completed, I had a build directory with a tesseract.sln file in 
it as well as a libtesseract.vcxproj file.  Loading the solution file in 
VS2017 allowed me to build both release and debug versions of libtesseract 
without any issues.  In the end, however, instead of just directly linking 
in the library to my own project, I ended up just adding the existing 
libtesseract.vcxproj file to my own project's solution.

After that, all I needed to do was...

   1. Add a reference from my project to the libtesseract project in my 
   solution.
   2. Add the include paths to C/C++ -> General -> Additional Include 
   Directories.  I needed to add "api", "ccmain", "ccstruct", and "ccutil" 
   (all in the directory you checked out tesseract into).
   3. My project is a DLL and (for some reason) had Linker -> Link Library 
   Dependencies set to "yes".  I needed to turn this off to avoid a situation 
   where two different copies of MSVCRT were getting linked (you probably will 
   not have to do this).
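
For step 2, a small helper can build the value to paste into Additional 
Include Directories.  This is only a sketch: the checkout path 
C:\src\tesseract is a made-up example, and the four sub-directory names are 
the ones listed above for the 3.05 source tree.

```python
import ntpath  # Windows path rules, regardless of the host OS

# Sub-directories named in the post (Tesseract 3.05 source layout).
INCLUDE_SUBDIRS = ("api", "ccmain", "ccstruct", "ccutil")


def additional_include_dirs(tesseract_root):
    """Return a semicolon-separated list of include directories,
    the separator Visual Studio uses in this property."""
    return ";".join(ntpath.join(tesseract_root, sub) for sub in INCLUDE_SUBDIRS)


if __name__ == "__main__":
    # Hypothetical checkout location; replace with your own.
    print(additional_include_dirs(r"C:\src\tesseract"))
```

Paste the printed string into the property for each configuration of your 
project.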

Be sure to add the include dirs to every configuration (release and debug) 
of your project.  Now I just need to make sure that 
everything is actually working, but at least I am building and linking 
without problems now :D

hope this helps.

-john

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/d5e34338-254e-4d6b-83d4-b8c4c57e8f97%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: How to use tesseract4.0 to only recognize the digits??

2018-01-03 Thread Thomas Menguy
Hi Shree, 

I tried your digits traineddata and it works really well!
I need to build a training set with numbers and signs, for example. Could you 
point me to how you produced your own training data? (Sorry, I am fairly new 
to Tesseract and have never trained it before.)

Thanks for your help!
BR
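
The reply quoted below says the training_text can contain numbers in the 
format you need. As a rough sketch of that one step only, the script below 
generates lines of signed numbers. The number formats (optional sign, up to 
five digits, optional two decimals) and the function name are assumptions 
made for illustration, not what shree actually used, and the training run 
itself is not shown.

```python
import random


def make_training_lines(n_lines, numbers_per_line=8, seed=0):
    """Generate lines of signed numbers for a digits-only training_text."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    lines = []
    for _ in range(n_lines):
        tokens = []
        for _ in range(numbers_per_line):
            sign = rng.choice(["+", "-", ""])
            whole = rng.randint(0, 99999)
            if rng.random() < 0.5:
                # Decimal value with exactly two fractional digits.
                tokens.append(f"{sign}{whole}.{rng.randint(0, 99):02d}")
            else:
                tokens.append(f"{sign}{whole}")
        lines.append(" ".join(tokens))
    return lines


if __name__ == "__main__":
    # Print a few sample lines; write them to a file to use as training_text.
    for line in make_training_lines(3):
        print(line)
```

Adjust the formats so they match what appears in your images.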

On Tuesday, October 3, 2017 at 6:39:30 PM UTC+2, shree wrote:
>
> You can try the plus-minus type of training if you just want a digits type 
> of traineddata.
>
> Your training_text can contain numbers in the format you need and you can 
> train with a font matching your images.
>
> For proof of concept you can try my experimental version at 
>
>
> https://github.com/Shreeshrii/tessdata4alpha/blob/master/fast/digits.traineddata
>
> On Friday, September 29, 2017 at 12:32:41 PM UTC+5:30, John Miller wrote:
>>
>> Today,I found that the problem had been  posted on 
>> https://github.com/tesseract-ocr/tesseract/issues/751
>>
>



[tesseract-ocr] Re: how to use PDF as Input

2018-01-03 Thread Quan Nguyen
The Tesseract engine cannot read PDF files. You'll have to convert them to 
suitable images (TIFF or PNG) first. There are many tools for that: 
ImageMagick, Ghostscript, PDFBox, etc.
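
A minimal sketch of that two-step flow, assuming Ghostscript and the 
tesseract command-line tool are installed and on the PATH. The file names, 
the 300 dpi setting and the helper function names are illustrative 
assumptions. On Windows the Ghostscript console executable is usually named 
gswin64c rather than gs.

```python
def ghostscript_cmd(pdf_path, out_pattern="page-%03d.png", dpi=300, gs_exe="gs"):
    """Build a Ghostscript command that renders each PDF page to a PNG."""
    return [
        gs_exe,
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=png16m",             # 24-bit colour PNG output
        f"-r{dpi}",                    # rendering resolution in dpi
        f"-sOutputFile={out_pattern}",  # %03d becomes the page number
        pdf_path,
    ]


def tesseract_cmd(image_path, out_base, lang="eng"):
    """Build a tesseract command; the text is written to out_base + '.txt'."""
    return ["tesseract", image_path, out_base, "-l", lang]


if __name__ == "__main__":
    # Only print the commands here. To execute one, pass the list to
    # subprocess.run(cmd, check=True).
    print(" ".join(ghostscript_cmd("form.pdf")))
    print(" ".join(tesseract_cmd("page-001.png", "page-001")))
```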

On Wednesday, January 3, 2018 at 12:05:12 PM UTC-6, Subhanshu Gupta wrote:
>
> Dear All,
>
> I am new to Tesseract OCR and need to implement it to Read PDF Forms but I 
> am not able to find any good documentation for which method to use to read 
> PDF as well as for Character Segmentation.
> If any of you have any doc/manual relating on which method is used where 
> it will be really very helpful.
>
> Thanks. :)
>



[tesseract-ocr] how to use PDF as Input

2018-01-03 Thread Subhanshu Gupta
Dear All,

I am new to Tesseract OCR and need to implement it to Read PDF Forms but I 
am not able to find any good documentation for which method to use to read 
PDF as well as for Character Segmentation.
If any of you have a doc/manual explaining which method is used where, it 
would be really helpful.

Thanks. :)
