Remove Line Finding and Base Line Fitting from Tesseract

2013-02-03 Thread abdul salam
Hi all, I am trying to remove the Line Finding, Baseline Fitting and Skew Detection modules from Tesseract, as my input images always contain straight lines of words. So I would like to know: is there a way to remove these modules through configuration? And if not by configuration, can I remove this code by simply editing the source?

Line Finding and Baseline Fitting Removal from Tesseract

2013-02-03 Thread abdul salam
Hi all, I am trying to remove the Line Finding, Baseline Fitting and Skew Detection modules from the Tesseract code, as my input images will always contain straight lines of words (deskewed). I would like to know whether there is any configuration method to remove these modules. Also, if not by configuration, is it

Re: Issues with shapeclustering

2013-02-03 Thread zdenko podobny
You don't need to edit it. Just run the command as described on the wiki. It is faster than editing the tr file... Zdenko On Sun, Feb 3, 2013 at 12:21 AM, Carlos Antunes cf.antu...@gmail.com wrote: Zdenko, Shall I edit it and remove it before going further? Thanks. On Saturday, February 2, 2013 1:53:33 PM

Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread Michael Lissner
I have Ubuntu 12.04, which has tesseract 3.02 and leptonica version 1.69. I've installed these, and also installed libtiff4 using apt-get. When I try to process a document, I get: ↪ sudo tesseract united_states_v._ups_customhouse_brokerage_inc.tif

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread zdenko podobny
Can you send an example of your TIFF file? Zdenko On Sun, Feb 3, 2013 at 10:08 PM, Michael Lissner mliss...@michaeljaylissner.com wrote: I have Ubuntu 12.04, which has tesseract 3.02 and leptonica version 1.69. I've installed these, and also installed libtiff4 using apt-get. When I try to

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread Mike Lissner
It's about 300MB, unfortunately, but I generate it programmatically using imagemagick in a way that's worked in the past, so I don't think the tiff file itself is the issue. If you're willing to download this monster, I'll post it to dropbox. I'd love the help, but I don't think it's the right

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread zdenko podobny
Are you able to generate just one page or a small example? Or can you describe the steps you use to create it (so I can reproduce it)? TIFF can be tricky; e.g. libtiff-4 does not work for me... Zdenko On Sun, Feb 3, 2013 at 10:29 PM, Mike Lissner mliss...@michaeljaylissner.com wrote: It's about 300MB,

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread Mike Lissner
Sure, that's a good idea. Here's the original PDF: http://courtlistener.com/pdf/2008/05/28/united_states_v._ups_customhouse_brokerage_inc..pdf If you download that, then run: convert -depth 4 -density 300 united_states_v._ups_customhouse_brokerage_inc..pdf

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread zdenko podobny
BTW: spp means samples per pixel [1]. Are you able to instruct ImageMagick to use 1, 3 or 4? I also found a report on Stack Overflow [2] mentioning that ImageMagick tends to set spp to 2, which should be invalid for TIFF... [1] http://tpgit.github.com/Leptonica/tiffio_8c_source.html [2]
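As the message above suggests, the Leptonica 1.69 TIFF reader expects 1, 3 or 4 samples per pixel, so it can help to check what a generated file actually carries before suspecting tesseract itself. The bash sketch below reads the SamplesPerPixel field (TIFF tag 277) from the first IFD, i.e. the first page, using only `head` and `od`. The function names `tiff_spp` and `read_uint` are hypothetical helpers, not part of tesseract, leptonica or libtiff; the sketch assumes a classic (non-BigTIFF) file whose tag is stored as a SHORT, as the TIFF 6.0 spec prescribes.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: inspect the SamplesPerPixel tag of a classic TIFF.

# read_uint FILE OFFSET NBYTES
# Prints the unsigned integer stored at OFFSET, honouring the byte order
# recorded in TIFF_ORDER ("II" = little-endian, "MM" = big-endian).
read_uint() {
  local file=$1 off=$2 n=$3 bytes b v=0 list=""
  bytes=$(od -An -v -t u1 -j "$off" -N "$n" "$file")
  if [ "$TIFF_ORDER" = "II" ]; then
    for b in $bytes; do list="$b $list"; done  # reverse: last byte is most significant
  else
    list=$bytes
  fi
  for b in $list; do v=$(( v * 256 + b )); done
  echo "$v"
}

# tiff_spp FILE
# Prints SamplesPerPixel (tag 277) from the first IFD (first page).
tiff_spp() {
  local file=$1 ifd count i entry
  TIFF_ORDER=$(head -c 2 "$file")
  case $TIFF_ORDER in
    II|MM) ;;
    *) echo "not a TIFF file" >&2; return 1 ;;
  esac
  if [ "$(read_uint "$file" 2 2)" != 42 ]; then
    echo "not a classic TIFF (BigTIFF is not handled)" >&2; return 1
  fi
  ifd=$(read_uint "$file" 4 4)         # offset of the first IFD
  count=$(read_uint "$file" "$ifd" 2)  # number of 12-byte directory entries
  for (( i = 0; i < count; i++ )); do
    entry=$(( ifd + 2 + 12 * i ))
    if [ "$(read_uint "$file" "$entry" 2)" = 277 ]; then
      # A single SHORT value sits left-justified in the entry's 4-byte value field.
      read_uint "$file" $(( entry + 8 )) 2
      return 0
    fi
  done
  echo 1  # tag absent: TIFF 6.0 defines the default as 1
}
```

For example, `tiff_spp page.tif` printing `2` would point at the spp-of-2 case from the Stack Overflow report. If libtiff's `tiffinfo` tool is installed, it reports the same field as "Samples/Pixel".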

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread Mike Lissner
OK, we're getting somewhere! I figured out that the Ubuntu repo just doesn't work properly with tiffs, and recompiled and installed tesseract and leptonica. So now when I run tesseract -v, I get: ↪ tesseract -v tesseract 3.02.02 leptonica-1.69 libjpeg 8b : libpng 1.2.46 : libtiff 3.9.5 :
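After rebuilding, the version banner is a quick way to confirm TIFF support, since (as the output above shows) it lists the image libraries leptonica was built with. The bash function below is a hypothetical helper that extracts the libtiff version from that banner; it assumes the banner format quoted above (`libtiff 3.9.5 :`).

```bash
#!/usr/bin/env bash
# libtiff_version: reads a `tesseract -v` banner on stdin and prints the
# libtiff version it mentions; returns 1 if libtiff is not listed.
libtiff_version() {
  local v
  v=$(grep -o 'libtiff [0-9][0-9.]*' | head -n 1 | cut -d ' ' -f 2)
  [ -n "$v" ] || return 1
  echo "$v"
}
```

Run it as `tesseract -v 2>&1 | libtiff_version`; the `2>&1` is there in case the build prints its version information on stderr. A non-zero exit status means the banner has no libtiff entry.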

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread Mike Lissner
Looks like I'm all set. I had to remove -flatten from the command above, and all is working now. Thanks so much for the help. On Sun, Feb 3, 2013 at 2:18 PM, Mike Lissner mliss...@michaeljaylissner.com wrote: OK, we're getting somewhere! I figured out that the Ubuntu repo just doesn't

Adding new fonts to Tesseract

2013-02-03 Thread Remon Georgy
Hi there, I wish to add a new font to Tesseract, but I don't want to lose the fonts that eng.traineddata already recognizes. My question is: what are the default fonts of Tesseract, and should I re-train Tesseract on those fonts besides the new font? Thanks

Re: Success story using tesseract

2013-02-03 Thread Michael Young
I recently did a personal project with Tesseract (Equation OCR) and the final results turned out pretty well: http://ayoungprogrammer.blogspot.ca/ On Friday, 1 February 2013 11:34:04 UTC-5, Jakub Jaroš wrote: Hello, in our project, we would like to decide about using Tesseract for it or

Best practices on dictionaries for English training

2013-02-03 Thread Carlos Antunes
Hello all, I was able to train some new fonts thanks to the help I got here. The wiki is somewhat vague when it comes to dictionaries: it mentions a few dictionaries, as well as concerns about their licenses. Looking at both aspell and ispell, there are different lists of

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread TP
On Sun, Feb 3, 2013 at 1:08 PM, Michael Lissner mliss...@michaeljaylissner.com wrote: I have Ubuntu 12.04, which has tesseract 3.02 and leptonica version 1.69. I've installed these, and also installed libtiff4 using apt-get. libtiff4 is also known as bigtiff. [1] lists important backward
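BigTIFF is a TIFF variant with 64-bit offsets, supported by libtiff starting with the 4.0 series; the 3.9.x series (such as the libtiff 3.9.5 reported by the rebuilt tesseract earlier in this thread) reads only classic TIFF. The two can be told apart from the first four bytes of a file: a byte-order mark (`II` or `MM`) followed by version 42 for classic TIFF or 43 for BigTIFF. A minimal bash sketch of that check follows; `tiff_kind` is a hypothetical helper name, not a libtiff tool.

```bash
#!/usr/bin/env bash
# tiff_kind FILE
# Prints "classic" (version 42), "bigtiff" (version 43), "not-tiff" or "unknown"
# based on the TIFF header's byte-order mark and version field.
tiff_kind() {
  local order version
  local -a b
  order=$(head -c 2 "$1")
  b=($(od -An -t u1 -j 2 -N 2 "$1"))  # the two version bytes
  case $order in
    II) version=$(( b[1] * 256 + b[0] )) ;;  # little-endian
    MM) version=$(( b[0] * 256 + b[1] )) ;;  # big-endian
    *)  echo "not-tiff"; return 1 ;;
  esac
  case $version in
    42) echo "classic" ;;
    43) echo "bigtiff" ;;
    *)  echo "unknown"; return 1 ;;
  esac
}
```

A file reported as `bigtiff` would need a libtiff 4.x build to be read at all.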