Yesterday my partner asked me how to OCR PDF documents herself. I knew I needed to find a GUI for Tesseract, but found the wiki not particularly clear.
So I rewrote the "GUI" section to be tabular, making it clear what platforms each tool runs on, and the license. I also removed some of the implementation details (e.g. Java, C++) from the descriptions, as I think for most people the page would be more useful without them cluttering it up. See attached patch.

I also think it might be a good idea to separate the "end-user" programs, language bindings, and training helper programs into their own pages. "AddOns" isn't an obvious place for somebody to go who hears "tesseract is good at OCR" and wants to find a GUI based on it. Maybe split into pages called "GraphicalInterfaces", "TrainingTools" and "LanguageBindings". Then we have the "Other" programs to put somewhere, but something like "OtherProgramsUsingTesseract" (eugh, camel case) could work fine.

Agreed?

Nick

P.S. I'm having trouble creating a Google account. Once I manage it I'll put the patch in the issue tracker too.
Index: AddOns.wiki
===================================================================
--- AddOns.wiki	(revision 737)
+++ AddOns.wiki	(working copy)
@@ -7,19 +7,20 @@
 == GUI ==
- * [http://vietocr.sourceforge.net/ VietOCR]: A Java/.NET GUI frontend for Tesseract OCR engine. Supports optical character recognition for Vietnamese and other languages supported by Tesseract. _Requirements:_ Java or .NET
- * [http://symmetrica.net/cuneiform-linux/yagf-en.html YAGF]: graphical front-end for cuneiform and tesseract. _Requirements:_ QT4 , aspell for spellchecking, GhostScript for PDF processing
- * [http://sourceforge.net/projects/gimagereader/ gImageReader]: A graphical GTK frontend to tesseract-ocr. _Requirements:_ python, PyGtk
- * [http://live.gnome.org/OCRFeeder OCRFeeder]: OCRFeeder is a document layout analysis and optical character recognition system. _Requirements:_ linux, Python, pyGTK, Ghostscript, Unpaper
- * [http://www.paperfile.net/ FreeOCR]: Free OCR is a document scanning software including the Windows compiled Tesseract free ocr engine. It is very simple to use and supports opening multi-page tiff documents, Adobe PDF and fax documents as well as most image types including compressed Tiff's. It can scan using Twain and WIA scanning drivers. _Requirements:_ Windows, .NET
- * [http://solutions.weblite.ca/pdfocrx/ PDF OCR X]: PDF OCR is a simple drag-and-drop utility for Mac OS X and Windows, that converts your PDFs and images into text documents or searchable PDF files. _Requirements:_ Mac OS X 10.5/Windows with Java 1.6 or higher
- * [http://code.google.com/p/lector/ Lector]: A graphical ocr solution for GNU/Linux based on Python, Qt4 and tessaract OCR. _Requirements:_ Python, Qt4
- * [https://github.com/zdenop/tesseract-ocr-qt4gui Tesseract-OCR QT4 gui]: Tesseract-OCR QT4 gui is a simple GUI for tesseract _Requirements:_ Qt4
- * [http://code.google.com/p/lime-ocr/ Lime OCR]: A simple, free OCR software for Windows using tesseract-ocr engine. _Requirements:_ Windows, ImageMagick
- * [http://code.google.com/p/ocrivist/ Ocrivist]: Ocrivist is a utility which makes it possible to scan and OCR books and other printed documents to PDF or Djvu format. Ocrivist is intended for use on Linux and uses the Leptonica and Tesseract libraries. _Requirements:_ Linux, Pascal
- * [http://tesseract-gui.sourceforge.net Tesseract-GUI]: Tessract-GUI is not a front-end for tesseract-ocr. It is just a graphical way to use it with simple image manipulation thru ImageMagick. _Requirements:_ python, pyGTK
- * [http://code.google.com/p/qtesseract/ QTesseract]: QT GUI for the Tesseract OCR. _Requirements:_ QT4, c++
- * [http://djvu.life.coocan.jp/TessOCR/doc/tessOCR_eng.html TessOCR(KISI)]: a free OCR tool. _Requirements:_ Mac OS X, java
+|| *Name* || *Linux* || *Mac* || *Windows* || *License* || *Description* ||
+|| [http://vietocr.sourceforge.net/ VietOCR] || X || X || X || Apache 2.0 || A GUI frontend for Tesseract OCR engine. Supports optical character recognition for Vietnamese and other languages supported by Tesseract ||
+|| [http://symmetrica.net/cuneiform-linux/yagf-en.html YAGF] || X || || || GPL v3 || A graphical front-end for cuneiform and tesseract ||
+|| [http://sourceforge.net/projects/gimagereader/ gImageReader] || X || || X || GPL v3 || A graphical GTK frontend to tesseract-ocr ||
+|| [http://live.gnome.org/OCRFeeder OCRFeeder] || X || || || GPL v3 || OCRFeeder is a document layout analysis and optical character recognition system ||
+|| [http://www.paperfile.net/ FreeOCR] || || || X || Apache 2.0 || Free OCR is a document scanning software including the Windows compiled Tesseract free ocr engine. It is very simple to use and supports opening multi-page tiff documents, Adobe PDF and fax documents as well as most image types including compressed Tiff's. It can scan using Twain and WIA scanning drivers ||
+|| [http://solutions.weblite.ca/pdfocrx/ PDF OCR X] || || X || X || Proprietary || PDF OCR is a simple drag-and-drop utility for Mac OS X and Windows, that converts your PDFs and images into text documents or searchable PDF files ||
+|| [http://code.google.com/p/lector/ Lector] || X || || X || GPL v2 || A graphical ocr solution for GNU/Linux based on Python, Qt4 and tessaract OCR ||
+|| [https://github.com/zdenop/tesseract-ocr-qt4gui Tesseract-OCR QT4 gui] || X || || || Apache 2.0 || Tesseract-OCR QT4 gui is a simple GUI for tesseract ||
+|| [http://code.google.com/p/lime-ocr/ Lime OCR] || || || X || GPL v3 || A simple, free OCR software for Windows using tesseract-ocr engine ||
+|| [http://code.google.com/p/ocrivist/ Ocrivist] || X || || || GPL v3 || Ocrivist is a utility which makes it possible to scan and OCR books and other printed documents to PDF or Djvu format ||
+|| [http://tesseract-gui.sourceforge.net Tesseract-GUI] || X || || || GPL v2 || Tessract-GUI is not a front-end for tesseract-ocr, it is just a graphical way to use it with simple image manipulation thru ImageMagick ||
+|| [http://code.google.com/p/qtesseract/ QTesseract] || X || || || LGPL v3 || QT GUI for the Tesseract OCR ||
+|| [http://djvu.life.coocan.jp/TessOCR/doc/tessOCR_eng.html TessOCR(KISI)] || || X || || Apache 2.0 || A free OCR tool ||
 == Online OCR services ==
@@ -134,4 +135,4 @@
  * [http://rosarior.github.com/mayan/ Mayan EDMS]: Document management system with tesseract as it's base
  * [http://git.lrde.epita.fr/?p=olena.git;a=summary Olena]: a generic and efficient image processing platform (tesseract is used in its part called [http://git.lrde.epita.fr/?p=olena.git;a=tree scribo])
  * [http://jwilk.net/software/ocrodjvu ocrodjvu] is a wrapper for OCR systems, that allows you to perform OCR on DjVu files
-
\ No newline at end of file
+

