Hi Dan,

Thanks for the quick response. Yes, I thoroughly enjoyed implementing the auto-orientation, and I've created a separate binary for that specific step in the chain. However, it seems that image region support is (or at least at some stage was) intended to be part of ocropus, though perhaps not at the moment. For the time being I'll just create a separate binary, but +1 for integrating Leptonica, or at least the auto-orientation and image segmentation. I'd love to hear how others have solved this problem.

Because I'm using images from cameras, I'd really like to have FST language model support, so I'm only using tesseract to generate draft ground truth. On that note, I haven't been able to work out whether I can call tesseract from ocropus to generate ground truth drafts. I'd find that a useful feature; for the moment I plan to do it in a Python script. Perhaps, after the StandardPreProcessing, there could be an option to generate ground truth from tesseract. Probably not an original idea, but it would be neat.

Cheers,
Nathan

On 23 July 2011 13:51, Dan Bloomberg <[email protected]> wrote:
> Nathan,
>
> I can't directly help you, but I support leptonica and you can get and
> install it from leptonica.org or code.google.com/p/leptonica.
>
> Why the standard install of ocropus no longer includes leptonica
> continues to baffle me, given that ocropus claims its intent is to
> aggregate the best document imaging and analysis software.
> This is further baffling because tesseract, which is the best open
> source OCR package, uses leptonica.
>
> Depending on what you want to do, it is likely that you can
> implement your application directly with tesseract and leptonica.
> When you download leptonica, look at the programs in the prog
> directory. There are about 200 of them. Many are regression tests,
> and quite a few show you how to solve typical document-related
> tasks. There are several ways to identify and remove image
> regions, for example. If you can't find anything sufficiently close,
> let me know and I'll give you some suggestions.
>
> -- Dan
>
> On Fri, Jul 22, 2011 at 8:35 PM, Nathan K <[email protected]> wrote:
>>
>> Is anyone currently using TextImageSegByLeptonica or any
>> ITextImageClassification interfaces? I've been unable to get any
>> working using a recent snapshot of trunk. First I tried the standard
>> install as documented on the install transcript page.
>> Which yielded something along the lines of 'component doesn't exist',
>> which I assumed was because the standard install doesn't include
>> Leptonica anymore. Thus I tried the following, and ended up with the
>> subsequent error. Greatly appreciate any pointers.
>>
>> Also, I noticed that the method 'textImageProbabilities' returns an
>> 'intarray', which can't be saved using 'write_png'. What is the best
>> way to visualise the results? And, in addition, to remove image
>> regions? Is there a sister method? More generally, I've got a binary
>> image region on my documents which I'm trying to remove. I assume this
>> is the correct approach.
>>
>> Cheers
>> Nathan
>>
>> Command:
>> scons lept=1
>>
>> Produces:
>>
>> g++ -o ocr-layout/ocr-layout-rast.os -c
>> -DDATADIR='"/usr/local/share/ocropus"'
>> -DDEFAULT_DATA_DIR='"/usr/local/share/ocropus/models"'
>> -DDEFAULT_EXT_DIR='"/usr/local/share/ocropus/extensions"' -g -fPIC -O2
>> -Wall -Wno-sign-compare -Wno-write-strings -Wno-unknown-pragmas
>> -D__warn_unused_result__=__far__ -D_BACKWARD_BACKWARD_WARNING_H=1
>> -fopenmp -fPIC -DHAVE_LEPTONICA -DHAVE_SQLITE3 -Iocr-lineseg
>> -Iocr-commands -Iocr-leptonica -Iocr-binarize -Iocr-pfst -Iocr-utils
>> -Iocr-line -Iocr-layout -Iocr-voronoi -I/usr/local/include
>> -I/usr/local/include -I/usr/include/leptonica
>> -I/usr/local/include/leptonica -I/usr/local/include/leptonica
>> ocr-layout/ocr-layout-rast.cc
>> In file included from ocr-layout/ocr-layout-internal.h:60,
>>                  from ocr-layout/ocr-layout-rast.cc:28:
>> ocr-layout/ocr-text-image-seg.h:77: error: no unique final overrider
>> for 'virtual const char* iulib::IComponent::interface()' in
>> 'ocropus::RemoveImageRegions'
>> scons: *** [ocr-layout/ocr-layout-rast.os] Error 1
>> scons: building terminated because of errors.
>>
>> --
>> Email: nathank [at] noshly.com (professional)
>> Email: its [at] madteckhead.com (personal)
>> Website: http://www.madteckhead.com
>>
>> --------------------------------------------
>> Q: Why is this email five sentences or less?
>> A: http://five.sentenc.es
>>
>> This email (including any attachments) is confidential and may be
>> privileged. If you have received it in error, please notify the sender
>> by return email and delete this message from your system. Any
>> unauthorised use or dissemination of this message in whole or in part
>> is strictly prohibited. Please note that emails are susceptible to
>> change and we will not be liable for the improper or incomplete
>> transmission of the information contained in this communication nor
>> for any delay in its receipt or damage to your system. We do not
>> guarantee that the integrity of this communication has been maintained
>> nor that this communication is free of viruses, interceptions or
>> interference.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "ocropus" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/ocropus?hl=en.
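
For reference, a minimal sketch of the Python-script route mentioned in the reply for producing draft ground truth with tesseract. It assumes the `tesseract` binary is on the PATH and uses its standard command-line form (`tesseract <image> <outputbase> -l <lang>`, which writes `<outputbase>.txt`). The function names, the `*.png` pattern, and the `.gt.txt` naming are hypothetical choices for illustration, not ocropus conventions confirmed in this thread.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, lang="eng"):
    """Build the tesseract CLI call for one image.

    tesseract appends ".txt" to the output base, so the draft lands next
    to the image as <stem>.gt.txt (the ".gt" suffix is an assumption).
    """
    image = Path(image_path)
    output_base = image.with_name(image.stem + ".gt")
    return ["tesseract", str(image), str(output_base), "-l", lang]


def draft_ground_truth(image_dir, pattern="*.png", lang="eng"):
    """Run tesseract on every matching image; return the draft text files."""
    drafts = []
    for image in sorted(Path(image_dir).glob(pattern)):
        cmd = tesseract_command(image, lang)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        drafts.append(Path(cmd[2] + ".txt"))
    return drafts
```

Calling `draft_ground_truth("lines/")` on a directory of preprocessed images would leave one draft text file beside each image, ready for manual correction. Keeping the command construction in its own function makes it easy to inspect or log the exact call before running it.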
