Hey All, I posted this in General but it may have been the wrong place.
I would like to develop a Tesseract OCR plugin for Plone to OCR uploaded documents. I am somewhat new to Plone, but I believe this would be a pretty useful plugin. It would also allow full-text indexing of scanned documents (TIFF, PDF, etc.) in Plone. Does anyone want to help me with this?

The basic idea I had was that a button would be added to documents, which the user would click to OCR a document. The user would choose basic options (such as language) and the document would be sent to Tesseract for OCR. The resulting text file would then be placed in the same folder as the document. I thought this could be done by creating a workflow that executes a script to send the document to Tesseract and upload the output file, but of course that can be changed if someone has a better suggestion.

In future versions of the product I would like to add the ability to have the OCR done on another server in a queue. The user would select a document to OCR, the document would be placed in a queue on the OCR server, and the results would be returned to Plone after processing. I would also like to add the ability to OCR PDF documents and return a searchable PDF.

There is a lot more that can be done with this. Let me know if you're interested. I don't believe it is very complicated, but I cannot do it on my own.

Thanks
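
A rough sketch of the script step described above, assuming the `tesseract` command-line tool is installed on the server and the uploaded document has already been written to a file on disk. The function names `build_ocr_command` and `ocr_document` are made up for illustration and are not part of Plone or Tesseract. The Plone side (the button, the workflow transition, and uploading the text file back into the folder) is left out.

```python
import os
import subprocess


def build_ocr_command(document_path, language="eng"):
    """Return (command, output_path) for running Tesseract on one file.

    Tesseract's CLI takes an input file and an output base name, and
    writes the recognized text to "<output base>.txt". Using the input
    path minus its extension as the base puts the text file next to the
    original document.
    """
    output_base, _extension = os.path.splitext(document_path)
    command = ["tesseract", document_path, output_base, "-l", language]
    return command, output_base + ".txt"


def ocr_document(document_path, language="eng"):
    """Run Tesseract on document_path and return the path of the text file.

    Hypothetical helper: raises subprocess.CalledProcessError if
    Tesseract exits with an error (for example, an unsupported format).
    """
    command, output_path = build_ocr_command(document_path, language)
    subprocess.run(command, check=True)
    return output_path
```

A workflow script could call `ocr_document("/path/to/scan.tif", "eng")` and then add the returned text file to the same folder as the original. Keeping the command construction in its own function should also make it easier to hand the same job to a queue on a separate OCR server later.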
