Hey All,

I posted this in General but it may have been the wrong place.

I would like to develop a Tesseract OCR plugin for Plone to OCR
uploaded documents. I am somewhat new to Plone, but I believe this
would be a pretty useful plugin. It would also allow full-text
indexing of images (TIFF, PDF, etc.) in Plone.

Does anyone want to help me with this?

The basic idea that I had was that a button would be added to
documents, which the user would click to OCR a document. The user would
choose basic options (such as language) and the document would be sent
to Tesseract for OCRing. The resulting text file would then be placed
in the same folder as the document.
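
To make the idea concrete, here is a rough sketch of the OCR step on its
own, outside of Plone. It assumes the `tesseract` command-line tool is
installed and on the PATH, and that the document is already a file on
disk in a format Tesseract can read. The function names
(`build_ocr_command`, `ocr_document`) are made up for illustration, and
nothing here touches the Plone API.

```python
import subprocess
from pathlib import Path


def build_ocr_command(document_path, language="eng"):
    """Return (command, text_path) for one Tesseract command-line run.

    Hypothetical helper: the names and the default language are
    assumptions for this sketch, not part of any existing product.
    """
    source = Path(document_path)
    # Tesseract takes an output *base* name and appends ".txt" itself,
    # so the text file lands in the same folder as the document.
    output_base = source.parent / source.stem
    text_path = source.parent / (source.stem + ".txt")
    command = ["tesseract", str(source), str(output_base), "-l", language]
    return command, text_path


def ocr_document(document_path, language="eng"):
    """Run Tesseract on the document and return the path of the text file."""
    command, text_path = build_ocr_command(document_path, language)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(command, check=True)
    return text_path
```

Calling `ocr_document("/some/folder/scan.tif", "deu")` would then leave
`/some/folder/scan.txt` next to the original, which the plugin could
upload into the same Plone folder.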

I thought that this could be done by creating a workflow that executes
a script to send the document to Tesseract and upload the output file,
but of course that can be changed if a better suggestion is made.

In future versions of the product I would like to add the ability to
have the OCR done on another server in a queue. The user would select
to OCR the document, the document would be placed in a queue on the OCR
server and the results returned to Plone after processing. I would
also like to add the ability to OCR PDF documents and return a
searchable PDF. There is a lot more that can be done with this.
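
The queue idea could look roughly like the sketch below. It is only a
stand-in: an in-process `queue.Queue` and a worker thread play the role
of the separate OCR server, and `ocr_func` and `on_result` are
hypothetical callables (the first would run Tesseract, the second would
hand the result back to Plone). A real setup would need some transport
between the two machines, which is not shown.

```python
import queue
import threading


def start_ocr_worker(jobs, ocr_func, on_result):
    """Start a thread that processes (document_path, language) jobs.

    Putting None on the queue stops the worker. `ocr_func` and
    `on_result` are assumptions for this sketch, not an existing API.
    """
    def run():
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel: shut the worker down
                    break
                document_path, language = job
                try:
                    text, error = ocr_func(document_path, language), None
                except Exception as exc:
                    # Report the failure instead of killing the worker.
                    text, error = None, exc
                on_result(document_path, text, error)
            finally:
                # Always mark the job done so jobs.join() cannot hang.
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The user's click would then only do `jobs.put((path, language))` and
return immediately, with the result arriving later through `on_result`.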

Let me know if you're interested. I don't believe it is very
complicated but I cannot do it on my own.

Thanks 