Just to clarify: I am working on an ECM solution. We are using Pytesser to run OCR over large documents (50,000+ words), and that is working very well!
Now we need to make those results available on a search page in almost real time. More than 200 companies, each with many users, will have access to that search page, and searches are made by person or company name. Example: I want to know how many times my name appears in those documents, and in which documents. The document paths and the full text content are stored in 5 different PostgreSQL databases. web2py is the base of the front end, and I am now working on the search engine pages.

I am testing Mincemeat and Disco. Does anybody know of other ways?

2010/9/14 Bruno Rocha <[email protected]>

> I don't know if this was discussed here before.
> Anyway, I'll leave the tip.
>
> Map Reduce in Python (single module, less than 13 KB)
>
> http://remembersaurus.com/mincemeatpy/
>
> I am testing it, and it works very well in my search engine so far.
>
> Maybe it could be better documented or integrated into web2py contrib.
>
> It would be great to have it at the web2py API level.
> --
> http://rochacbruno.com.br
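
To make the name-count search described above concrete, here is a minimal sketch of the counting step written as a map/reduce job in plain Python. It does not use Mincemeat or Disco, and it is only an illustration under assumptions: the function names (`map_doc`, `reduce_counts`, `search`), the sample paths, and the sample text are all hypothetical, and in the real system the text would come from the PostgreSQL databases rather than an in-memory dict.

```python
import re
from collections import defaultdict


def map_doc(path, text, name):
    """Map step: emit (path, number of occurrences of `name`) for one document."""
    # Whole-word, case-insensitive match; re.escape keeps the name literal.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    count = len(pattern.findall(text))
    if count:
        yield path, count


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per document path."""
    totals = defaultdict(int)
    for path, count in pairs:
        totals[path] += count
    return dict(totals)


def search(documents, name):
    """Run map over every document, then reduce; return (total, per-document counts)."""
    pairs = []
    for path, text in documents.items():
        pairs.extend(map_doc(path, text, name))
    per_doc = reduce_counts(pairs)
    return sum(per_doc.values()), per_doc


# Hypothetical sample data standing in for the OCR output.
documents = {
    "/docs/a.txt": "Ana Silva signed. Later, ana silva signed again.",
    "/docs/b.txt": "No match in this one.",
    "/docs/c.txt": "Contract reviewed by Ana Silva.",
}

total, per_doc = search(documents, "Ana Silva")
print(total, per_doc)
```

Because each `map_doc` call only looks at one document, the map step could in principle be spread across workers (one per database, for example), with only the small (path, count) pairs sent back for the reduce step.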

