Let me say that when it comes to technical matters I am an ignoramus. So when I asked earlier about collaborating with Bookshare to have Bookshare volunteers proofread Open Library books, and I was asked how that is done (with a web form, or with something like Word) and about word coordinates, I did not fully understand the question. I said that Bookshare volunteers download a scanned book, proofread it, and then upload it with corrections, and the corrected copy goes into the collection. I also said that Bookshare volunteers commonly use Word to do this. Actually, they use whatever word processing software they happen to have; they are not supplied with any. However, Bookshare has its own set of tools that do conversions and turn the book into a Daisy book before it is ready for download.
I do not know what kind of tools these are, but on rereading the quoted material in this reply I noticed something I had missed before: the comment about retaining word coordinates so that it would be possible to search inside the book. As it happens, Bookshare's search engine does search inside the books. When you search the collection, the results include matches from both the titles and the text inside the books themselves. I do not understand what word coordinates are, nor much about search engines or the other technical matters involved in making these books available in Daisy format, but if the Bookshare search engine searches inside the text of the books, then perhaps Bookshare books are compatible with Open Library books after all.
My suggestion was that scanned Open Library books could be supplied to Bookshare to be proofread by Bookshare volunteers, then added to the Bookshare collection, with a copy returned to Open Library to be added to its protected Daisy collection. That way both Bookshare and Open Library would benefit.
Does the fact that Bookshare books can be searched by their full text make the idea sound a bit more feasible? I would not be able to answer questions about word coordinates, though.
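For readers unsure what the "word coordinates" in the quoted message below refer to, here is a minimal hypothetical sketch. It is not Open Library's or Bookshare's actual data format; the names `Word`, `search_inside`, and `correct_word` are invented for illustration. The assumption is that each OCR'd word is stored with its page number and a bounding box on the scanned page image, so a search can return the spot to highlight.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical bounding box on the page image: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Word:
    text: str   # OCR text for this word (may contain errors)
    page: int   # page number the word appears on
    box: Box    # where the word sits on the scanned page image


def search_inside(words: List[Word], query: str) -> List[Tuple[int, Box]]:
    """Return (page, box) for each word equal to query, ignoring case,
    so a book reader could draw a highlight over the scanned image."""
    q = query.lower()
    return [(w.page, w.box) for w in words if w.text.lower() == q]


def correct_word(words: List[Word], index: int, new_text: str) -> None:
    """Fix the OCR text of one word in place, keeping its page and box."""
    old = words[index]
    words[index] = Word(new_text, old.page, old.box)


# Example: "Tlie" is a typical OCR misreading of "The".
words = [Word("Tlie", 1, (10, 20, 50, 40)), Word("book", 1, (55, 20, 100, 40))]
print(search_inside(words, "the"))  # [] because the OCR text is wrong
correct_word(words, 0, "The")
print(search_inside(words, "the"))  # [(1, (10, 20, 50, 40))]
```

Correcting a single word in place keeps its box, so search and highlighting keep working. If a proofreader instead retypes or reflows a passage in a word processor, joining or splitting words, the corrected text no longer lines up one-to-one with the stored boxes; that is presumably the extra difficulty the quoted message describes.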
On 12/30/2011 1:12 AM, Janusz S. Bień wrote:
> On Thu, 29 Dec 2011 Edward Betts wrote:
>
>> We don't currently have a system for recording the quality of the OCR or
>> correcting mistakes.
>>
>> As you point out, the OCR doesn't properly handle blackletter type.
> There is a solution to it, but it is expensive:
>
> http://www.frakturschrift.com/
>
>> A system for correcting OCR is often requested; conceptually it is quite
>> simple.
> But not in practise...
>
>> Just a web page that shows the page image and a way to edit the
>> text. We are keen to maintain page coordinate information for each word so
>> that we can highlight words in the book reader and search inside. This
>> makes the problem more difficult.
>>
>> We would like to build a correction system, but we don't have the resources.
> Building such a system seems to be a goal of several projects, but I
> haven't yet found anything satisfactory for my purposes. The IMPACT
> project developed a system that looks nice, but again it is probably going
> to be quite expensive:
>
> http://www.digitisation.eu/index.php?id=109
>
> Best regards
>
> Janusz
_______________________________________________
Ol-discuss mailing list
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
