10qu too!

On Mon, Jun 13, 2011 at 10:00 PM, <[email protected]> wrote:
> Send Ol-discuss mailing list submissions to
>        [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
> or, via email, send a message with subject or body 'help' to
>        [email protected]
>
> You can reach the person managing the list at
>        [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Ol-discuss digest..."
>
>
> Today's Topics:
>
>    1. Re: ol.org book reader (Lars Aronsson)
>    2. Re: ol.org book reader (Michael Ang)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 13 Jun 2011 13:10:03 +0200
> From: Lars Aronsson <[email protected]>
> Subject: Re: [ol-discuss] ol.org book reader
> To: Open Library -- general discussion <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 06/12/2011 04:14 PM, Karen Coyle wrote:
> > This doesn't answer your exact question, but the full text of the
> > digitized books is crawled. You can see this by doing a Google search
> > like:
> >
> > LOUISIANA SCOTT SHUMAN site:archive.org
> >
> > That's a very artificial search, but it gives you the idea. This isn't
> > related to the book reader but to the stored full text on the Internet
> > Archive.
>
> Exactly, that's my point: it "isn't related to the book reader",
> but I think it should be. It gives hits in ..._djvu.txt, but Google
> doesn't lead me to the right page.
>
>
>
> --
>    Lars Aronsson ([email protected])
>    Project Runeberg - free Nordic literature - http://runeberg.org/
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 13 Jun 2011 11:34:15 -0700
> From: Michael Ang <[email protected]>
> Subject: Re: [ol-discuss] ol.org book reader
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> We had the idea to have the OCR text on separate URLs by page (or
> similar) to improve search accessibility a few years ago and we may yet
> get there. We're working on having the OCR text available for reading
> and correction (may not immediately be integrated with the BookReader).
>
> For the BookReader I might go with the new #! url fragments that are
> designed to allow web apps to dynamically update the url while still
> being accessible to search engines.
> http://code.google.com/web/ajaxcrawling/docs/specification.html
>
> - mang
>
> On 6/11/11 7:49 PM, Lars Aronsson wrote:
> > Reading my own question again, I understand I didn't phrase it
> > very well:
> >> Can this be combined with making the text searchable
> >> by web search engines, like plain web pages?
> > Here's what I envision, and my question is if you have
> > any plans going in this direction:
> >
> > In the bookreader, one should not only be able to zoom
> > in and out or to activate the sound playback, but also to
> > view the OCR text and proofread the OCR text (like a
> > wiki page). To a search engine spider, only the view text
> > option should be available, and the buttons for previous
> > and next page should be plain links, so the text of each
> > page gets indexed under the right page URL.
> >
> > The way I would want the bookreader to appear to a
> > search spider is the way my existing website looks,
> > this example being the first page of Hamlet, in the
> > Swedish translation of 1861,
> > http://runeberg.org/hagberg/a/0183.html
> > Here is the scanned book page, and you can scroll
> > down to the OCR text below.
> >
> > If you google the role names "Voltimand, Cornelius,
> > Rosenkranz, Gyldenstern", you will see that it
> > is indexed by Google at this very URL. (English and
> > German editions spell the names a little different.)
> >
> > I'd like to use the bookreader with its soft scrolling
> > and book page flipping for humans, but I don't
> > want to give up the direct per page indexing by
> > Google and other search engines. So, can the
> > two be combined? Did anybody try this?
> >
> >
> >
>
>
> ------------------------------
>
> _______________________________________________
> Ol-discuss mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
> To unsubscribe from this mailing list, send email to
> [email protected]
>
> End of Ol-discuss Digest, Vol 47, Issue 5
> *****************************************
>
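
For anyone following the #! idea from Message 2: as far as I understand the linked AJAX crawling specification, a crawler that sees a "pretty" URL containing #! requests an "ugly" URL instead, with the fragment moved into an _escaped_fragment_ query parameter, and the server answers that request with a plain HTML snapshot. Below is a rough sketch of just the URL mapping. The example.org URLs and the page/n5/mode/2up fragment are made up for illustration and are not actual BookReader URLs; the function names are mine.

```python
from urllib.parse import urlsplit, urlunsplit

# Bytes that, per my reading of the spec, must be percent-escaped in the
# fragment value: '#', '%', '&', '+', plus controls/space and non-ASCII.
_SPECIAL = {0x23, 0x25, 0x26, 0x2B}


def escape_fragment(value: str) -> str:
    """Percent-escape only the bytes the crawling scheme singles out."""
    out = []
    for byte in value.encode("utf-8"):
        if byte <= 0x20 or byte >= 0x7F or byte in _SPECIAL:
            out.append("%{:02X}".format(byte))
        else:
            out.append(chr(byte))
    return "".join(out)


def crawler_url(pretty_url: str) -> str:
    """Map a '#!' URL to the '_escaped_fragment_' URL a crawler would fetch.

    URLs without a '#!' fragment are returned unchanged.
    """
    parts = urlsplit(pretty_url)
    if not parts.fragment.startswith("!"):
        return pretty_url
    param = "_escaped_fragment_=" + escape_fragment(parts.fragment[1:])
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    # Hypothetical per-page reader URL, not a real BookReader address.
    print(crawler_url("http://example.org/stream/hamlet#!page/n5/mode/2up"))
```

With something like this, each book page could keep its own #! address for human readers while the server returns the OCR text of that page when asked for the _escaped_fragment_ form, which seems close to the per-page indexing Lars describes.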
