On 9/27/10 3:40 PM, Jonathan Rochkind wrote:
> Can you give me an example of a page where:
>
> http://www.archive.org/stream/[id]
>
> does NOT give you a book-reader? The only examples I've been able to find
> are where requesting that simply redirects you back to /details -- where, I
> guess there is no "/stream" available. But sometimes there's a "/stream"
> available, but no actual bookreader? If you could give me an example of
> that, it would help me figure out the optimal "scraping" approach.

If the item is a text item (I assume you're only using text items, so this
should be true) and you get a normal response on /stream/[id], that currently
means you're getting the BookReader. So for now you don't need to parse the
HTML. I'm not aware of any anticipated changes to that behaviour.

> I'd still really really rather use some kind of API than a scraping approach
> like that. I mean, it's even an "api" if you said "request /stream/id -- if
> you get a redirect, no search inside is available, if you don't, it is".
> But it would be even better to get it in the same API response as other OL/IA
> queries, so I didn't need to make another HTTP request just for this. But
> making another HTTP request where I can just check the http status is a lot
> better than having to sniff the page for including a specific js file, that
> is both more expensive and seems awfully fragile.

Checking for the redirect as you describe should work.

> I'd suggest again that you might want to consider making discoverability of
> this kind of thing by third party apps a priority -- I think exposing this
> kind of thing in third party apps like mine can really increase exposure and
> traffic to your materials.

Duly noted. We're in the process of getting the full-text search to actually
work on openlibrary.org and inside the BookReader. Good to have some feedback
now on integration points for third parties.
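A minimal Python sketch of the status-code check agreed on above, assuming the 2010 behaviour described in this thread: for a text item, GET /stream/{id} either returns a normal page (BookReader served) or redirects, e.g. back to /details (no BookReader). The names `NoRedirect`, `classify_stream_response`, and `has_bookreader` are hypothetical, and treating any non-2xx, non-3xx status as an error is a choice made for the sketch. Redirects are deliberately not followed so the 3xx status stays visible; if the server ever redirects plain http to https, that redirect would also be read as "no BookReader", which is why the base URL is a parameter.

```python
import urllib.error
import urllib.parse
import urllib.request

# Base URL as used in the thread (2010); overridable per call.
DEFAULT_BASE = "http://www.archive.org"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx status itself reaches the caller."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def classify_stream_response(status: int) -> bool:
    """Map the HTTP status of GET /stream/{id} to "is the BookReader served?".

    2xx -> True (normal page), 3xx -> False (redirect, e.g. to /details).
    Any other status is treated as an error (an assumption of this sketch).
    """
    if 200 <= status < 300:
        return True
    if 300 <= status < 400:
        return False
    raise ValueError(f"unexpected HTTP status {status}")


def has_bookreader(item_id: str, base: str = DEFAULT_BASE, timeout: float = 10.0) -> bool:
    """Hypothetical helper: one request, no HTML parsing, text items only."""
    url = f"{base}/stream/{urllib.parse.quote(item_id, safe='')}"
    # Passing our subclass replaces the default redirect-following handler.
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return classify_stream_response(resp.status)
    except urllib.error.HTTPError as err:
        # Refused redirects and 4xx/5xx responses both arrive here.
        return classify_stream_response(err.code)
```

Usage would be `has_bookreader("thesetwain00bennrich")`. Per the thread this only applies to text items; user-uploaded items may additionally need the 'sub-prefix' discussed below.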
- mang

> ________________________________________
> From: [email protected] [[email protected]] On Behalf Of
> Michael Ang [[email protected]]
> Sent: Monday, September 27, 2010 6:10 PM
> To: [email protected]
> Subject: Re: [ol-tech] discovering and linking to search inside functions for
> hosted text
>
> There are two email lists you might be interested in relating
> specifically to the BookReader:
>
> Announcements, including new releases:
> http://mail.archive.org/cgi-bin/mailman/listinfo/bookreader-announce
>
> General development:
> http://mail.archive.org/cgi-bin/mailman/listinfo/bookreader-devel
>
> On 9/27/10 3:08 PM, Michael Ang wrote:
>> On 9/27/10 1:44 PM, Jonathan Rochkind wrote:
>>> I think I asked this question like two years ago, and the answer was
>>> "No, not yet, but we'd like that." So I'm pinging again.
>>>
>>> Some Internet Archive/OL full text exists in a 'page turner' interface
>>> that also has 'search inside' functionality. For instance:
>>> http://www.archive.org/stream/thesetwain00bennrich#page/n5/mode/2up
>>>
>>> Using IA/OL APIs, I am already identifying internet archive ID's of
>>> interest, like say "thesetwain00bennrich". Using that identifier, is
>>> there any way using IA/OL APIs for me to:
>>>
>>> 1) Discover if a book is available in that page-turner format (not
>>> everything is).
>> Unfortunately the logic to determine if a book can be displayed is a
>> little complicated and we don't have a proper API that exposes the result.
>>
>> In the meantime this is a little cheesy but you could fetch
>> http://www.archive.org/stream/{itemid} and look for the string
>> "BookReader.js" in the returned HTML. That should indicate that the
>> BookReader is being served.
>>
>> That should work for all the books which we've scanned. For user
>> uploaded text items it's a little more complicated since there is
>> usually an additional 'sub-prefix' that is also required. Right now
>> there isn't a great way to find out the sub-prefix... we make that
>> determination by looking at the item files.xml for the files that the
>> BookReader needs (sorry).
>>
>>> 2) Deep link into search results for a particular query in a particular
>>> book.
>> This already works by appending "search/{terms}" after the # in the
>> BookReader URL.
>>
>> e.g.
>> http://www.archive.org/stream/nimrodofseaorame00davirich#page/18/mode/2up/search/albatross
>>
>> This is documented here:
>> http://openlibrary.org/dev/docs/bookurls#searching
>>
>> We're working on using an improved full-text search engine instead of
>> the current rudimentary search. This should only give better results
>> and shouldn't affect the deep-linked search URLs!
>>
>> - mang
>>> If #1 can be taken care of, but #2 can't be because of limitations in
>>> the javascript reader, then I might try to find time to submit a patch
>>> to the javascript reader to make that possible, although I'm not sure
>>> when I'd find the time to do so.
>>>
>>> Jonathan
>>> _______________________________________________
>>> Ol-tech mailing list
>>> [email protected]
>>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
>>> To unsubscribe from this mailing list, send email to
>>> [email protected]
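The two techniques from the 3:08 PM message in this thread (checking the /stream/{itemid} HTML for the string "BookReader.js", and deep-linking into search results by appending "search/{terms}" after the # in the BookReader URL) can be sketched in Python as below. The function names are hypothetical; the page/mode segments simply mirror the example URLs in the thread, and percent-encoding multi-word terms is an assumption, since the thread only shows a single-word query. The sketch does not handle the 'sub-prefix' that user-uploaded items may need.

```python
from typing import Optional, Union
from urllib.parse import quote

# Base taken from the example URLs in the thread.
STREAM_BASE = "http://www.archive.org/stream/"


def html_loads_bookreader(html: str) -> bool:
    """Stopgap from the thread: does the /stream page reference BookReader.js?"""
    return "BookReader.js" in html


def search_deep_link(
    item_id: str,
    terms: str,
    page: Optional[Union[int, str]] = None,
    mode: Optional[str] = "2up",
) -> str:
    """Build a BookReader URL whose fragment ends in search/{terms}.

    page may be a number (18) or a leaf-style label ("n5"), as in the
    thread's examples. Percent-encoding of terms is an assumption.
    """
    fragment = []
    if page is not None:
        fragment.append(f"page/{page}")
    if mode is not None:
        fragment.append(f"mode/{mode}")
    fragment.append("search/" + quote(terms, safe=""))
    return STREAM_BASE + quote(item_id, safe="") + "#" + "/".join(fragment)


if __name__ == "__main__":
    # Reproduces the documented example URL from the thread.
    print(search_deep_link("nimrodofseaorame00davirich", "albatross", page=18))
    # http://www.archive.org/stream/nimrodofseaorame00davirich#page/18/mode/2up/search/albatross
```

Everything after the # is handled by the BookReader's JavaScript in the browser, so building the link needs no extra request; only the availability check does.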
