Can you give me an example of a page where http://www.archive.org/stream/[id] does NOT give you a book-reader?

The only examples I've been able to find are ones where the request simply redirects you back to /details, where, I guess, there is no "/stream" available. But is there sometimes a "/stream" available with no actual bookreader? If you could give me an example of that, it would help me figure out the optimal "scraping" approach.

I'd still much rather use some kind of API than a scraping approach like that. It would even count as an "API" if you said "request /stream/id: if you get a redirect, no search inside is available; if you don't, it is". It would be even better to get this in the same API response as other OL/IA queries, so I didn't need to make another HTTP request just for this. But making another HTTP request where I can just check the HTTP status is a lot better than having to sniff the page for a specific JS file, which is both more expensive and seems awfully fragile.

I'd suggest again that you might want to consider making discoverability of this kind of thing by third-party apps a priority. I think exposing this kind of thing in third-party apps like mine can really increase exposure and traffic to your materials.

________________________________________
From: [email protected] [[email protected]] On Behalf Of Michael Ang [[email protected]]
Sent: Monday, September 27, 2010 6:10 PM
To: [email protected]
Subject: Re: [ol-tech] discovering and linking to search inside functions for hosted text

There are two email lists you might be interested in relating specifically to the BookReader:

Announcements, including new releases:
http://mail.archive.org/cgi-bin/mailman/listinfo/bookreader-announce

General development:
http://mail.archive.org/cgi-bin/mailman/listinfo/bookreader-devel

On 9/27/10 3:08 PM, Michael Ang wrote:
> On 9/27/10 1:44 PM, Jonathan Rochkind wrote:
>> I think I asked this question like two years ago, and the answer was
>> "No, not yet, but we'd like that." So I'm pinging again.
>>
>> Some Internet Archive/OL full text exists in a 'page turner' interface
>> that also has 'search inside' functionality. For instance:
>> http://www.archive.org/stream/thesetwain00bennrich#page/n5/mode/2up
>>
>> Using IA/OL APIs, I am already identifying internet archive ID's of
>> interest, like say "thesetwain00bennrich". Using that identifier, is
>> there any way using IA/OL APIs for me to:
>>
>> 1) Discover if a book is available in that page-turner format (not
>> everything is).
> Unfortunately the logic to determine if a book can be displayed is a
> little complicated and we don't have a proper API that exposes the result.
>
> In the meantime this is a little cheesy but you could fetch
> http://www.archive.org/stream/{itemid} and look for the string
> "BookReader.js" in the returned HTML. That should indicate that the
> BookReader is being served.
>
> That should work for all the books which we've scanned. For user
> uploaded text items it's a little more complicated since there is
> usually an additional 'sub-prefix' that is also required. Right now
> there isn't a great way to find out the sub-prefix... we make that
> determination by looking at the item files.xml for the files that the
> BookReader needs (sorry).
>
>> 2) Deep link into search results for a particular query in a particular
>> book.
> This already works by appending "search/{terms}" after the # in the
> BookReader URL.
>
> e.g.
> http://www.archive.org/stream/nimrodofseaorame00davirich#page/18/mode/2up/search/albatross
>
> This is documented here:
> http://openlibrary.org/dev/docs/bookurls#searching
>
> We're working on using an improved full-text search engine instead of
> the current rudimentary search. This should only give better results
> and shouldn't affect the deep-linked search URLs!
>
> - mang
>> If #1 can be taken care of, but #2 can't be because of limitations in
>> the javascript reader, then I might try to find time to submit a patch
>> to the javascript reader to make that possible, although I'm not sure
>> when I'd find the time to do so.
>>
>> Jonathan
>> _______________________________________________
>> Ol-tech mailing list
>> [email protected]
>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
>> To unsubscribe from this mailing list, send email to
>> [email protected]
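
For reference, here is a rough Python sketch of the two checks discussed in this thread: deciding what a /stream response means, and building a deep link into search results. It is only an illustration, not an official IA/OL API. The function names, the result labels, the default page "n0", and the percent-encoding of the search terms are all assumptions of this sketch. The redirect-to-/details behaviour and the "BookReader.js" marker are taken from the messages above and may not hold for every item. The caller is assumed to fetch the /stream URL without following redirects and pass in the status, Location header, and body.

```python
from typing import Optional
from urllib.parse import quote

# URL prefix as quoted in the thread.
STREAM_BASE = "http://www.archive.org/stream/"

# Redirect status codes treated as "sent back to /details" (an assumption).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def stream_url(item_id: str) -> str:
    """Build the /stream URL for an Internet Archive item identifier."""
    return STREAM_BASE + quote(item_id, safe="")


def search_deep_link(item_id: str, terms: str,
                     page: str = "n0", mode: str = "2up") -> str:
    """Append "search/{terms}" after the # in the BookReader URL.

    The default page "n0" and the percent-encoding of the terms are
    assumptions of this sketch, not something the thread specifies.
    """
    fragment = f"page/{page}/mode/{mode}/search/{quote(terms, safe='')}"
    return f"{stream_url(item_id)}#{fragment}"


def classify_stream_response(status: int, location: Optional[str],
                             body: str) -> str:
    """Interpret a non-followed GET of /stream/{itemid}.

    Returns one of these hypothetical labels:
      "no-stream"                 redirected back to /details
      "bookreader"                page mentions "BookReader.js"
      "stream-without-bookreader" 200 response without that marker
      "unknown"                   anything else
    """
    if status in REDIRECT_CODES and location and "/details/" in location:
        return "no-stream"
    if status == 200:
        if "BookReader.js" in body:
            return "bookreader"
        return "stream-without-bookreader"
    return "unknown"
```

With page="18", search_deep_link("nimrodofseaorame00davirich", "albatross", page="18") reproduces the example URL given above. The status check is tried first because it needs no body parsing; the string sniff is only a fallback for 200 responses.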
