Can you give me an example of a page where:

http://www.archive.org/stream/[id]

does NOT give you a book-reader?  The only examples I've been able to find are 
ones where requesting that URL simply redirects you back to /details, where, I 
guess, there is no "/stream" available.  But are there cases where a "/stream" 
is available but there is no actual bookreader?  If you could give me an 
example of that, it would help me figure out the best "scraping" approach. 

I'd still much rather use some kind of API than a scraping approach 
like that.  It would even count as an "API" if you said "request /stream/id -- 
if you get a redirect, no search inside is available; if you don't, it is". 
It would be even better to get this in the same API response as other OL/IA 
queries, so that I don't need to make another HTTP request just for this.  But 
making another HTTP request where I can just check the HTTP status is a lot 
better than having to sniff the page for a specific JS file, which is 
both more expensive and awfully fragile. 
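
A rough sketch of the status-check idea follows. It assumes, without 
confirmation, that /stream/{id} answers with a redirect to /details when no 
reader is available and with a 200 otherwise, and that the server accepts 
HEAD requests. The names classify_stream_response and probe_stream are 
made up for illustration and are not part of any IA/OL API.

```python
import http.client
from urllib.parse import quote, urlsplit


def classify_stream_response(status, location=None):
    """Interpret a /stream/{id} response under the ASSUMED convention:
    a redirect to /details means no reader, a 200 means a reader page."""
    if 300 <= status < 400:
        path = urlsplit(location or "").path
        return "no-stream" if path.startswith("/details") else "unknown"
    if status == 200:
        return "stream"
    return "unknown"


def probe_stream(item_id, host="www.archive.org", timeout=10):
    """Send one HEAD request; http.client does not follow redirects,
    so the raw status and Location header can be inspected directly."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", "/stream/" + quote(item_id, safe=""))
        resp = conn.getresponse()
        return classify_stream_response(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

If it turns out that a 200 does not guarantee a bookreader, the "stream" 
result would only mean "worth a closer look", not "search inside is available".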

I'd suggest again that you consider making it a priority for third-party apps 
to be able to discover features like this.  I think exposing them in 
third-party apps like mine could really increase exposure and traffic to your 
materials. 
________________________________________
From: [email protected] [[email protected]] On Behalf Of 
Michael Ang [[email protected]]
Sent: Monday, September 27, 2010 6:10 PM
To: [email protected]
Subject: Re: [ol-tech] discovering and linking to search inside functions for hosted text

There are two email lists you might be interested in that relate
specifically to the BookReader:

Announcements, including new releases:
http://mail.archive.org/cgi-bin/mailman/listinfo/bookreader-announce

General development:
http://mail.archive.org/cgi-bin/mailman/listinfo/bookreader-devel

On 9/27/10 3:08 PM, Michael Ang wrote:
>    On 9/27/10 1:44 PM, Jonathan Rochkind wrote:
>> I think I asked this question like two years ago, and the answer was
>> "No, not yet, but we'd like that."  So I'm pinging again.
>>
>> Some Internet Archive/OL full text exists in a 'page turner' interface
>> that also has 'search inside' functionality. For instance:
>> http://www.archive.org/stream/thesetwain00bennrich#page/n5/mode/2up
>>
>> Using IA/OL APIs, I am already identifying Internet Archive IDs of
>> interest, like say "thesetwain00bennrich".  Using that identifier, is
>> there any way using IA/OL APIs for me to:
>>
>> 1) Discover if a book is available in that page-turner format (not
>> everything is).
> Unfortunately the logic to determine if a book can be displayed is a
> little complicated and we don't have a proper API that exposes the result.
>
> In the meantime this is a little cheesy but you could fetch
> http://www.archive.org/stream/{itemid} and look for the string
> "BookReader.js" in the returned HTML.  That should indicate that the
> BookReader is being served.
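
The string check described above could be sketched roughly as below. The 
marker string comes from the message; the function names and the choice of 
urllib are illustrative assumptions, and the check is only as reliable as 
the page markup staying the same.

```python
import urllib.request
from urllib.parse import quote

MARKER = "BookReader.js"  # the string suggested in the message above


def html_serves_bookreader(html):
    """Return True if the page source mentions the BookReader script."""
    return MARKER in html


def fetch_and_check(item_id, base="http://www.archive.org/stream/", timeout=10):
    """Fetch /stream/{itemid} and sniff the HTML for the marker.
    urlopen follows redirects, so a redirect to /details is checked
    against the details page's HTML instead."""
    url = base + quote(item_id, safe="")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return html_serves_bookreader(html)
```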
>
> That should work for all the books which we've scanned.  For
> user-uploaded text items it's a little more complicated, since there is
> usually an additional 'sub-prefix' that is also required.  Right now
> there isn't a great way to find out the sub-prefix... we make that
> determination by looking at the item files.xml for the files that the
> BookReader needs (sorry).
>
>> 2) Deep link into search results for a particular query in a particular
>> book.
> This already works by appending "search/{terms}" after the # in the
> BookReader URL.
>
> e.g.
> http://www.archive.org/stream/nimrodofseaorame00davirich#page/18/mode/2up/search/albatross
>
> This is documented here:
> http://openlibrary.org/dev/docs/bookurls#searching
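
A small sketch of building such a deep link, modelled on the example URL 
above. The helper name search_deep_link is hypothetical, and 
percent-encoding the search terms is an assumption about how multi-word 
queries should be passed; the linked documentation is the authority.

```python
from urllib.parse import quote


def search_deep_link(item_id, page, terms, mode="2up",
                     base="http://www.archive.org/stream/"):
    """Build a BookReader URL whose fragment ends in search/{terms}.
    Percent-encoding of the terms is an ASSUMPTION, not documented here."""
    fragment = "page/%s/mode/%s/search/%s" % (page, mode, quote(terms, safe=""))
    return base + quote(item_id, safe="") + "#" + fragment


if __name__ == "__main__":
    # Reproduces the example URL from the message above.
    print(search_deep_link("nimrodofseaorame00davirich", "18", "albatross"))
```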
>
> We're working on using an improved full-text search engine instead of
> the current rudimentary search.  This should only give better results
> and shouldn't affect the deep-linked search URLs!
>
>     - mang
>> If #1 can be taken care of, but #2 can't be because of limitations in
>> the javascript reader, then I might try to find time to submit a patch
>> to the javascript reader to make that possible, although I'm not sure
>> when I'd find the time to do so.
>>
>> Jonathan
>> _______________________________________________
>> Ol-tech mailing list
>> [email protected]
>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
>> To unsubscribe from this mailing list, send email to 
>> [email protected]
