Re: [CODE4LIB] opening a pdf file
Eric, Downloaded, but the default PDF viewer (QuickOffice on my Android phone) just shows me pages with a big red X. I downloaded the Adobe Reader app and now I can see the content, but it does not look anything like the image I see in my PC's browser. Garbled / washed out, though each page is different. Chad Chad Nelson Web Services Programmer University Library Georgia State University e: cnelso...@gsu.edu t: 404 413 2771 From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Eric Lease Morgan [emor...@nd.edu] Sent: Monday, October 03, 2011 9:58 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] opening a pdf file Are any of you able to open the following URL with an Android-based tablet device: http://dh.crc.nd.edu/sandbox/cyl/corpus/canarybird00schm.pdf I have harvested about 60 PDF documents from the Internet Archive, and I created a rudimentary tablet-based interface to the collection here: http://dh.crc.nd.edu/sandbox/cyl/catalog/ Using my desktop machine I am able to download and view the PDF documents found in the catalog, but I am unable to view them on my iPad nor my iPhone. G… I've tried many different PDF viewers on my iPad and iPhone, all to no avail. I'm wondering whether or not this is an iOS thing or if there is something wrong with my PDF files. Please tell me what your experience is. -- Eric Morgan University of Notre Dame
Re: [CODE4LIB] opening a pdf file
On Mon, Oct 3, 2011 at 2:58 PM, Eric Lease Morgan emor...@nd.edu wrote: Are any of you able to open the following URL with an Android-based tablet device: http://dh.crc.nd.edu/sandbox/cyl/corpus/canarybird00schm.pdf It is educational to look at memory use on the PC when that PDF is loaded. Evince here is using 600 MB; do you have space for such objects on these little toys? Try something like Diva so you don't suck the resources dry on the client. An experiment here: http://www.collection.archivist.info/diva/systrondonner1626.html http://www.collection.archivist.info/diva/lucastp1.html Dave Caroline
Re: [CODE4LIB] opening a pdf file [diva]
On Oct 3, 2011, at 10:26 AM, Dave Caroline wrote: It is educational to look at memory use on the PC when that PDF is loaded. Evince here is using 600 MB; do you have space for such objects on these little toys? Try something like Diva so you don't suck the resources dry on the client. Please tell me (us) more about Diva. I am not familiar with it. --Eric Morgan
Re: [CODE4LIB] opening a pdf file
Eric, I have to answer mostly no to your question. I can open it with the Adobe reader on my Android tablet, but the pages are too blurred to read. It's a little (don't take this literally) like an interlaced image when only the first half of the lines are displayed. I've never run into this before. When I open it with Documents to Go, each page is blank with a red X across it. I've run into occasional pages like this with large graphical PDF files (most often with one that's 77 MB), but after waiting maybe 5 seconds the pages render fine. With your file, the pages are all like that and waiting doesn't help. Whatever the problem, it's not likely to be the size of the file. Mike
Re: [CODE4LIB] opening a pdf file
Most people seem to have mixed results when trying to open the PDF files on their tablet-based (Android and iOS) devices. Bummer! These PDF files were harvested from the Internet Archive. They seem to be viewable just fine for desktop machines, but not tablets. The number of files I have in the collection is finite and small -- less than 100. Does anybody have any suggestions on how I could convert or modify the files to make them more useful? I would be willing to manually do the modifications if necessary. -- Eric Morgan University of Notre Dame
Re: [CODE4LIB] opening a pdf file [diva]
On Oct 3, 2011, at 11:12 AM, Dave Caroline wrote: Diva was announced here on the 6th of June https://listserv.nd.edu/cgi-bin/wa?A2=ind1106L=CODE4LIBT=0F=S=P=27064 The clever part is that you only send the visible part at the scale they are viewing, so there is little excess bandwidth. For online document viewing it takes some beating and is not too hard to set up. My demo is running on an ADSL line from home, probably a worst-case speed demo. http://ddmal.music.mcgill.ca/diva http://ddmal.music.mcgill.ca/diva/demo Very interesting, and thank you for bringing it to my attention. It seems it relies on a technology that reads and chunks up image files. Alas, I have PDFs. Moreover, I really want people to be able to print the entire documents. I suppose I could convert my PDF files into images and go that route. Hmm… -- Eric Morgan
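To make the "only send the visible part" idea concrete, here is a minimal sketch of how a tiled viewer might decide which tiles to request for the current viewport. This is not Diva's actual code; the function name `visible_tiles`, the 256-pixel tile size, and the (row, column) numbering are all assumptions made for illustration.

```python
def visible_tiles(view_x, view_y, view_w, view_h, tile_size=256):
    """Return (row, col) indices of the tiles overlapping a viewport.

    Coordinates are pixels in the page image at the current zoom level.
    The tile size and the indexing scheme are illustrative assumptions.
    """
    first_col = view_x // tile_size
    last_col = (view_x + view_w - 1) // tile_size
    first_row = view_y // tile_size
    last_row = (view_y + view_h - 1) // tile_size
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A 100x100 window starting at x=200 straddles two tiles in the first row.
print(visible_tiles(200, 0, 100, 100))
```

Only the tiles in that list would need to cross the network, however large the full page image is.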
Re: [CODE4LIB] opening a pdf file
I imagine the problems have to do with the multi-layered nature of the Internet Archive scans; even on my PC, the PDF doesn't render well until it is fully downloaded -- while it is in the process of loading, I see a blurry mess. Perhaps the tablet devices are doing the same thing -- attempting to display content prior to the full load of the file but not entirely succeeding. It may be possible to flatten the layers of the PDFs in some way, but I'm not sure exactly how. Additionally, I notice that there are different versions of the PDF here: http://www.archive.org/details/canarybird00schm (one labeled PDF, another B/W PDF) Does one version work better on tablets than the other? - Demian
Re: [CODE4LIB] opening a pdf file
On Oct 3, 2011, at 11:32 AM, Demian Katz wrote: Additionally, I notice that there are different versions of the PDF here: http://www.archive.org/details/canarybird00schm (one labeled PDF, another B/W PDF) Does one version work better on tablets than the other? At first glance, the black & white versions do not render on my iPad either. Hurumph! -- Eric "Getting Desperate" Morgan
Re: [CODE4LIB] opening a pdf file [diva]
On 2011-10-03, at 11:29 AM, Eric Lease Morgan wrote: Very interesting, and thank you for bringing it to my attention. It seems it relies on a technology that reads and chunks up image files. Alas, I have PDFs. Moreover, I really want people to be able to print the entire documents. I suppose I could convert my PDF files into images and go that route. Hmm… I'm one of the developers of Diva. I noticed that you've been getting your files from the Internet Archive. They also have the full high-quality JPEG and JPEG2000 images available. http://ia600209.us.archive.org/6/items/acourseofreligio00gerauoft/ You could use those for Diva instead of the already-compressed PDF. Printing could still be handled by downloading the PDF, but if you just want to be able to view it online then I'd be happy to help you get Diva set up. Note that we also have an article in the latest C4L journal describing how it works: http://journal.code4lib.org/articles/5418 Cheers! -Andrew
Re: [CODE4LIB] opening a pdf file [diva]
So this is awesome. Does it in fact work with PDFs or not? And if not, does anyone have any similar tools to recommend for PDFs? ap On 10/3/11 11:12 AM, Dave Caroline dave.thearchiv...@gmail.com wrote: Diva was announced here on the 6th of June https://listserv.nd.edu/cgi-bin/wa?A2=ind1106L=CODE4LIBT=0F=S=P=27064 The clever part is that you only send the visible part at the scale they are viewing, so there is little excess bandwidth. For online document viewing it takes some beating and is not too hard to set up. My demo is running on an ADSL line from home, probably a worst-case speed demo. Site is http://ddmal.music.mcgill.ca/diva real demo http://ddmal.music.mcgill.ca/diva/demo Dave Caroline
Re: [CODE4LIB] screen scraping
On Oct 3, 2011 9:19 AM, Ed Summers e...@pobox.com wrote: On Sun, Oct 2, 2011 at 10:32 PM, Ken Irwin kir...@wittenberg.edu wrote: 1. respect robots.txt Disclaimer: I am not a lawyer. Remember that robots.txt applies only to recursive web crawlers, and not to screen-scraping per se. In cases where it does apply, it has limited legal effect, but ignoring it is not cricket. Important considerations are: is access to the site governed by a license that prohibits the activity; is the content being scraped subject to copyright, and if so, is the screen scraping covered by one of the exceptions to exclusive rights of the copyright holder; is the screen-scraping activity disruptive and damaging to the site being used (trespass to chattels, etc.)? A bit of reflection on the Golden Rule is probably more important than pondering the legality of what you are doing. Ed invoking philosophy? With citation? (wikipedia still counts) :-p The usual objections to the golden rule apply here: just because you have no objection to having a screen scraper used on your own site doesn't automatically imply that others are happy to have their sites scraped. Simon
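For anyone who does want to honor robots.txt from a script, a minimal sketch using Python's standard `urllib.robotparser` follows. The sample rules, the bot name `example-bot`, and the example.org URLs are made up for illustration; a real script would fetch the target site's own robots.txt instead.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content standing in for the target site's real file.
SAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
]


def allowed_by_robots(robots_lines, user_agent, url):
    """Report whether robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


print(allowed_by_robots(SAMPLE_ROBOTS, "example-bot",
                        "http://example.org/top-ten.html"))
print(allowed_by_robots(SAMPLE_ROBOTS, "example-bot",
                        "http://example.org/private/list.html"))
```

This only answers the robots.txt question; it says nothing about licenses, copyright, or load on the site, which are the considerations raised above.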
Re: [CODE4LIB] screen scraping
On Sun, Oct 2, 2011 at 9:35 PM, Reese, Terry terry.re...@oregonstate.edu wrote: In Canada, the BC Supreme Court ruled that screen scraping real estate listings from one site and using them on another indeed infringed on copyright. Not sure if this would cover your use -- but if you are coming from Canada, it might be something to consider. Decision URL: http://www.canlii.org/en/bc/bcsc/doc/2011/2011bcsc1196/2011bcsc1196.html If you read the decision, it looks as though the content found to be infringing was the property's description and photograph, which are creative works. Indexing factual data about a property *only* (asking price, address, square footage, etc) may have been on stronger legal footing. Regards, -Nate
Re: [CODE4LIB] opening a pdf file [diva]
On Mon, Oct 3, 2011 at 11:57 AM, Andrew Hankinson andrew.hankin...@gmail.com wrote: I'm one of the developers of Diva. I noticed that you've been getting your files from the Internet Archive. They also have the full high-quality JPEG and JPEG2000 images available. http://ia600209.us.archive.org/6/items/acourseofreligio00gerauoft/ You could use those for Diva instead of the already-compressed PDF. While I agree that Diva offers a really good on-screen reading experience (probably the best I've used so far), Archive.org itself offers a good one too. So, for the first book in Eric's list, http://www.archive.org/details/acourseofreligio00gerauoft the on-screen reader is at http://www.archive.org/stream/acourseofreligio00gerauoft I tried it out in my 3 year old, 2nd generation iPod Touch over the flakey campus WiFi and found that it displayed quite nicely. You have paging controls, but can also use touch gestures to scroll and pinch the page larger. Like Diva, it uses lazy loading techniques, so you don't have to wait until the whole document is available to start reading. Tom
[CODE4LIB] OpenLibrary book covers privacy concerns
Does anyone know if OpenLibrary tracks specific user information on the book covers they provide? My concern is the privacy of the patrons, not usage statistics. Thanks. ~Erin
Re: [CODE4LIB] OpenLibrary book covers privacy concerns
On 4 Oct 2011 07:27, Erin Germ erinlovestec...@gmail.com wrote: Does anyone know if OpenLibrary tracks specific user information on the book covers they provide? My concern is privacy of the patrons, not stat use? Why not ask them? I've found them nothing but exceedingly helpful Chris Thanks. ~Erin
Re: [CODE4LIB] ny times best seller api
Some of OCLC's APIs do support JSONP or CORS: for example, the QuestionPoint API and the xIdentifier and MapFAST services. However, other services do not provide this support. This is because for these services we need to carefully ensure that the application making the request is actually owned by the institution/user to which the key has been issued. If we don't do this, then there are several consequences: 1. We aren't able to clearly distinguish who is using a given service. This fundamentally makes it more difficult for us to keep statistics and evaluate the value of our services to the OCLC membership. 2. Users/institutions not eligible to use particular services are able to access those services. 3. Other users/institutions may be able to see data which is private and specific to a particular institution. We know that developers want to use our APIs in JavaScript. As a result we're working really hard on potential solutions that would allow us to both provide this type of access and ensure that the application making the request of the API is coming from an appropriate authorized institution/user. If anyone has further questions, they are welcome to email me directly. Karen Karen Coombs Product Manager OCLC Developer Network coom...@oclc.org On Mon, Oct 3, 2011 at 8:21 AM, Ed Summers e...@pobox.com wrote: On Wed, Sep 28, 2011 at 5:36 PM, Godmar Back god...@gmail.com wrote: Closer to the code4lib community: OCLC and Serials Solutions don't support JSONP in their webservices, either, even though doing so would allow cool services and would likely not affect their business models adversely in a significant way, IMO. We should keep lobbying them to remove these restrictions, as I've been doing for a while. I agree. I'm not sure how pervasive it is at OCLC, but their MapFast Service supports Cross Origin Resource Sharing (CORS) [1,2], which means that JSONP isn't needed for modern browsers. 
Basically it's just adding the following header to the JSON response: Access-Control-Allow-Origin: * Something to think about when creating a web service for others, at any rate. [1] http://en.wikipedia.org/wiki/Cross-Origin_Resource_Sharing [2] http://inkdroid.org/journal/2011/02/09/oclcs-mapfast-and-cors/
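As a rough illustration of "just adding the header", here is a minimal WSGI sketch that returns JSON with `Access-Control-Allow-Origin: *`. The app name `json_service` and its payload are hypothetical; this is not how MapFAST or any OCLC service is implemented, only one way a small service could send the header.

```python
import json


def json_service(environ, start_response):
    """Hypothetical WSGI app: a JSON response that any origin may read."""
    body = json.dumps({"message": "hello"}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
        # The one line that lets browser scripts on other sites read the response.
        ("Access-Control-Allow-Origin", "*"),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, json_service).serve_forever()` from the standard library will serve it on port 8000. Note that `*` opens the response to every origin, which is exactly why it does not suit the key-protected services described above.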
Re: [CODE4LIB] opening a pdf file [diva]
It doesn't work with PDFs, since it needs to create a tiled TIFF image for each page. I don't know of anything similar for PDFs, since they're not really designed to render a portion of the document without downloading the entire thing. You can convert PDF pages to images, though... :) -Andrew
Re: [CODE4LIB] opening a pdf file
The problem is PDF and the viewers. Some (most?) expand ALL the compressed images and create thumbnails from them before they start to display. This uses huge amounts of memory: a technology fail, they just don't fit certain work. If you are lucky, the viewer keeps the file compressed so it fits in memory and only uncompresses the single image being displayed. This rules out niceties like thumbnails unless a background thread goes through the images and caches the thumbnails (slow); this is where even a tiled TIFF viewer could be a lot better. Someone needs to do a tiled compressed format along with a viewer and banish PDF to the other side of the moon. Dave Caroline
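A back-of-the-envelope sketch of why decoding every page image at once hurts: the function below just multiplies pages by pixels by bytes per pixel. The page count and dimensions in the example are made-up numbers, not measurements of any file discussed here, and real viewers differ in how they cache.

```python
def decoded_size_bytes(pages, width_px, height_px, bytes_per_pixel=3):
    """Memory needed to hold every page image fully decoded (e.g. 24-bit RGB)."""
    return pages * width_px * height_px * bytes_per_pixel


# Hypothetical scan: 200 pages at 1500 x 2000 pixels, 3 bytes per pixel.
all_pages = decoded_size_bytes(200, 1500, 2000)
one_page = decoded_size_bytes(1, 1500, 2000)
print(all_pages, one_page)
```

Under those assumed numbers the difference between "decode everything" and "decode the page on screen" is a factor of 200, which is the gap between the two viewer strategies described above.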
Re: [CODE4LIB] opening a pdf file [diva]
-Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Andrew Hankinson I don't know of anything similar for PDFs, since they're not really designed to render a portion of the document without downloading the entire thing. The linearized form of PDF is designed to do just that: permit a PDF viewer to display the first page while still downloading the rest of the document. The pdfopt utility, which I believe installs with ghostscript, will do this from the command line. Alternatively you can save a PDF with fast web view enabled in Acrobat Professional. There are surely other ways to create a linearized PDF, but those are the methods I've used. -AB -- Head, Digital Research Library University Library System University of Pittsburgh 7500 Thomas Blvd., Room 306 Pittsburgh, PA 15260 Phone: 412.244.7526 Fax: 412.244.7537
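A quick way to check whether a given file is already linearized is to look for the `/Linearized` key near the start of the file, since the PDF specification places the linearization dictionary within the first 1024 bytes. The sketch below is a heuristic under that assumption, not a validator; the function name `looks_linearized` is made up for illustration.

```python
def looks_linearized(data):
    """Heuristic: is the /Linearized key present in the first 1024 bytes?"""
    return b"/Linearized" in data[:1024]


def file_looks_linearized(path):
    """Apply the same heuristic to a file on disk."""
    with open(path, "rb") as handle:
        return looks_linearized(handle.read(1024))


# Tiny synthetic header, just to show the check; not a complete PDF.
sample = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
print(looks_linearized(sample))
```

Running `file_looks_linearized` over a harvested collection would show which files might benefit from being re-saved with fast web view.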
Re: [CODE4LIB] screen scraping
Another reason to check with the webmaster, all legalities aside, is that their top ten list might actually be being built on an RSS feed, but for whatever reason they don't offer it directly as a feed (or they do, but it wasn't obvious to you where that feed was to be found). They might prefer you grab the feed rather than scrape the screen. I don't actually have any feed-based pages on our site that aren't also available as feeds -- but some people might. Also, for usage statistics reasons, I'd rather have bots hitting the feeds instead of the pages. Genny Engel Sonoma County Library gen...@sonoma.lib.ca.us 707 545-0831 x581 www.sonomalibrary.org -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nate Hill Sent: Sunday, October 02, 2011 7:23 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] screen scraping A question: what are the 'rules' around screen scraping? If one site doesn't offer an RSS feed and you want to grab (for example) their weekly top ten list with a script and then redisplay it on another site, is that bad form? Or even illegal? Thanks- Nate -- Nate Hill nathanielh...@gmail.com http://www.natehill.net
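If the site does turn out to have a feed, reading it takes only a few lines and is far less brittle than scraping HTML. A minimal sketch with Python's standard `xml.etree.ElementTree` follows, assuming a plain RSS 2.0 feed; the sample feed and the `feed_items` name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for a site's real feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Weekly top ten (sample)</title>
    <item><title>First title</title><link>http://example.org/1</link></item>
    <item><title>Second title</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def feed_items(rss_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


for title, link in feed_items(SAMPLE_FEED):
    print(title, link)
```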
Re: [CODE4LIB] opening a pdf file [diva]
Another idea, if you are looking for an app-based rather than web-based reader, is VuDroid, which supports both PDF and DjVu formats. http://code.google.com/p/vudroid/ I suggest it, not because I use it but because, at least in the Open Library version of the book's record, http://openlibrary.org/books/OL7169556M/ DjVu is listed as a streaming format. If I had an Android, I would give it a try. For iOS, there's a DjVu reader that seems pretty decent. http://xzonesoftware.com/products/xdjvu I may check it out later tonight. Tom
Re: [CODE4LIB] opening a pdf file [diva]
I just installed EBookDroid (AFAICT the latest and greatest version of VuDroid) from the Android Market and tried it on the color version of Eric's canary file. It immediately loads numbered blank pages, then starts rendering the current page, which takes about 20 seconds. Previously rendered pages disappear pretty quickly after scrolling off of them. It works pretty snappily on other files and looks like it could turn out to be my PDF reader of choice. Mike