Re: [CODE4LIB] opening a pdf file

2011-10-03 Thread Chad Benjamin Nelson
Eric, 

Downloaded, but the default pdf viewer  (QuickOffice on my android phone) just 
shows me pages with a big red X.

I downloaded the Adobe Reader app and now I can see the content, but it does 
not look anything like the image I see in my PC's browser. It is garbled and 
washed out, though each page is different.

Chad
Chad Nelson
Web Services Programmer
University Library
Georgia State University

e: cnelso...@gsu.edu
t: 404 413 2771


From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Eric Lease 
Morgan [emor...@nd.edu]
Sent: Monday, October 03, 2011 9:58 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] opening a pdf file

Are any of you able to open the following URL with an Android-based tablet 
device:

 http://dh.crc.nd.edu/sandbox/cyl/corpus/canarybird00schm.pdf

I have harvested about 60 PDF documents from the Internet Archive, and I 
created a rudimentary tablet-based interface to the collection here:

 http://dh.crc.nd.edu/sandbox/cyl/catalog/

Using my desktop machine I am able to download and view the PDF documents found 
in the catalog, but I am unable to view them on my iPad nor my iPhone. G… 
I've tried many different PDF viewers on my iPad and iPhone, all to no avail.

I'm wondering whether or not this is an iOS thing or if there is something wrong 
with my PDF files.

Please tell me what your experience is.

--
Eric Morgan
University of Notre Dame


Re: [CODE4LIB] opening a pdf file

2011-10-03 Thread Dave Caroline
On Mon, Oct 3, 2011 at 2:58 PM, Eric Lease Morgan emor...@nd.edu wrote:
 Are any of you able to open the following URL with an Android-based tablet 
 device:

  http://dh.crc.nd.edu/sandbox/cyl/corpus/canarybird00schm.pdf

It is educational to look at memory use on the PC when that PDF is loaded.
Evince here is using 600 MB. Do you have space for such objects on
these little toys?

Try something like Diva so you don't suck the resources dry on the client.

an experiment here
http://www.collection.archivist.info/diva/systrondonner1626.html
http://www.collection.archivist.info/diva/lucastp1.html

Dave Caroline


Re: [CODE4LIB] opening a pdf file [diva]

2011-10-03 Thread Eric Lease Morgan
On Oct 3, 2011, at 10:26 AM, Dave Caroline wrote:

 It is educational to look at memory use in the pc when that pdf is loaded.
 Evince here is using 600meg do you have space for such objects on
 these little toys
 
 try something like diva so you dont suck the resources dry on the client

Please tell me (us) more about diva. I am not familiar with it.  --Eric Morgan


Re: [CODE4LIB] opening a pdf file

2011-10-03 Thread KREYCHE, MICHAEL
Eric,

I have to answer mostly no to your question. 

I can open it with the Adobe reader on my Android tablet, but the pages are too 
blurred to read. It's a little (don’t take this literally) like an interlaced 
image when only the first half of the lines are displayed. I've never run into 
this before. 

When I open it with Documents to Go, each page is blank with a red X across it. 
I've run into occasional pages like this with large graphical PDF files (most 
often with one that's 77 MB), but after waiting maybe 5 seconds the pages 
render fine. With your file, the pages are all like that and waiting doesn't 
help.

Whatever the problem, it's not likely to be the size of the file.

Mike
 


Re: [CODE4LIB] opening a pdf file

2011-10-03 Thread Eric Lease Morgan
Most people seem to have mixed results when trying to open the PDF files on 
their tablet-based (Android and iOS) devices. Bummer! These PDF files were 
harvested from the Internet Archive. They seem to be viewable just fine for 
desktop machines, but not tablets.

The number of files I have in the collection is finite and small -- less than 
100. Does anybody have any suggestions on how I could convert or modify the 
files to make them more useful? I would be willing to manually do the 
modifications if necessary.

-- 
Eric Morgan
University of Notre Dame


Re: [CODE4LIB] opening a pdf file [diva]

2011-10-03 Thread Eric Lease Morgan
On Oct 3, 2011, at 11:12 AM, Dave Caroline wrote:

 Diva was announced here on the 6th of June:
 https://listserv.nd.edu/cgi-bin/wa?A2=ind1106&L=CODE4LIB&T=0&F=&S=&P=27064
 
 The clever part is you only send the visible part at the scale they
 are viewing so little excess bandwidth.
 
 For online document view it takes some beating and is not too hard to set up
 My demo is running on an adsl line from home, probably a worst case speed 
 demo.
 
   http://ddmal.music.mcgill.ca/diva
   http://ddmal.music.mcgill.ca/diva/demo


Very interesting, and thank you for bringing it to my attention. It seems it 
relies on a technology that reads and chunks up image files. Alas, I have PDFs. 
Moreover, I really want people to be able to print the entire documents. I 
suppose I could convert my PDF files into images and go that route. Hmm…
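
To illustrate the "only send the visible part" idea in general terms, here is a rough Python sketch of working out which tiles of a page image overlap the viewer's window. It is not Diva's actual code; the function name, the 256-pixel tile size, and the coordinate convention are all assumptions made for the example.

```python
from typing import List, Tuple


def visible_tiles(view_x: int, view_y: int, view_w: int, view_h: int,
                  tile_size: int = 256) -> List[Tuple[int, int]]:
    """Return (column, row) indexes of the tiles that overlap a viewport.

    The viewport is given in pixels of the page image at the current zoom
    level. tile_size is a hypothetical value, not one taken from Diva.
    """
    first_col = view_x // tile_size
    first_row = view_y // tile_size
    # Subtract 1 so a viewport ending exactly on a tile edge does not
    # pull in the next tile.
    last_col = (view_x + view_w - 1) // tile_size
    last_row = (view_y + view_h - 1) // tile_size
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


if __name__ == "__main__":
    # A 100x100 window starting 200 pixels in straddles two tiles.
    print(visible_tiles(200, 0, 100, 100))
```

Only the tiles in the returned list would need to be fetched, which is why a tiled viewer can get away with far less bandwidth and memory than loading a whole document.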

-- 
Eric Morgan


Re: [CODE4LIB] opening a pdf file

2011-10-03 Thread Demian Katz
I imagine the problems have to do with the multi-layered nature of the Internet 
Archive scans; even on my PC, the PDF doesn't render well until it is fully 
downloaded -- while it is in the process of loading, I see a blurry mess.  
Perhaps the tablet devices are doing the same thing -- attempting to display 
content prior to the full load of the file but not entirely succeeding.

It may be possible to flatten the layers of the PDFs in some way, but I'm not 
sure exactly how.

Additionally, I notice that there are different versions of the PDF here:

http://www.archive.org/details/canarybird00schm

(one labeled PDF, another B/W PDF)

Does one version work better on tablets than the other?

- Demian



Re: [CODE4LIB] opening a pdf file

2011-10-03 Thread Eric Lease Morgan
On Oct 3, 2011, at 11:32 AM, Demian Katz wrote:

 Additionally, I notice that there are different versions of the PDF here:
 
 http://www.archive.org/details/canarybird00schm
 
 (one labeled PDF, another B/W PDF)
 
 Does one version work better on tablets than the other?


At first glance, the black & white versions do not render on my iPad either. 
Hurumph!

-- 
Eric Getting Desperate Morgan


Re: [CODE4LIB] opening a pdf file [diva]

2011-10-03 Thread Andrew Hankinson
On 2011-10-03, at 11:29 AM, Eric Lease Morgan wrote:
 
 Very interesting, and thank you for bringing it to my attention. It seems it 
 relies on a technology that reads and chunks up image files. Alas, I have 
 PDFs. Moreover, I really want people to be able to print the entire 
 documents. I suppose I could convert my PDF files into images and go that 
 route. Hmm…

I'm one of the developers of Diva. I noticed that you've been getting your 
files from the Internet Archive. They also have the full high-quality JPEG and 
JPEG2000 images available.

http://ia600209.us.archive.org/6/items/acourseofreligio00gerauoft/

You could use those for Diva instead of the already-compressed PDF.

Printing could still be handled by downloading the PDF, but if you just want to 
be able to view it online then I'd be happy to help you get Diva set up.

Note that we also have an article in the latest C4L journal describing how it 
works: http://journal.code4lib.org/articles/5418

Cheers!
-Andrew

 
 -- 
 Eric Morgan


Re: [CODE4LIB] opening a pdf file [diva]

2011-10-03 Thread Parker, Anson (adp6j)
So this is awesome. Does it in fact work with PDFs or not? And if not,
does anyone have any similar tools to recommend for PDFs?
ap


On 10/3/11 11:12 AM, Dave Caroline dave.thearchiv...@gmail.com wrote:

Diva was announced here on the 6th of June:
https://listserv.nd.edu/cgi-bin/wa?A2=ind1106&L=CODE4LIB&T=0&F=&S=&P=27064

The clever part is you only send the visible part at the scale they
are viewing so little excess bandwidth.

For online document view it takes some beating and is not too hard to set
up
My demo is running on an adsl line from home, probably a worst case speed
demo.

Site is

http://ddmal.music.mcgill.ca/diva

real demo
http://ddmal.music.mcgill.ca/diva/demo

Dave Caroline

On Mon, Oct 3, 2011 at 3:36 PM, Eric Lease Morgan emor...@nd.edu wrote:
 On Oct 3, 2011, at 10:26 AM, Dave Caroline wrote:

 It is educational to look at memory use in the pc when that pdf is
loaded.
 Evince here is using 600meg do you have space for such objects on
 these little toys

 try something like diva so you dont suck the resources dry on the
client

 Please tell me (us) more about diva. I am not familiar with it.  --Eric
Morgan



Re: [CODE4LIB] screen scraping

2011-10-03 Thread Simon Spero
On Oct 3, 2011 9:19 AM, Ed Summers e...@pobox.com wrote:

 On Sun, Oct 2, 2011 at 10:32 PM, Ken Irwin kir...@wittenberg.edu wrote:
  1. respect robots.txt

Disclaimer: I am not a lawyer.

Remember that robots.txt applies only to recursive web crawlers, and not to
screen-scraping per se. In cases where it does apply, it has limited legal
effect, but ignoring it is not cricket.
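
For crawlers where robots.txt does apply, a minimal sketch of checking it with Python's standard urllib.robotparser might look like the following. The sample rules, the "my-scraper" user agent, and the example.org URLs are invented for illustration, and the check says nothing about the legal questions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied inline so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.org/top-ten.html",
                "http://example.org/private/report.html"):
        print(url, may_fetch(SAMPLE_ROBOTS_TXT, "my-scraper", url))
```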

Important considerations are: is access to the site governed by a license
that prohibits the activity; is the content being scraped subject to
copyright, and if so, is the screen scraping covered by one of the
exceptions to exclusive rights of the copyright holder; is the
screen-scraping activity disruptive and damaging to the site being used
(trespass to chattels, etc.)?

A bit of reflection on the Golden Rule is probably more important
than pondering the legality of what you are doing.

Ed invoking philosophy? With citation? (wikipedia still counts) :-p

The usual objections to the golden rule apply here: just because one has no
objection to having a screen scraper used on one's own site doesn't
automatically imply that others are willing to have their sites scraped.

Simon


Re: [CODE4LIB] screen scraping

2011-10-03 Thread Nate Vack
On Sun, Oct 2, 2011 at 9:35 PM, Reese, Terry
terry.re...@oregonstate.edu wrote:
 In Canada, the BC Supreme Court ruled that screen scraping real estate 
 listings from one site and using them on another indeed infringed on 
 copyright.  Not sure if this would cover your use -- but if you are coming 
 from Canada, it might be something to consider.

 Decision URL: 
 http://www.canlii.org/en/bc/bcsc/doc/2011/2011bcsc1196/2011bcsc1196.html

If you read the decision, it looks as though the content found to be
infringing was the property's description and photograph, which are
creative works.

Indexing factual data about a property *only* (asking price, address,
square footage, etc) may have been on stronger legal footing.

Regards,
-Nate


Re: [CODE4LIB] opening a pdf file [diva]

2011-10-03 Thread Tom Keays
On Mon, Oct 3, 2011 at 11:57 AM, Andrew Hankinson 
andrew.hankin...@gmail.com wrote:

 I'm one of the developers of Diva. I noticed that you've been getting your
 files from the Internet Archive. They also have the full high-quality JPEG
 and JPEG2000 images available.

 http://ia600209.us.archive.org/6/items/acourseofreligio00gerauoft/

 You could use those for Diva instead of the already-compressed PDF.


While I agree that Diva offers a really good on-screen reading experience
(probably the best I've used so far), Archive.org itself offers a good one
too.

So, for the first book in Eric's list,
http://www.archive.org/details/acourseofreligio00gerauoft
the on-screen reader is at
http://www.archive.org/stream/acourseofreligio00gerauoft

I tried it out in my 3 year old, 2nd generation iPod Touch over the flakey
campus WiFi and found that it displayed quite nicely. You have paging
controls, but can also use touch gestures to scroll and pinch the page
larger. Like Diva, it uses lazy loading techniques, so you don't have to
wait until the whole document is available to start reading.

Tom


[CODE4LIB] OpenLibrary book covers privacy concerns

2011-10-03 Thread Erin Germ
Does anyone know if OpenLibrary tracks specific user information on the book
covers they provide? My concern is the privacy of the patrons, not usage
statistics.

Thanks.

~Erin


Re: [CODE4LIB] OpenLibrary book covers privacy concerns

2011-10-03 Thread Chris Cormack
On 4 Oct 2011 07:27, Erin Germ erinlovestec...@gmail.com wrote:

 Does anyone know if OpenLibrary tracks specific user information on the
book
 covers they provide? My concern is privacy of the patrons, not stat use?


Why not ask them? I've found them nothing but exceedingly helpful.

Chris

 Thanks.

 ~Erin


Re: [CODE4LIB] ny times best seller api

2011-10-03 Thread Karen Coombs
Some of OCLC's APIs do support JSONP or CORS: for example, the
QuestionPoint API and the xIdentifier and MapFAST services. However,
other services do not provide this support. This is because for these
services we need to carefully ensure that the application making the
request is actually owned by the institution/user to which the key has
been issued. If we don't do this, then there are several consequences:

1. We aren't able to clearly distinguish who is using a given service.
This fundamentally makes it more difficult for us to keep statistics
and evaluate the value of our services to the OCLC membership.
2. Users/Institutions not eligible to use particular services are able
to access those services.
3. Other users/institutions may be able to see data which is private
and specific to a particular institution.

We know that developers want to use our APIs in JavaScript. As a
result we're working really hard on potential solutions that would
allow us to both provide this type of access and ensure that the
application making the request of the API is coming from an
appropriate authorized institution/user.

If anyone has further questions, they are welcome to email me directly.

Karen

Karen Coombs
Product Manager
OCLC Developer Network
coom...@oclc.org


On Mon, Oct 3, 2011 at 8:21 AM, Ed Summers e...@pobox.com wrote:
 On Wed, Sep 28, 2011 at 5:36 PM, Godmar Back god...@gmail.com wrote:
 Closer to the code4lib community: OCLC and Serials Solutions don't support
 JSONP in their webservices, either, even though doing so would allow cool
 services and would likely not affect their business models adversely in a
 significant way, IMO. We should keep lobbying them to remove these
 restrictions, as I've been doing for a while.

 I agree. I'm not sure how pervasive it is at OCLC, but their MapFast
 Service supports Cross Origin Resource Sharing (CORS) [1,2], which
 means that JSONP isn't needed for modern browsers. Basically it's just
 adding the following header to the JSON response:

    Access-Control-Allow-Origin: *

 Something to think about when creating a web service for others, at any rate.

 [1] http://en.wikipedia.org/wiki/Cross-Origin_Resource_Sharing
 [2] http://inkdroid.org/journal/2011/02/09/oclcs-mapfast-and-cors/
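
To make the header suggestion concrete, here is a minimal sketch of a JSON web service that adds it, written as a plain Python WSGI application. The payload, the function names, and the port are placeholders invented for the example; this is not how MapFAST or any OCLC service is implemented.

```python
import json


def app(environ, start_response):
    """Tiny WSGI app that returns JSON readable from any origin."""
    body = json.dumps({"status": "ok"}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
        # The one header that enables cross-origin reads in modern browsers.
        ("Access-Control-Allow-Origin", "*"),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port: int = 8000) -> None:
    """Run the app locally; the port number is an arbitrary choice."""
    from wsgiref.simple_server import make_server
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Calling serve() starts a local test server. A wildcard origin is only sensible for data that is meant to be public, which is the concern raised above about key-protected services.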



Re: [CODE4LIB] opening a pdf file [diva]

2011-10-03 Thread Andrew Hankinson
It doesn't work with PDFs, since it needs to create a tiled TIFF image for each 
page.

I don't know of anything similar for PDFs, since they're not really designed to 
render a portion of the document without downloading the entire thing.

You can convert PDF pages to images, though... :)
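
One possible way to do that conversion, sketched in Python around the Ghostscript command line. The choice of Ghostscript, the 150 dpi resolution, and the output file pattern are assumptions for the example, and the result is a set of plain JPEGs, not the tiled TIFFs Diva needs.

```python
import subprocess
from typing import List


def gs_pdf_to_jpeg_command(pdf_path: str,
                           output_pattern: str = "page-%03d.jpg",
                           dpi: int = 150) -> List[str]:
    """Build a Ghostscript command that writes one JPEG per PDF page."""
    return [
        "gs",
        "-dNOPAUSE",        # do not pause between pages
        "-dBATCH",          # exit when the input is finished
        "-dSAFER",          # restrict file access while interpreting the PDF
        "-sDEVICE=jpeg",    # render each page as a JPEG image
        f"-r{dpi}",         # output resolution in dots per inch
        f"-sOutputFile={output_pattern}",  # %03d becomes the page number
        pdf_path,
    ]


def convert(pdf_path: str, **options) -> None:
    """Run the conversion; requires the gs executable to be installed."""
    subprocess.run(gs_pdf_to_jpeg_command(pdf_path, **options), check=True)
```

For a collection of fewer than 100 files, a short loop calling convert() on each PDF would probably be enough.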

-Andrew



Re: [CODE4LIB] opening a pdf file

2011-10-03 Thread Dave Caroline
The problem is PDF and the viewers. Some, if not most, expand ALL the
compressed images and create thumbnails from them before they start
to display. This uses huge amounts of memory: a technology fail, they
just don't fit certain work.
If you are lucky, the viewer keeps the document compressed so it fits in
memory and only decompresses the single image being displayed. This
rules out niceties like thumbnails unless a background thread goes
through the images and caches the thumbnails (slow). This is where even
a tiled TIFF viewer could be a lot better.
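
A back-of-the-envelope Python sketch shows why decompressing every page image at once is so costly. The page dimensions and page count in the example are made-up round numbers, not measurements of Eric's files.

```python
def uncompressed_bytes(width_px: int, height_px: int,
                       bytes_per_pixel: int = 3) -> int:
    """Memory needed for one fully decoded page image (3 bytes = 24-bit RGB)."""
    return width_px * height_px * bytes_per_pixel


def document_mebibytes(pages: int, width_px: int, height_px: int,
                       bytes_per_pixel: int = 3) -> float:
    """Memory needed if a viewer decodes every page of a document at once."""
    total = pages * uncompressed_bytes(width_px, height_px, bytes_per_pixel)
    return total / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical scan: 100 pages of 1500 x 2400 pixels in colour.
    print(round(document_mebibytes(100, 1500, 2400)), "MiB")
```

With those assumed figures the total comes to roughly a gigabyte, whereas decoding only the page on screen needs about ten megabytes.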

Someone needs to do a tiled compressed format along with a viewer and
banish PDF to the other side of the moon.

Dave Caroline


Re: [CODE4LIB] opening a pdf file [diva]

2011-10-03 Thread Brenner, Aaron L
 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Andrew Hankinson

 I don't know of anything similar for PDFs, since they're not really
 designed to render a portion of the document without downloading the
 entire thing.

The linearized form of PDF is designed to do just that:  permit a PDF viewer 
to display the first page while still downloading the rest of the document.  

The pdfopt utility, which I believe installs with ghostscript, will do this 
from the command line.  Alternatively you can save a PDF with fast web view 
enabled in Acrobat Professional.  There are surely other ways to create a 
linearized PDF, but those are the methods I've used.
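
A quick way to check whether a given file has already been linearized is to look for the linearization dictionary, which the PDF specification places within the first 1024 bytes of the file. The Python sketch below is a heuristic check only, with function names invented for the example; it does not validate the rest of the structure.

```python
def is_linearized(pdf_head: bytes) -> bool:
    """Heuristic: does the start of a PDF contain a /Linearized entry?

    Only the first 1024 bytes are examined, since the linearization
    parameter dictionary is required to appear within that range.
    """
    head = pdf_head[:1024]
    return head.startswith(b"%PDF-") and b"/Linearized" in head


def file_is_linearized(path: str) -> bool:
    """Apply the same check to a file on disk."""
    with open(path, "rb") as handle:
        return is_linearized(handle.read(1024))
```

Running file_is_linearized() over the harvested files before and after processing would show whether the conversion took effect.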

-AB
--
Head, Digital Research Library
University Library System
University of Pittsburgh
7500 Thomas Blvd., Room 306
Pittsburgh, PA 15260

Phone:   412.244.7526
Fax: 412.244.7537
  


Re: [CODE4LIB] screen scraping

2011-10-03 Thread Genny Engel
Another reason to check with the webmaster, all legalities aside, is that their 
top ten list might actually be built from an RSS feed that, for whatever 
reason, they don't offer directly as a feed (or they do, but it wasn't 
obvious to you where that feed was to be found).  They might prefer you grab 
the feed rather than scrape the screen.  I don't actually have any feed-based 
pages on our site that aren't also available as feeds -- but some people might. 
 Also, for usage statistics reasons, I'd rather have bots hitting the feeds 
instead of the pages.
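
If a feed does turn out to exist, reading it is usually simpler than scraping. Here is a minimal Python sketch using only the standard library; the sample feed, its titles, and the example.org links are invented, and a real script would download the XML from whatever feed URL the webmaster provides.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical RSS 2.0 document standing in for a site's top ten feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Top Ten</title>
    <item><title>First Book</title><link>http://example.org/1</link></item>
    <item><title>Second Book</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def feed_items(rss_xml: str, limit: int = 10) -> List[Tuple[str, str]]:
    """Return up to `limit` (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.findall("./channel/item")[:limit]:
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in feed_items(SAMPLE_FEED):
        print(title, link)
```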

Genny Engel
Sonoma County Library
gen...@sonoma.lib.ca.us
707 545-0831 x581
www.sonomalibrary.org


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nate 
Hill
Sent: Sunday, October 02, 2011 7:23 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] screen scraping

A question: what are the 'rules' around screen scraping?
If one site doesn't offer an RSS feed and you want to grab (for example)
their weekly top ten list with a script and then redisplay it on another
site, is that bad form?  Or even illegal?
Thanks-
Nate


-- 
Nate Hill
nathanielh...@gmail.com
http://www.natehill.net


Re: [CODE4LIB] opening a pdf file [diva]

2011-10-03 Thread Tom Keays
Another idea, if you are looking for an app-based rather than web-based
reader is VuDroid, which supports both PDF and DjVu formats.
http://code.google.com/p/vudroid/

I suggest it, not because I use it but because, at least in the Open Library
version of the book's record,
http://openlibrary.org/books/OL7169556M/
DjVu is listed as a streaming format. If I had an Android, I would give it a
try.

For iOS, there's a DjVu reader that seems pretty decent.
http://xzonesoftware.com/products/xdjvu
I may check it out later tonight.

Tom


Re: [CODE4LIB] opening a pdf file [diva]

2011-10-03 Thread KREYCHE, MICHAEL
I just installed EBookDroid (AFAICT the latest and greatest version of VuDroid) 
from the Android Market and tried it on the color version of Eric's canary 
file. It immediately loads numbered blank pages, then starts rendering the 
current page, which takes about 20 seconds. Previously rendered pages disappear 
pretty quickly after scrolling off of them.

It works pretty snappily on other files and looks like it could turn out to be 
my PDF reader of choice.

Mike
