Re: [CODE4LIB] New Release: Diva.js 4.0 (with IIIF support)

2015-09-09 Thread Andrew Hankinson
Thanks, John. I've fixed it now. Can't have a major release announcement 
without *something* going wrong. ;)

-Andrew

> On Sep 9, 2015, at 1:16 PM, Scancella, John  wrote:
> 
> Andrew,
> 
> I am getting this error when trying out the default on the website (see 
> attached).
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Andrew Hankinson
> Sent: Wednesday, September 09, 2015 4:54 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] New Release: Diva.js 4.0 (with IIIF support)
> 
> We’re pleased to announce a new version of our open-source document image 
> viewer, Diva.js. Diva is ideal for archival book digitization initiatives 
> where viewing high-resolution images is a crucial part of the user 
> experience. Using Diva, libraries, archives, and museums can present 
> high-resolution document page images in a user-friendly “instant-on” 
> interface that has been optimized for speed and flexibility.
> 
> In version 4.0 we’re introducing support for the International Image 
> Interoperability Framework (IIIF). Through IIIF, Diva becomes part of a 
> larger movement to enhance archival image collections by promoting the 
> sharing of these resources.
> 
> With 4.0 we’re also introducing the “Book Layout” view, presenting document 
> images as openings, or facing pages. This gives users a valuable way of 
> visualizing document openings and more tools for viewing and understanding 
> the structure of a digitized document.
> 
> Several demos are available at http://ddmal.github.io/diva.js/try/
> 
> Other improvements in 4.0 include:
>   • Improved integration with existing web applications
>   • New plugins: Autoscroll (animated page scrolling), Page Alias (pages 
> may have multiple identifiers), IIIF Metadata (displays document metadata 
> from the IIIF manifest), IIIF Highlight (displays annotations from an IIIF 
> manifest)
>   • Improved build system with Gulp
>   • Support for switching documents without reloading the viewer
>   • Numerous bug fixes and optimizations
> 
> For more information, demos, and documentation visit 
> http://ddmal.github.io/diva.js/.
> 
> Diva.js is developed by the Distributed Digital Music Archives and Libraries 
> laboratory, part of the Music Technology Area of the Schulich School of Music 
> at McGill University and is funded by the Social Sciences and Humanities 
> Research Council of Canada.
> 
> 


[CODE4LIB] New Release: Diva.js 4.0 (with IIIF support)

2015-09-09 Thread Andrew Hankinson
We’re pleased to announce a new version of our open-source document image 
viewer, Diva.js. Diva is ideal for archival book digitization initiatives 
where viewing high-resolution images is a crucial part of the user experience. 
Using Diva, libraries, archives, and museums can present high-resolution 
document page images in a user-friendly “instant-on” interface that has been 
optimized for speed and flexibility.

In version 4.0 we’re introducing support for the International Image 
Interoperability Framework (IIIF). Through IIIF, Diva becomes part of a larger 
movement to enhance archival image collections by promoting the sharing of 
these resources.

With 4.0 we’re also introducing the “Book Layout” view, presenting document 
images as openings, or facing pages. This gives users a valuable way of 
visualizing document openings and more tools for viewing and understanding the 
structure of a digitized document.

Several demos are available at http://ddmal.github.io/diva.js/try/ 


Other improvements in 4.0 include:
• Improved integration with existing web applications
• New plugins: Autoscroll (animated page scrolling), Page Alias (pages 
may have multiple identifiers), IIIF Metadata (displays document metadata from 
the IIIF manifest), IIIF Highlight (displays annotations from an IIIF manifest)
• Improved build system with Gulp
• Support for switching documents without reloading the viewer
• Numerous bug fixes and optimizations

For more information, demos, and documentation visit 
http://ddmal.github.io/diva.js/.

Diva.js is developed by the Distributed Digital Music Archives and Libraries 
laboratory, part of the Music Technology Area of the Schulich School of Music 
at McGill University and is funded by the Social Sciences and Humanities 
Research Council of Canada.


Re: [CODE4LIB] Diva.js 3.0: High-resolution document image viewer

2014-09-24 Thread Andrew Hankinson
Hi Todd,

We’ve got someone working on it as we speak. ;)

https://github.com/DDMAL/diva.js/issues/136

-Andrew

On Sep 24, 2014, at 11:23 AM, todd.d.robb...@gmail.com wrote:

> Solid work Andrew and team!
> 
> Is there IIIF <http://iiif.io/> integration already or is that on the
> roadmap?
> 
> 
> Cheers!
> 
> 
> 
> On Wed, Sep 24, 2014 at 7:26 AM, Andrew Hankinson <
> andrew.hankin...@gmail.com> wrote:
> 
>> We’re pleased to announce a new version of our open-source document image
>> viewer, Diva.js. Diva.js is especially suited for use in rare and archival
>> book digitization initiatives where viewing high-resolution images can show
>> even the smallest detail present on the physical object. Using Diva,
>> libraries, archives, and museums can present high-resolution document page
>> images in a user-friendly “instant-on” interface that has been optimized
>> for speed and flexibility.
>> 
>> New features in Diva.js 3.0:
>> 
>> • Several speed optimizations – Documents load and scroll faster.
>> • In-browser (JavaScript) image manipulation – Adjust page brightness,
>> contrast, and rotation.
>> • Improved mobile device support – Tap and pinch to navigate through
>> documents.
>> • Horizontal orientation – Switch between the default vertical page
>> layout and a horizontal scrolling layout.
>> • Events system – Allows you to pass streaming data from the document
>> viewer into your own website and plugins.
>> • Improved and updated documentation:
>> https://github.com/DDMAL/diva.js/wiki.
>> • A new website.
>> • Numerous bug fixes.
>> 
>> For more information, demos, and documentation visit
>> http://ddmal.github.io/diva.js/.
>> 
>> Diva.js is developed by the Distributed Digital Music Archives and
>> Libraries laboratory, part of the Music Technology Area of the Schulich
>> School of Music at McGill University.
>> 
> 
> 
> 
> -- 
> Tod Robbins
> Digital Asset Manager, MLIS
> todrobbins.com | @todrobbins <http://www.twitter.com/#!/todrobbins>


[CODE4LIB] Diva.js 3.0: High-resolution document image viewer

2014-09-24 Thread Andrew Hankinson
We’re pleased to announce a new version of our open-source document image 
viewer, Diva.js. Diva.js is especially suited for use in rare and archival book 
digitization initiatives where viewing high-resolution images can show even the 
smallest detail present on the physical object. Using Diva, libraries, 
archives, and museums can present high-resolution document page images in a 
user-friendly “instant-on” interface that has been optimized for speed and 
flexibility.

New features in Diva.js 3.0:

 • Several speed optimizations – Documents load and scroll faster.
 • In-browser (JavaScript) image manipulation – Adjust page brightness, 
contrast, and rotation.
 • Improved mobile device support – Tap and pinch to navigate through documents.
 • Horizontal orientation – Switch between the default vertical page layout and 
a horizontal scrolling layout.
 • Events system – Allows you to pass streaming data from the document viewer 
into your own website and plugins.
 • Improved and updated documentation: https://github.com/DDMAL/diva.js/wiki.
 • A new website.
 • Numerous bug fixes.

For more information, demos, and documentation visit 
http://ddmal.github.io/diva.js/.

Diva.js is developed by the Distributed Digital Music Archives and Libraries 
laboratory, part of the Music Technology Area of the Schulich School of Music 
at McGill University.


Re: [CODE4LIB] Baggit specification question

2014-08-06 Thread Andrew Hankinson
I suspect the first example you give is correct. The newline character is the 
field delimiter. If you’re reading this into a structured representation (e.g., 
a Python dictionary) you could parse the presence of nothing between the colon 
and the newline as “None”, but in a text file there is no representation of 
“nothing” except for actually having nothing.
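For what it's worth, that reading can be sketched in a few lines of Python. This is illustrative only: it assumes the simple one-line "Label: value" form and ignores the BagIt spec's line-folding (continuation) rules, so it is not a full parser.

```python
# Illustrative sketch: read bag-info.txt-style metadata, treating an
# empty value after the colon as None. Ignores the BagIt spec's
# line-folding rules, so this is not a complete parser.
def parse_bag_info(text):
    info = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        label, _, value = line.partition(":")
        info[label.strip()] = value.strip() or None
    return info

sample = "Source-Organization:\nContact-Name: James\n"
print(parse_bag_info(sample))
# → {'Source-Organization': None, 'Contact-Name': 'James'}
```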


On Aug 6, 2014, at 7:11 PM, Rosalyn Metz  wrote:

> Hi all,
> 
> Below is a question one of my colleagues posted to digital curation, but
> I'm also posting here (because the more info the merrier).
> 
> Thanks!
> Rosy
> 
> 
> 
> Hi,
> 
> I am working for the Digital Preservation Network and our current
> specification requires that we use BagIt bags.
> 
> Our current spec for bag-info.txt reads in part:
> 
> "DPN requires the presence of the following fields, although they may be
> empty.  Please note that the values of "null" and/or "nil" should not be
> used.  The colon (:) should still be present."
> 
> 
> From my reading of the BagIt spec, section 2.2.2:
> 
> "A metadata element MUST consist of a label, a colon, and a value, each
> separated by optional whitespace. "
> 
> 
> 2.2.2 is for the bag-info.txt, but it seems that this is the general rule.
> 
> Question: Are values required for all fields? Which below is correct, or 
> both? Ex:
> 
> Source-Organization:
> 
> or
> 
> 
> Source-Organization: nil
> 
> 
> I appreciate any clarification,
> 
> Thanks
> James
> Stanford Digital Repository


Re: [CODE4LIB] Python CMSs

2014-02-13 Thread Andrew Hankinson
I have a small anecdote on my experience with Drupal, Django, and custom 
development.

I was writing a site that required a number of custom content types, some of 
them fairly complex, and a Solr back-end for full-text and faceted search. I 
had developed a number of Drupal sites up to that point, but this was probably 
the most complex one.

I tore my hair out for a month or two, trying to get all of the different 
Drupal modules to talk to each other, and writing lots of glue code to go 
between the custom modules using the (sometimes undocumented) hooks for each 
module. 

One day I became so frustrated that I decided that I would give myself 24 hours 
to re-do the site in Django. If I could get the Django site up to par with the 
Drupal site in that amount of time, I would move forward with Django. 
Otherwise, I would keep going with Drupal. Up to that point, I had done the 
Django tutorial a couple times, and implemented a few test sites, but not much 
else.

Within 24 hours I had re-implemented the content type models, hooked up the 
Solr search, worked out a few of the templates, and was well on my way to 
actually making progress with the site. More than that, I was enjoying the 
coding rather than staring in frustration at hooks and wondering why something 
wasn’t getting called when it should be.

Since then I haven’t touched Drupal.

Cheers,
-Andrew

On Feb 13, 2014, at 9:59 PM, Riley Childs  wrote:

> WordPress is easy for content creators, but don't let the blog part fool you: 
> it is a fully developed framework that is easy to develop for. It is intended 
> to make it easy to get started, but from the base upward it is 100% custom. I 
> don't know what your particular needs are, but I would give WP a serious 
> look! Plus WP integrates well with any web app you could shake a stick at. In 
> summary, choose a CMS that fits YOUR needs; my reasons are what made WP a 
> good fit for me, yours are different, so make a decision based on what YOU 
> need, not my needs!
> 
> Riley Childs
> Student
> Asst. Head of IT Services
> Charlotte United Christian Academy
> (704) 497-2086
> RileyChilds.net
> Sent from my Windows Phone, please excuse mistakes
> 
> From: Daron Dierkes
> Sent: ‎2/‎13/‎2014 9:52 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Python CMSs
> 
> If you're new to python and django there will be a steep learning curve for
> you, but probably a much steeper one for people after you who may not do
> python at all.  Drupal and Wordpress are limited, but non-technical
> librarians can still get in pretty easily to fix typos and add links at
> least. Codecademy has a decent intro python course:
> http://www.codecademy.com/tracks/python
> Udemy has a few python courses with some django as well.
> 
> A big reason why I've been learning django is to try to understand how our
> library can work with the various DH projects that use our collections. If
> we need to at some point take on permanent ownership of these projects or
> if we want to develop them further, a basic familiarity on the part of our
> library staff seems like a good idea.


Re: [CODE4LIB] mass convert jpeg to pdf

2013-11-12 Thread Andrew Hankinson
Just thought I might plug some software we're developing to solve the book 
image navigation "misery" that Kyle mentions.

http://ddmal.music.mcgill.ca/diva/

and a demo:

http://ddmal.music.mcgill.ca/newdiva/demo/single.html

We developed it because we were frustrated with the "image gallery" paradigm 
for book image viewing, and wanted something more like Google Books' viewer, 
but with access to the highest resolution possible. We also were frustrated 
with having to download large PDFs to just view a couple pages.

Diva uses IIP on the back-end to serve out image tiles, so you're only ever 
downloading the part of the image that's viewable -- the rest is auto-loaded as 
the user scrolls. 

We've used it to display a manuscript that's ~80GB (total), with each image 
around 200MB.

http://coltrane.music.mcgill.ca/salzinnes/experiments/diva-cci-tif/

It's also got a couple other neat features, like in-browser 
brightness/contrast/rotation adjustments via canvas. (Click the little gear 
icon in the top left of each page image).

Cheers,
-Andrew

On 2013-11-08, at 4:22 PM, Kyle Banerjee  wrote:

>> It is sad to me that converting to PDF for viewing off the Web seems like
>> the answer. Isn’t there a tiling viewer (like Leaflet) that could be used
>> to render jpeg derivatives of the original tif files in Omeka?
>> 
>> 
> This should be pretty easy. But the issue with tiling is that the nav
> process is miserable for all but the shortest books. Most of the people who
> want to download are looking for jpegs rather than source tiffs and
> one pdf instead of a bunch of tiffs (which is good since each one is
> typically over 100MB). Of course there are people who want the real deal,
> but that's actually a much less common use case.
> 
> As Karen observes, downloading and viewing serve different use cases so of
> course we will provide both. IIP Image Server looks intriguing. But most of
> our users who want the full res stuff really just want to download the
> source tiffs which will be made available.
> 
> kyle


Re: [CODE4LIB] Loris

2013-11-08 Thread Andrew Hankinson
So what’s the difference between IIIF and IIP? (the protocol, not the server 
implementation)

-Andrew

On Nov 8, 2013, at 9:05 PM, Jon Stroop  wrote:

> It aims to do the same thing...serve big JP2s (and other images) over the 
> web, so from that perspective, yes. But, beyond that, time will tell. One 
> nice thing about coding against a well-thought-out spec is that there are lots of 
> implementations from which you can choose[1]--though as far as I know Loris 
> is the only one that supports the IIIF syntax natively (maybe IIP?). We still 
> have Djatoka floating around in a few places here, but, as many people have 
> noted over the years, it takes a lot of shimming to scale it up, and, as far 
> as I know, the project has more or less been abandoned.
> 
> I haven't done too much in the way of benchmarking, but to date don't have 
> any reason to think Loris can't perform just as well. The demo I sent earlier 
> is working against a very large jp2 with small tiles[2] which means a lot of 
> rapid hits on the server, and between that, (a little bit of) JMeter and ab 
> testing, and a fair bit of concurrent use from the c4l community this 
> afternoon, I feel fairly confident about it being able to perform as well as 
> Djatoka in a production environment.
> 
> By the way, you can page through some other images here: 
> http://libimages.princeton.edu/osd-demo/
> 
> Not much of an answer, I realize, but, as I said, time and usage will tell.
> 
> -Js
> 
> 1. http://iiif.io/apps-demos.html
> 2. 
> http://libimages.princeton.edu/loris/pudl0052%2F6131707%2F0001.jp2/info.json
> 
> 
> On 11/8/13 8:07 PM, Peter Murray wrote:
>> A clarifying question: is Loris effectively a Python-based replacement for 
>> the Java-based djatoka [1] server?
>> 
>> 
>> Peter
>> 
>> [1] http://sourceforge.net/apps/mediawiki/djatoka/index.php?title=Main_Page
>> 
>> 
>> On Nov 8, 2013, at 3:05 PM, Jon Stroop  wrote:
>> 
>>> c4l,
>>> I was reminded earlier this week at DLF (and a few minutes ago by Tom
>>> and Simeon) that I hadn't ever announced a project I've been working on for
>>> the last year or so to this list. I showed an early version in a
>>> lightning talk at code4libcon last year.
>>> 
>>> Meet Loris: https://github.com/pulibrary/loris
>>> 
>>> Loris is a Python-based image server that implements the IIIF Image API
>>> version 1.1 level 2[1].
>>> 
>>> http://www-sul.stanford.edu/iiif/image-api/1.1/
>>> 
>>> It can take JP2 (if you make Kakadu available to it), TIFF, or JPEG
>>> source images, and hand back JPEG, PNG, TIF, and GIF (why not...).
>>> 
>>> Here's a demo of the server directly: http://goo.gl/8XEmjp
>>> 
>>> And here's a sample of the server backing OpenSeadragon[2]:
>>> http://goo.gl/Gks6lR
>>> 
>>> -Js
>>> 
>>> 1. http://www-sul.stanford.edu/iiif/image-api/1.1/
>>> 2. http://openseadragon.github.io/
>>> 
>>> -- 
>>> Jon Stroop
>>> Digital Initiatives Programmer/Analyst
>>> Princeton University Library
>>> jstr...@princeton.edu
>> --
>> Peter Murray
>> Assistant Director, Technology Services Development
>> LYRASIS
>> peter.mur...@lyrasis.org
>> +1 678-235-2955
>> 800.999.8558 x2955


Re: [CODE4LIB] tiff2pdf, then back to pdf?

2013-04-27 Thread Andrew Hankinson
As someone who works on document recognition, I have to disagree. You should 
always keep an uncompressed original around, since you can never recover it 
without (often expensive) re-imaging. JPEG, or any other type of lossy 
compression, introduces artifacts that don't look "too bad" by the human eye, 
but have a significant effect on the quality of OCR. You can never recover this 
after you have discarded your originals.

Big files are clunky to work with, which is why you should have an automated 
way of producing surrogate, compressed copies for general use, but like any 
archivist will tell you, a photocopy is not a replacement for the original.

-Andrew

On 2013-04-27, at 7:17 PM, Wilhelmina Randtke  wrote:

> Yes, exactly.  You will lose some of the image quality.  If you change to
> a compressed format, then back to the TIFF, you can get the format, but you
> can't go back to the original file.
> 
> Stop and think:  What are your long term goals?
> 
> Big files are clunky to work with.  I'm guessing that's why you don't want
> TIFF.  In my experience, files big enough to be clunky are discarded within
> a few years, regardless of the intentions when they were prepped.  If you
> want to avoid big files, then your best bet is to assess and test the file
> you will actually keep and do the best job you can with it.  So, if you
> want to rerun OCR in a few years when the recognition will be better, then
> make your PDFs in such a way that you can get decent OCR out of them today,
> and plan to rerun on those files, not the (discarded) originals.  Don't
> think reformatting will get you any better image quality later.
> 
> -Wilhelmina Randtke
> 
> On Fri, Apr 26, 2013 at 3:19 PM, James Gilbert 
> wrote:
> 
>> I'm by no means an expert in the math behind image format conversions...
>> but:
>> 
>> When converting from TIFF to JPG, TIFF is an uncompressed format and JPG is
>> a compressed format.
>> 
>> When back-converting, wouldn't the original quality of the TIFF be lost,
>> converted only to the quality of the last JPG (with degradation on each
>> time
>> this process occurs)?
>> 
>> James Gilbert, BS, MLIS
>> Systems Librarian
>> Whitehall Township Public Library
>> 3700 Mechanicsville Road
>> Whitehall, PA 18052
>> 610-432-4339 ext: 203
>> 
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
>> Roy
>> Sent: Friday, April 26, 2013 4:15 PM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] tiff2pdf, then back to pdf?
>> 
>> If you can stand an extra step, Ed, there are tools to convert PDF to jpg
>> images, and from there it shouldn't be too hard to get TIFF output. Do a
>> search for "convert PDF to image" to get started. There are tools that are
>> not online only, which I'm pretty sure is what you're after.
>> 
>> Roy Zimmer
>> Western Michigan University
>> 
>> 
>> On 4/26/2013 4:08 PM, Edward M. Corrado wrote:
>>> Hi All,
>>> 
>>> I have a need to batch convert many TIFF images to PDF. I'd then like
>>> to be able to discard the TIFF images, but I can only do that if I can
>>> create the original TIFF again from the PDF. Is this possible? If so,
>>> using what tools and how?
>>> 
>>> tiff2pdf seems like a possible solution, but I can't find a
>>> corresponding "pdf2tif" program that reverses the process.
>>> 
>>> Any ideas?
>>> 
>>> Edward
>> 


Re: [CODE4LIB] Displaying archival books on ipad and android tablets

2013-02-25 Thread Andrew Hankinson
I would be interested in seeing your customizations. I've tried getting 
BookReader installed a couple times, and each time I got fed up with the 
install instructions, since it seemed specially tailored to the IA 
infrastructure. They mention that "others" have managed to get Djatoka working 
with BookReader, but I've scoured the Google and couldn't seem to find anyone 
who would share their code to get this working.


On 2013-02-25, at 9:01 AM, Shaun Ellis  wrote:

> Kyle,
> We have lots of old books too, and use the Open Library BookReader [1] for 
> viewing.  It's been designed with the iPad and other tablets in mind.  I have 
> customized it to work with Djatoka, giving us "deep zoom" and other 
> niceties of JPEG2000. However, out of the box, you can follow the 
> Internet Archive's recipe [3] of zipping up a variety of derivative sizes, 
> which works nicely as well.  It's pretty easy to set up.
> 
> I should mention that I met a number of folks at the conference who are using 
> the BookReader and interested in extending/adapting it in a sustainable and 
> cooperative way, with recent projects like the IIIF Image API and 
> OpenAnnotation integration in mind.  Let us know if anyone else is interested 
> in being part of that discussion and development.  We haven't put together a 
> separate mailing list or anything yet, but probably will get one together 
> soon.
> 
> [1] http://openlibrary.org/dev/docs/bookreader
> [2] http://pudl.princeton.edu/objects/ms35t871w
> [3] 
> http://raj.blog.archive.org/2011/03/17/how-to-serve-ia-style-books-from-your-own-cluster/
> 
> -Shaun
> 
> On 2/22/13 7:50 PM, Kyle Banerjee wrote:
>> We have a few digitized books, (some of them are old -- we're talking 500
>> years). Sizes are all over the place but the big ones are easily the size
>> of a large briefcase.
>> 
>> We want to make these works more accessible/usable and there's some demand
>> to make them available for tablets. What experience do people have with
>> stuff like that, and what software/services/methods do you recommend?
>> 
>> Source files are 600 dpi uncompressed tiffs so they're pretty big and
>> there's nothing special about a book being over 10GB in size. Thanks,
>> 
>> kyle


Re: [CODE4LIB] Displaying archival books on ipad and android tablets

2013-02-23 Thread Andrew Hankinson
Hi Kyle,

You might want to have a look at our Diva viewer 
(http://ddmal.music.mcgill.ca/diva/).

We've tested it on books that are over 100GB total, and images that are around 
200MB each. For example:

http://coltrane.music.mcgill.ca/salzinnes/experiments/diva-cci-tif/

Each page is about 180MB (uncompressed TIFF).

Here's some features:

-- Supports JPEG2000 and Pyramid TIFF via the IIP Image Server.
-- Almost immediate viewing. You're only downloading the parts of the page that 
you're seeing, so even if a book is huge, you only download what you need. This 
means you don't have to rely on heavy compression or conversion to greyscale to 
get the file size down.
-- Multiple zoom levels per page so you can get a very detailed look, or zoom 
out to quickly navigate the pages.
-- Grid layout for even faster navigation.
-- Easily create links to very specific parts of a page (e.g., 
http://coltrane.music.mcgill.ca/salzinnes/experiments/diva-cci-tif/#f=true&z=5&n=5&i=salz-1-002-recto.tif&y=5276&x=-1075)
-- We have a nifty HTML5 canvas view that lets you do some basic image 
manipulation in the browser (rotate, brightness, contrast, colour channel 
manipulation). Above each page there's a little gear icon; clicking this will 
take you to the image manipulator. Your manipulations are also stored locally, 
so you can return to the page with your modifications intact. We did this 
because our scholars wanted to view things like marginalia, or increase the 
contrast for faded inks.
-- You can integrate it into an existing page or digital collection.
-- Lots of public hooks for tying it in to other scripts, and a simple plugin 
API for extending it.

And, it's all open source. https://github.com/DDMAL/diva.js
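The tile-based loading idea above is a general technique; a rough sketch of the viewport arithmetic looks like this (illustrative only, not Diva's actual code; the 256px tile size is just an assumption):

```python
# Illustrative sketch (not Diva's actual code): given a scroll position
# and viewport size, work out which fixed-size tiles must be fetched.
def visible_tiles(scroll_x, scroll_y, view_w, view_h, tile=256):
    """Return (col, row) indices of tiles intersecting the viewport."""
    first_col, first_row = scroll_x // tile, scroll_y // tile
    last_col = (scroll_x + view_w - 1) // tile
    last_row = (scroll_y + view_h - 1) // tile
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# A 512x512 viewport at the origin needs only four 256px tiles,
# regardless of how large the full page image is.
print(visible_tiles(0, 0, 512, 512))
# → [(0, 0), (1, 0), (0, 1), (1, 1)]
```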

Hope that helps.
-Andrew

On 2013-02-23, at 7:51 AM, Kyle Banerjee  wrote:

>> I did PDF.  There are about no studies on PDF size and usability.  What I
>> did is go to gray scale for text pages to knock down file size, played with
>> optimizing, and broke super long (think 3K page book) files in smaller
>> chunks
>> 
>> When I looked at other big long books online, I found they tended to use
>> 300 dpi gray scale or 600 dpi black and white.  I just looked at government
>> documents, because that's what I worked with.
>> 
> 
> Hadn't thought about doing government reports. That's a good use case for
> what we have -- one of ours got downloaded about 7,000 times last month
> alone. I could probably make those really manageable in size as most pages
> would be just fine bitonal.
> 
> Can you say a bit about what you discovered while playing with
> optimization? Since archival books are a prime target, fidelity is
> important -- there is significant artwork on many  pages. I'll have to
> scale down and would really like to avoid grayscaling if necessary.
> 
> kyle


Re: [CODE4LIB] GitHub Myths (was thanks and poetry)

2013-02-21 Thread Andrew Hankinson
> An open tool is Internet email: I can send an email from my provider
> (ucop.edu) to yours (princeton.edu). A closed tool is github, where I
> need a github account to send you a pull request. An open tool would
> be one where I can send a pull request bitbucket to github.
> (Obviously, bitbucket is as closed as github in this regard.)
> 
> best, Erik
> Sent from my free software system .

Uhh… That's a different definition of "closed system" than I'm used to seeing. 
It's more akin to having a closed-source e-mail client vs. an open-source one. 
The protocol (git) is open. I can push, pull, merge, fork, and do everything I 
need to do with a repository without ever visiting GitHub -- even pushing and 
pulling to/from pull requests. I can send you a patch that you can apply on 
your Bitbucket repo without needing to touch GitHub. I can even do multiple 
origins so that I can pull from GitHub but push to Bitbucket, and vice versa. 
So while the tool may technically be closed-source, the protocol--the 
equivalent to SMTP, IMAP, and POP in your example--is wide open.

Heck, even Richard Stallman gives GitHub a pass: "I use the term "SaaS" for 
services that do your computing for you, but not for services that do only 
communication. Thus, gmail.com is not SaaS. Wordpress is not SaaS. Github is 
not SaaS, or perhaps only in trivial ways." 
(https://mayfirst.org/lowdown/august-2011/richard-m-stallman-lectures-free-software-west-bank)

GitHub makes a lot of things about using git collaboratively *easier*, but I 
can do everything I need to on a collaborative project without ever visiting 
the GitHub page itself, provided I don't want to look at the issue tracker or 
wiki.  The "Pull Request" button is just a shortcut to merging two remote 
origins. If you wanted to make a system where you can send pull requests to 
GitHub and Bitbucket, you could, and nobody would stop you.

See also: http://gitlab.org.


Re: [CODE4LIB] GitHub Myths (was thanks and poetry)

2013-02-21 Thread Andrew Hankinson
Also, as a side note (and of interest to some) you *can* fetch pull requests 
into your local repo:

https://gist.github.com/piscisaureus/3342247


On 2013-02-21, at 10:29 AM, Shaun Ellis  wrote:

> If you read my email, I don't tell anyone what to use, but simply attempt to 
> clear up some fallacies.  Distributed version control is new to many, and I 
> want to make sure that folks are getting accurate information from this list.
> 
> Unfortunately, this statement is not accurate either:
> 
> // There's a sneaky lock-in effect of having one open tool (git hosting) 
> which is fairly easy to move in and out and interoperate with, linked to 
> other closed tools (such as their issues tracker and their non-git pull 
> requests system) which are harder to move out or interoperate. //
> 
> GitHub's API allows you to easily export issues if you want to move them 
> somewhere else:
> http://developer.github.com/v3/issues/
> 
> Pull-requests are used by repository hosting platforms to make it easier to 
> suggest patches.  GitHub and BitBucket both use the pattern, and I don't 
> understand what you mean by it being a "closed tool".  If you're concerned 
> about "barriers to entry", suggesting a patch using only git or mercurial can 
> be done, but I wouldn't say it's easy.
> 
> ... and what Devon said.
> 
> -Shaun
> 
> 
> On 2/21/13 9:34 AM, MJ Ray wrote:
>> Shaun Ellis 
>>> * Myth #1 : GitHub creates a barrier to entry.
>> 
>> That's a fact, not a myth.  Myself, I won't give GitHub my full legal
>> name and I suspect there are others who won't.  So, we're not welcome
>> there and if we lie to register, all our work would be subject to
>> deletion at an arbitrary future point.
>> 
>> There's a couple of other things in the terms which aren't simple, too.
>> 
>> [...]
>>> * Myth #4 : GitHub is monopolizing open source software development.
>>>  > "... to its unfortunate centralizing of so much free/open
>>>  > source software on one platform.)"
>>> 
>>> Convergence is not always a bad thing. GitHub provides a great, free
>>> service with lots of helpful collaboration tools beyond version control.
>>>   It's natural that people would flock there, despite having lots of
>>> other options.
>> 
>> Whether or not it's a deliberate monopolising attempt, I don't think
>> that's the full reason.  It's not only natural effect.  There's a
>> sneaky lock-in effect of having one open tool (git hosting) which is
>> fairly easy to move in and out and interoperate with, linked to other
>> closed tools (such as their issues tracker and their non-git pull
>> requests system) which are harder to move out or interoperate.
>> 
>> Use github if you like.  Just don't expect everyone to do so.
>> 
>> Hope that explains,
>> 


Re: [CODE4LIB] complex drupal taxonomy question

2012-07-11 Thread Andrew Hankinson
Just taking a stab in the dark:

-- set up a "copy field" in Solr. This basically takes the content from an 
existing field and creates a mirror of it.
-- apply some extra string processing to your copy field so that it splits and 
tokenizes the content on the "-" (e.g., "enemy of islam" and "haverford" become 
two tokens on the field)
-- ???
-- Profit.

Seriously, though, I'm not sure what you would do after you've tokenized it. 
You could set up some sort of faceted browse interface to show co-occurring 
terms, or something else. Maybe some other Solr folks out there have some 
better ideas.
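To make the copy-field/tokenizing idea concrete, here is a plain-Python sketch (the terms are made up from the examples in Laurie's message, and in Solr this would really be a copyField plus a pattern tokenizer rather than hand-rolled code): split each two-part term on the delimiter and index by the second half, so query #3 ("show me every keyword called an 'enemy of islam'") becomes a simple lookup.

```python
from collections import defaultdict

# Hypothetical sample terms in the "keyword - context" shape described below.
terms = [
    "media - enemy of islam",
    "Haverford - enemy of islam",
    "Afghanistan - area of jihad",
]

# Inverted index keyed by the context (the second part of each term).
by_context = defaultdict(list)
for term in terms:
    keyword, context = (part.strip() for part in term.split(" - ", 1))
    by_context[context].append(keyword)

print(by_context["enemy of islam"])  # ['media', 'Haverford']
```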

-Andrew

On 2012-07-11, at 11:32 AM, Laurie Allen wrote:

> Hi,
> I'm working on a drupal site with a very complicated taxonomy.
> Backstory: A polisci professor and team of students designed this
> project first as a theoretical exercise as part of a senior thesis
> double major in political science and computer science, and then as
> the project of a very devoted and smart student using drupal. It's
> both amazingly cool and technically complex. At this point, we are
> trying to help rein it in to the library servers and help support it
> so that new crops of students can maintain it without needing to be CS
> majors, and also to help them address a few issues and problems that
> have been discovered over the past year or so. My colleague and I are
> totally new to Drupal, and to this database. While he's working on the
> solr indexing, I'm trying to help figure out the taxonomy issue.
> 
> See here: 
> http://gtrp.haverford.edu/aqsi/aqsi/statements/mustafa-abu-al-yazids-interview-al-jazeera
> Basically, the site indexes the public statements of al-qaeda. Each
> statement is assigned a bunch of terms by students who have studied
> jihad and al-qaeda.
> 
> Each term is composed of two parts.
> First part: a keyword from a controlled list of keywords - there are
> many of these and they include places, people, theories, and other
> things. So, "Afghanistan", "Barack Obama", and "media" are all
> keywords.
> Second part: a context from a much smaller (around 20) collection of
> contexts, including I guess how the keyword figures in this statement.
> Examples include "area of jihad, enemy of islam, religious relations"
> and others.
> 
> So, the full term would be "media - enemy of islam" for example. And
> each record includes a large number of these.
> 
> Going forward, we'd ideally like to allow users of the site to find
> all three of the following:
> 1. Records that contain a particular two part term. (easy - that's
> what taxonomy is for)
> 2. A list of terms that begin with the first part so that they can
> select the modifier for it (also easy, if we make the second term a
> subterm or child of the first, this will work fine)
> 3. A list of terms that have the second part as a qualifier. So, for
> example, show me all terms in which anything is called an "enemy of
> islam" and then let me choose which keyword is referred to as an enemy
> of jihad and show me that record.
> 
> It's that third one that we can't figure out. The only way we can
> think to accomplish this is to basically duplicate each entry so that
> we'd say "Haverford - enemy of islam" and "enemy of islam - Haverford"
> I think that will work, but since there are many statements, and each
> statement has many terms, this solution doesn't seem ideal. Do any of
> you have ideas?
> Thanks very much.
> Laurie
> -- 
> Coordinator for Digital Scholarship and Services
> Haverford College Library
> 370 Lancaster Ave
> Haverford, PA 19041
> 610-896-4226
> lal...@haverford.edu


Re: [CODE4LIB] Python web framework recommendations good when learning Python

2012-07-10 Thread Andrew Hankinson
Have a look at Tornado:

http://www.tornadoweb.org/

It's our default "get something up and running quickly" Python framework.
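For a sense of scale: the "manage some stuff and feed out JSON" service Bill describes is small enough to sketch with nothing but the standard library. Tornado (or Flask/Django) layers routing, async I/O, and templating on top of the same request/response cycle. Everything below (names, data) is illustrative only.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

STUFF = {"items": ["foo", "bar"]}  # the "stuff" being managed

class JSONHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serialize the managed data and hand it back as JSON.
        body = json.dumps(STUFF).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Port 0 asks the OS for any free port, so the demo never collides.
server = HTTPServer(("127.0.0.1", 0), JSONHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen("http://127.0.0.1:%d/" % server.server_port) as resp:
    data = json.loads(resp.read())
server.shutdown()

print(data)  # {'items': ['foo', 'bar']}
```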

-Andrew

On 2012-07-10, at 8:05 PM, William Denton wrote:

> I have a fairly basic web service I want to hack on that would manage some 
> stuff (not too much) and feed out JSON in response to requests.  I'd like to 
> do it in Python so I can get to know the language.
> 
> StackOverflow is filled with comparisons of Python web frameworks, but I 
> wanted to get the sense from all the Python hackers here about what framework 
> might be a good one given their personal experiences.
> 
> Django is very full-featured and well documented, and would make a complex 
> project simple, but I think has more than I need; Flask looks pretty simple 
> and could suit the basic service I want to do; web2py looks pretty rich.
> 
> I know this isn't a particularly answerable question and the best thing to do 
> is to try one and hack on it, and do it right the second time, but since 
> future Python work might involve RDF and linked data, and there are so many 
> Python people here whose opinion I value, I thought I'd throw it out.
> 
> Thanks,
> 
> Bill
> -- 
> William Denton
> Toronto, Canada
> http://www.miskatonic.org/


Re: [CODE4LIB] responsiveness and Wordpress

2012-07-08 Thread Andrew Hankinson
'Responsive,' in modern web design parlance, refers to the ability of your 
layout to respond to the different devices and screen sizes that may be 
accessing your site, and present your content in such a way that it doesn't 
force the user into non-native device modes of interaction (e.g., a fixed 
1280-pixel-wide layout means an iPhone user will be doing a lot of horizontal 
scrolling and zooming). So not a re-definition; just an additional meaning.

 
On 2012-07-08, at 1:58 PM, Dave Caroline wrote:

> I always understood responsive to be opposed to sluggish and a
> reference to speed.
> Do I see a redefinition starting up?
> 
> Dave Caroline


Re: [CODE4LIB] viewer for TIFFs on iPad

2012-05-11 Thread Andrew Hankinson
Hi Edward,

A bit of disclosure: I'm one of the developers for Diva.

We have done quite a bit of experimentation for viewing images on various 
platforms, and even on a Mac Pro with 8GB of RAM and an SSD, 300MB TIFF images 
still require a bit of waiting for any viewing or operations.

As Dave mentioned, we're developing the Diva viewer to do online viewing. It 
requires a bit of server setup, but the big advantage is that I find it's 
actually faster to view large images online in the browser than it is to view 
them off a hard drive.

These images:

http://coltrane.music.mcgill.ca/salzinnes/experiments/diva-cci-tif/

are approximately 170MB for each page (about 80GB for the whole document), but 
since we only ever serve out the parts of the document that you are looking at, 
it makes viewing large medieval manuscripts very easy and fast, without 
sacrificing the ability to zoom in to see very fine details.

We did a bit of testing on the iPad early on, but haven't tested it since we 
did another round of development.

If you're interested, let me know and I can help you get it set up.

Cheers,
-Andrew


On 2012-05-10, at 5:16 PM, Edward Iglesias wrote:

> Hello All,
> 
> I was wondering if any of you had experience viewing large ~300MB and
> up TIFF files on an iPad.  I can get them to the iPad but the photo
> viewer is less than optimal.  It stops enlarging after a while and I'm
> looking at Medieval manuscripts so...
> 
> 
> Edward Iglesias


Re: [CODE4LIB] opening a pdf file [diva]

2011-10-03 Thread Andrew Hankinson
It doesn't work with PDFs, since it needs to create a tiled TIFF image for each 
page.

I don't know of anything similar for PDFs, since they're not really designed to 
render a portion of the document without downloading the entire thing.

You can convert PDF pages to images, though... :)
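One way to do that conversion is with poppler's pdftoppm command-line tool (ImageMagick or Ghostscript would work too); the sketch below only builds the command, and the file names are hypothetical.

```python
import subprocess

def pdftoppm_cmd(pdf_path, out_prefix, dpi=150):
    """Argument list for rendering every page of pdf_path to TIFF images."""
    return ["pdftoppm", "-tiff", "-r", str(dpi), pdf_path, out_prefix]

cmd = pdftoppm_cmd("book.pdf", "pages/page")
print(cmd)
# Uncomment to actually run the conversion (requires poppler installed):
# subprocess.run(cmd, check=True)
```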

-Andrew

On 2011-10-03, at 12:09 PM, Parker, Anson (adp6j) wrote:

> So this is awesome, does it in fact work with PDFs or not, and if not
> does anyone have any similar tools recommended for pdfs
> ap
> 
> 
> On 10/3/11 11:12 AM, "Dave Caroline"  wrote:
> 
>> Diva was announced here on the 6th of June
>> https://listserv.nd.edu/cgi-bin/wa?A2=ind1106&L=CODE4LIB&T=0&F=&S=&P=27064
>> 
>> The clever part is you only send the visible part at the scale they
>> are viewing so little excess bandwidth.
>> 
>> For online document view it takes some beating and is not too hard to set
>> up
>> My demo is running on an adsl line from home, probably a worst case speed
>> demo.
>> 
>> Site is
>> 
>> http://ddmal.music.mcgill.ca/diva
>> 
>> real demo
>> http://ddmal.music.mcgill.ca/diva/demo
>> 
>> Dave Caroline
>> 
>> On Mon, Oct 3, 2011 at 3:36 PM, Eric Lease Morgan  wrote:
>>> On Oct 3, 2011, at 10:26 AM, Dave Caroline wrote:
>>> 
 It is educational to look at memory use in the pc when that pdf is
 loaded.
 Evince here is using 600meg do you have space for such objects on
 these little toys
 
 try something like diva so you dont suck the resources dry on the
 client
>>> 
>>> Please tell me (us) more about diva. I am not familiar with it.  --Eric
>>> Morgan
>>> 


Re: [CODE4LIB] opening a pdf file [diva]

2011-10-03 Thread Andrew Hankinson
On 2011-10-03, at 11:29 AM, Eric Lease Morgan wrote:
>> 
> Very interesting, and thank you for bringing it to my attention. It seems it 
> relies on a technology that reads and chunks up image files. Alas, I have 
> PDFs. Moreover, I really want people to be able to print the entire 
> documents. I suppose I could convert my PDF files into images and go that 
> route. Hmm…

I'm one of the developers of Diva. I noticed that you've been getting your 
files from the Internet Archive. They also have the full high-quality JPEG and 
JPEG2000 images available.

http://ia600209.us.archive.org/6/items/acourseofreligio00gerauoft/

You could use those for Diva instead of the already-compressed PDF.

Printing could still be handled by downloading the PDF, but if you just want to 
be able to view it online then I'd be happy to help you get Diva set up.

Note that we also have an article in the latest C4L journal describing how it 
works: http://journal.code4lib.org/articles/5418

Cheers!
-Andrew

> 
> -- 
> Eric Morgan


Re: [CODE4LIB] iPads as Kiosks

2011-08-23 Thread Andrew Hankinson
You can distribute apps via an internal web server, with no need to go out to 
Apple.

http://developer.apple.com/library/ios/#featuredarticles/FA_Wireless_Enterprise_App_Distribution/Introduction/Introduction.html

You need to be a registered business to do this, and it costs $299/yr. You get 
a digital certificate, but that doesn't mean your code needs to be "seen" by 
anyone outside of your org.


On 2011-08-23, at 1:47 PM, David Uspal wrote:

> When I did my iPhone work, it was back in 2009 before this document even 
> existed, so it's good they've come some distance on this issue since then.  
> Still, the document below doesn't break the dependency on the iTunes store 
> and/or a digital certificate issued by Apple to download applications (if I'm 
> reading page 63 right), which was the big sticking point of the contract.  
> Not only did the user not want the network controlled by Apple (which this 
> document does handle), they also didn't want the code seen by any outside 
> source at all (aka via uploading it to the store)
> 
> 
> David K. Uspal
> Technology Development Specialist
> Falvey Memorial Library
> Phone: 610-519-8954
> Email: david.us...@villanova.edu
> 
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> Andrew Hankinson
> Sent: Tuesday, August 23, 2011 1:34 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] iPads as Kiosks
> 
> They now have an enterprise app deployment mechanism.
> 
> http://www.apple.com/support/iphone/enterprise/
> 
> 
> On 2011-08-23, at 12:54 PM, David Uspal wrote:
> 
>> Then again, by selecting the iPad you're essentially tethered to Apple's 
>> iron grip of the iWorld via its iTunes vetting process and strict control of 
>> Apple hardware.   YMMV on this depending on what you're doing, but it should 
>> definitely be a consideration when choosing between Android tablets and the 
>> iPad. 
>> 
>> Quick side story -- we had to drop a contract one time at my old job due to 
>> the customer proprietary requirements.  The customer didn't want to release 
>> its developed software outside of house (minus the developers of course) and 
>> Apple wouldn't give them a waiver from using the iTunes store.  Mind you, 
>> this was a very big company with resources, so Apple probably lost a 5000 
>> unit sale due to this
>> 
>> 
>> David K. Uspal
>> Technology Development Specialist
>> Falvey Memorial Library
>> Phone: 610-519-8954
>> Email: david.us...@villanova.edu
>> 
>> 
>> 
>> 
>> 
>> 
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
>> Stephen X. Flynn
>> Sent: Tuesday, August 23, 2011 9:01 AM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] iPads as Kiosks
>> 
>> Let's not forget a far superior user experience.
>> 
>> 
>> 
>> Stephen X. Flynn
>> Emerging Technologies Librarian
>> Andrews Library, College of Wooster
>> 1140 Beall Ave.
>> Wooster, OH 44691
>> (330) 263-2154
>> http://www.sxflynn.net
>> 
>> 
>> 
>> On Aug 22, 2011, at 12:56 PM, Madrigal, Juan A wrote:
>> 
>>> I would definitely go with the iPad. More accessories, better support and
>>> consistency. 
>>> 
>>> 
>>> Juan Madrigal
>>> 
>>> Web Developer
>>> Web and Emerging Technologies
>>> University of Miami
>>> Richter Library
>>> 
>>> 
>>> 
>>> On 8/22/11 11:19 AM, "Dan Funk"  wrote:
>>> 
>>>> There is a good discussion here about Android vs iPad based tablets for
>>>> use as Kiosks - lots of good information to consider.
>>>> I'd love to hear what you end up doing.
>>>> 
>>>> http://stackoverflow.com/questions/6050217/android-tablet-or-ipad-for-kios
>>>> k-device
>>>> 
>>>> On Mon, Aug 22, 2011 at 11:08 AM, Kyle Banerjee 
>>>> wrote:
>>>>> On Fri, Aug 19, 2011 at 5:48 AM, Edward Iglesias
>>>>> wrote:
>>>>> 
>>>>>> Apologies if this has been covered already but do any of you have
>>>>>> experience
>>>>>> using iPads as kiosks?  We would like to set up several as directional
>>>>>> beacons with a sort of "you are here" feature.  I've found several apps
>>>>>> to
>>>>>> do
>>>>>> the kiosk feature but the home button seems to be an issue.
>>>>>> Suggestions
>>>>>> include a case that locks out the home button such as this
>>>>>> 
>>>>> 
>>>>> For kiosks, it seems like wifi chromebooks might be a decent option.
>>>>> They're
>>>>> cheaper than ipads, can't do anything other than browse the web, and
>>>>> it's
>>>>> easy to plug in external peripherals like keyboards, mice, and monitors.
>>>>> 
>>>>> kyle
>>>>> 
>>>>> --
>>>>> --
>>>>> Kyle Banerjee
>>>>> Digital Services Program Manager
>>>>> Orbis Cascade Alliance
>>>>> baner...@uoregon.edu / 503.877.9773
>>>>> 


Re: [CODE4LIB] iPads as Kiosks

2011-08-23 Thread Andrew Hankinson
They now have an enterprise app deployment mechanism.

http://www.apple.com/support/iphone/enterprise/


On 2011-08-23, at 12:54 PM, David Uspal wrote:

> Then again, by selecting the iPad you're essentially tethered to Apple's iron 
> grip of the iWorld via its iTunes vetting process and strict control of Apple 
> hardware.   YMMV on this depending on what you're doing, but it should 
> definitely be a consideration when choosing between Android tablets and the 
> iPad. 
> 
> Quick side story -- we had to drop a contract one time at my old job due to 
> the customer proprietary requirements.  The customer didn't want to release 
> its developed software outside of house (minus the developers of course) and 
> Apple wouldn't give them a waiver from using the iTunes store.  Mind you, 
> this was a very big company with resources, so Apple probably lost a 5000 
> unit sale due to this
> 
> 
> David K. Uspal
> Technology Development Specialist
> Falvey Memorial Library
> Phone: 610-519-8954
> Email: david.us...@villanova.edu
> 
> 
> 
> 
> 
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> Stephen X. Flynn
> Sent: Tuesday, August 23, 2011 9:01 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] iPads as Kiosks
> 
> Let's not forget a far superior user experience.
> 
> 
> 
> Stephen X. Flynn
> Emerging Technologies Librarian
> Andrews Library, College of Wooster
> 1140 Beall Ave.
> Wooster, OH 44691
> (330) 263-2154
> http://www.sxflynn.net
> 
> 
> 
> On Aug 22, 2011, at 12:56 PM, Madrigal, Juan A wrote:
> 
>> I would definitely go with the iPad. More accessories, better support and
>> consistency. 
>> 
>> 
>> Juan Madrigal
>> 
>> Web Developer
>> Web and Emerging Technologies
>> University of Miami
>> Richter Library
>> 
>> 
>> 
>> On 8/22/11 11:19 AM, "Dan Funk"  wrote:
>> 
>>> There is a good discussion here about Android vs iPad based tablets for
>>> use as Kiosks - lots of good information to consider.
>>> I'd love to hear what you end up doing.
>>> 
>>> http://stackoverflow.com/questions/6050217/android-tablet-or-ipad-for-kios
>>> k-device
>>> 
>>> On Mon, Aug 22, 2011 at 11:08 AM, Kyle Banerjee 
>>> wrote:
 On Fri, Aug 19, 2011 at 5:48 AM, Edward Iglesias
 wrote:
 
> Apologies if this has been covered already but do any of you have
> experience
> using iPads as kiosks?  We would like to set up several as directional
> beacons with a sort of "you are here" feature.  I've found several apps
> to
> do
> the kiosk feature but the home button seems to be an issue.
> Suggestions
> include a case that locks out the home button such as this
> 
 
 For kiosks, it seems like wifi chromebooks might be a decent option.
 They're
 cheaper than ipads, can't do anything other than browse the web, and
 it's
 easy to plug in external peripherals like keyboards, mice, and monitors.
 
 kyle
 
 --
 --
 Kyle Banerjee
 Digital Services Program Manager
 Orbis Cascade Alliance
 baner...@uoregon.edu / 503.877.9773
 


Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-04 Thread Andrew Hankinson
Disclaimer: I helped write this software.

You may want to look at our just-released Diva.js script. It can handle 
document images up to many gigabytes in size, in many different resolutions. 
The big advantage, though, is that the user only ever downloads the portion of 
the document that they are looking at so viewing is almost instant. We 
specifically designed it to work on slower network connections. It's in 
JavaScript, so it runs in any modern web browser with no Flash or PDF plugin 
needed.

It does require some server-side infrastructure to set up. We have a recent 
code4lib journal article that describes how it works and what is needed. 

Code4Lib article: http://journal.code4lib.org/articles/5418

Here is a very basic demo: http://ddmal.music.mcgill.ca/diva/demo/ (This book 
is ~4GB of images).

And the code is available here: https://github.com/ddmaL/diva.js

We have a new version coming out very soon that fixes some bugs and adds a 
"contact sheet" view for quickly scrolling through all the images.

If you need any more info, please let me know and I would be happy to help.

Cheers,
-Andrew

On 2011-08-04, at 8:45 AM, Cowles, Esme wrote:

> I've thought about using JPEG page images instead of PDFs to serve our 
> scanned newspapers, which also have sizes ranging upwards of 100MB+, with a 
> link to download the PDF as a fallback for people who really want that.  The 
> downside is having to do the bulk conversion, manage the extra files, etc.
> 
> Another option would be a flash frontend.  Someone already mentioned Google, 
> and I've also seen some use of issuu.com (our campus newspaper currently uses 
> them).  There are also options you could integrate into your own site, such 
> as FlexPaper (http://flexpaper.devaldi.com/).  You still have to upload 
> and/or convert your files, but you retain a PDF-like display in the browser.
> 
> -Esme
> --
> Esme Cowles 
> 
> "A person, who is nice to you, but rude to the waiter, is not a nice person.
> (This is very important. Pay attention. It never fails.) " -- Dave Barry
> 
> On 08/3/2011, at 7:36 PM, Ranti Junus wrote:
> 
>> Dear All,
>> 
>> My colleague came with this query and I hope some of you could give us some
>> ideas or suggestion:
>> 
>> Our Digital Multimedia Center (DMC) scanning project can produce very large
>> PDF files. They will have PDFs that are about 25Mb and some may move into
>> the 100Mb range. If we provide a link to a PDF of that large, a user may not
>> want to try to download it even though she really needs to see the
>> information. In the past, DMC has created a lower quality, smaller versions
>> to the original file to reduce the size. Some thoughts have been tossed
>> around to reduce the duplication or the work (e.g. no more creating the
>> lower quality PDF manually.)
>> 
>> They are wondering if there is an application that we could point to the end
>> user, who might need it due to poor internet access, that if used will
>> simplify the very large file transfer for the end user. Basically:
>> - a client software that tells the server to manipulate and reduce the file
>> on the fly
>> - a server app that would to the actual manipulation of the file and then
>> deliver it to the end user.
>> 
>> Personally, I'm not really sure about the client software part. It makes
>> more sense to me (from the user's perspective) that we provide a "download
>> the smaller size of this large file" link that would trigger the server-side
>> apps to manipulate the big file. However, we're all ears for any suggestions
>> you might have.
>> 
>> 
>> thanks,
>> ranti.
>> 
>> 
>> -- 
>> Bulk mail.  Postage paid.


[CODE4LIB] Diva.js 1.0 Released

2011-06-05 Thread Andrew Hankinson
We're pleased to announce the first version of Diva.js, a continuous document 
image viewer for displaying high-resolution document images in the web browser.

Diva (Document Image Viewer with Ajax) is a multi-page document image viewer, 
designed to present all document page images on a single page, rather than the 
traditional method of viewing page images one at a time. Using "lazy loading" 
and image tiling methods for loading parts of a document on demand, it presents 
a quick and efficient way of navigating through hundreds (or even thousands) of 
high-resolution page images from digitized books and other documents on a 
single web page. Lazy loading and image tiling have the additional benefit of 
allowing institutions to display their book and manuscript collections online 
while protecting against most forms of indiscriminate downloading and 
republishing.

Perhaps the most distinctive feature of Diva.js is that it is designed for 
displaying high- and low-resolution versions of the same page without requiring 
the user to download multiple images or navigate to different web pages. This 
is especially useful for displaying documents where users need the ability to 
"zoom" in on small details, and then "zoom" back out to quickly scroll through 
the document.
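The tiling idea can be sketched in a few lines: given a zoomed page cut into fixed-size tiles, compute which (column, row) pairs intersect the viewport and request only those images. The tile size and dimensions below are illustrative, not Diva's actual values.

```python
TILE = 256  # tile edge in pixels -- an assumption for the demo

def visible_tiles(view_x, view_y, view_w, view_h, page_w, page_h):
    """(column, row) indices of every tile intersecting the viewport."""
    first_col = view_x // TILE
    last_col = min((view_x + view_w - 1) // TILE, (page_w - 1) // TILE)
    first_row = view_y // TILE
    last_row = min((view_y + view_h - 1) // TILE, (page_h - 1) // TILE)
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]

# A 1024x768 viewport at the top-left of a 4096x4096 page touches only
# 12 of the page's 256 tiles, so only those 12 images are requested.
print(len(visible_tiles(0, 0, 1024, 768, 4096, 4096)))  # 12
```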

You can read more and download the code on our website:

http://ddmal.music.mcgill.ca/diva

Our demo features a manuscript inventory of musical sources ("Bonus Ordo," 
CH-BM, StiA Bd. 1206) from the collegiate Church of St. Michael in Beromünster, 
Switzerland [1]. The images in this collection total 4.1 GB.

http://ddmal.music.mcgill.ca/diva/demo

FEATURES:
* Allows multiple-resolution photos to be displayed inline on a single web page;
* Intuitive mouse commands: Double-click to zoom in; ctrl-double-click to zoom 
out; Click-and-drag to move the image;
* "Lazy loading" and image tiling techniques promote fast display times even on 
slower connections, and offer limited protection for image copyrights;
* Fullscreen mode;
* Open Source

REQUIREMENTS:
Diva requires some "behind the scenes" server infrastructure to function. Full 
installation requirements and instructions may be found 
at https://github.com/DDMAL/diva.js/wiki

FEEDBACK:
If you find Diva.js valuable, and are looking to integrate it in pilot or 
production projects, please get in touch and let us know how you're using it. 
If you want to help us develop it, please consult our developers documentation 
on GitHub. If you find a bug, please let us know by filing an issue on our 
GitHub page.

For more information, please contact Andrew Hankinson 
(andrew.hankin...@mail.mcgill.ca).

Re: [CODE4LIB] HTML Load Time

2010-12-06 Thread Andrew Hankinson
Another option might be to create a PDF version of this document for the 
download. It's not *ideal*, but it would certainly alleviate many of the 
transfer/rendering problems. You can still index the EAD on the back-end, and 
maybe even provide section-level access via AJAX and some back-end document 
calls, but if you want to make the whole thing available I wouldn't do it in 
HTML.

Is there any reason you need/want to keep it as a webpage?

On 2010-12-06, at 3:04 PM, Ken Irwin wrote:

> Nathan,
> 
> Would it make sense to break this up into several documents and add a search 
> function? You could still have a giant, one-page (and thus easily-printable) 
> option, but maybe that wouldn't be the default. 
> 
> The search feature I'm envisioning would just be a search of key words in the 
> title of a box. A tag-cloud sort of thing might be a useful way of making 
> some of the keywords visible too. 
> 
> Ken
> 
> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of 
> Nathan Tallman
> Sent: Monday, December 06, 2010 2:49 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] HTML Load Time
> 
> Hi Code4Libers,
> 
> I've got a LARGE finding aid that was generated from EAD.  It's over 5 MB
> and has caused even Notepad++ and Dreamweaver to crash.  My main concern is
> client-side load time.  The collection is our most heavily used and the
> finding aid will see a lot of traffic.  I'm fairly adept with HTML, but I
> can't think of anything.  Does anyone have any tricks or tips to decrease
> the load time?  The finding aid can be viewed at <
> http://www.americanjewisharchives.com/aja/FindingAids/ms0361.html>.
> 
> Thanks,
> Nathan Tallman
> Associate Archivist
> American Jewish Archives


Re: [CODE4LIB] Django

2010-10-25 Thread Andrew Hankinson
Django is a web framework; Python is the language.

If you don't know the difference, I'd suggest sticking with PHP and going with 
one of the frameworks available to you there.


On 2010-10-25, at 4:25 PM, Junior Tidal wrote:

> Thanks for the suggestions everyone. I haven't actively looked for resources 
> since I'm busy doing collection development. However, I came across an 
> advertisement for a Django book and figured it would be a useful language to 
> learn. I already know php, so it seems logical that django is the next step?
> 
> Best,  
> 
> Junior Tidal
> Assistant Professor
> Web Services and Multimedia Librarian
> New York City College of Technology, CUNY 
> 300 Jay Street
> Brooklyn, NY 11210
> 718.260.5481
> 
> http://library.citytech.cuny.edu
> 
> 
>>>> Andrew Hankinson  10/25/2010 10:23 AM >>>
> There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
> revised edition for 1.0)
> The Django docs, with some intro tutorials: 
> http://docs.djangoproject.com/en/1.2/ 
> 
> Did you try those already?
> 
> 
> On 2010-10-25, at 10:19 AM, Junior Tidal wrote:
> 
>> Hello Code4Lib,
>> 
>> Does anyone have any recommendations for learning Django? Books, websites, 
>> video tutorials, etc. ...
>> 
>> thanks,
>> 
>> Junior Tidal
>> Assistant Professor
>> Web Services and Multimedia Librarian
>> New York City College of Technology, CUNY 
>> 300 Jay Street
>> Brooklyn, NY 11210
>> 718.260.5481
>> 
>> http://library.citytech.cuny.edu


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Andrew Hankinson
I guess what I meant is that in MARCXML, you have a <record> element with 
subsequent <datafield> elements, each with fairly clear attributes, which, while 
not my idea of fun Sunday-afternoon reading, require less specialized tools to 
parse (hello Textmate!) and are a bit easier than trying to count INT positions. 
One quick XPath query and you can have all 245 fields, regardless of their 
length or position in the record.
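That query is doable with nothing but the standard library; a sketch, using a made-up one-record MARCXML fragment (only the namespace URI is real):

```python
import xml.etree.ElementTree as ET

MARCXML = """
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">A made-up title :</subfield>
      <subfield code="b">for demonstration only.</subfield>
    </datafield>
  </record>
</collection>
"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}
root = ET.fromstring(MARCXML)

# Grab every 245 field, whatever its position, with one XPath expression.
titles = [" ".join(sf.text for sf in df.iterfind("marc:subfield", NS))
          for df in root.iterfind(".//marc:datafield[@tag='245']", NS)]
print(titles)  # ['A made-up title : for demonstration only.']
```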


On 2010-10-25, at 3:26 PM, Nate Vack wrote:

> On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding  wrote:
>> - XML is self-describing, binary is not.
>> 
>> Not to quibble, but that's only in a theoretical sense here. Something
>> like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
>> At least MARC records kinda imitate catalog cards.
> 
> Yeah -- this is kinda the source of my confusion. In the case of the
> files I'm reading, it's not that it's hard to find out where the
> nMeasurement field lives (it's six short ints starting at offset 64),
> but what the field means, and whether or not I care about it.
> 
> Switching to an XML format doesn't help with that at all.
> 
> WRT character encoding issues and validation: if MARC and MARCXML are
> round-trippable, a solution in one environment is equivalent to a
> solution in the other.
> 
> And I think we've all seen plenty of unvalidated, badly-formed XML,
> and plenty with Character Encoding Problems™ ;-)
> 
> Thanks for the input!
> -Nate


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Andrew Hankinson
I'm not a big user of MARCXML, but I can think of a few reasons off the top of 
my head:

- Existing libraries for reading, manipulating and searching XML-based 
documents are very mature.
- Documents can be validated for their "well-formedness" using these existing 
tools and a pre-defined schema (a validator for MARC would need to be 
custom-coded)
- MARCXML can easily be incorporated into XML-based meta-metadata schemas, like 
METS.
- It can be parsed and manipulated in a web service context without sending a 
binary blob over the wire.
- XML is self-describing, binary is not.

There's nothing stopping you from reading the MARCXML into a binary blob and 
working on it from there. But when sharing documents from different 
institutions around the globe, using a wide variety of tools and techniques, 
XML seems to be the lowest common denominator.

-Andrew

On 2010-10-25, at 2:38 PM, Nate Vack wrote:

> Hi all,
> 
> I've just spent the last couple of weeks delving into and decoding a
> binary file format. This, in turn, got me thinking about MARCXML.
> 
> In a nutshell, it looks like it's supposed to contain the exact same
> data as a normal MARC record, except in XML form. As in, it should be
> round-trippable.
> 
> What's the advantage to this? I can see using a human-readable format
> for poorly-documented file formats -- they're relatively easy to read
> and understand. But MARC is well, well-documented, with more than one
> free implementation in cursory searching. And once you know a binary
> file's format, it's no harder to parse than XML, and the data's
> smaller and processing faster.
> 
> So... why the XML?
> 
> Curious,
> -Nate


Re: [CODE4LIB] Django

2010-10-25 Thread Andrew Hankinson
There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
revised edition for 1.0)
The Django docs, with some intro tutorials: 
http://docs.djangoproject.com/en/1.2/

Did you try those already?


On 2010-10-25, at 10:19 AM, Junior Tidal wrote:

> Hello Code4Lib,
> 
> Does anyone have any recommendations for learning Django? Books, websites, 
> video tutorials, etc. ...
> 
> thanks,
> 
> Junior Tidal
> Assistant Professor
> Web Services and Multimedia Librarian
> New York City College of Technology, CUNY 
> 300 Jay Street
> Brooklyn, NY 11210
> 718.260.5481
> 
> http://library.citytech.cuny.edu


Re: [CODE4LIB] generating unique integers

2010-05-28 Thread Andrew Hankinson
If your only purpose is to name the files, do you need a "globally" unique ID? 
Why not just use an increment, along with the author & first letter.

e.g.

>  Aeschylus / Prometheus Bound => aeschylus_p01.txt
>  Aeschylus / Suppliant Maidens => aeschylus_s01.txt
>  American State / Articles of confederation => american_state_a01.txt
> ...
>  Aristotle / On Generation And Corruption => aristotle_o01.txt
>  Aristotle / On The Gait Of Animals => aristotle_o02.txt
>  Aristotle / On The Generation Of Animals => aristotle_o03.txt

If you generate your integers in alphabetical order, they'll always be the same 
for each title and you'll never get filename collisions. Unless your list will 
grow - will it?

-Andrew


On 2010-05-28, at 9:34 AM, Eric Lease Morgan wrote:

> Given a list of unique strings, how can I generate a list of short, unique 
> integers?
> 
> I have a list of about 250 unique author/title combinations, such as:
> 
>  Aeschylus / Prometheus Bound
>  Aeschylus / Suppliant Maidens
>  American State / Articles of confederation
>  American State / Declaration of Independence
>  Aquinas / Summa Theologica
>  Aristophanes / Achamians
>  Aristophanes / Clouds
>  Aristophanes / Ecclesiazusae
>  Aristotle / On Generation And Corruption
>  Aristotle / On The Gait Of Animals
>  Aristotle / On The Generation Of Animals
>  ...
> 
> From each author/title combination I want to create a file name (key). 
> Specifically, I want a file name with the following form: 
> author-firstwordofthetitle-integer.txt. Such a scheme will make it 
> (relatively) easy for me to look at the file name and know what the title is 
> and by whom.
> 
> Using Perl, how can I convert the author/title combination into some sort of 
> integer, checksum, or unique value that is the same every time I run my 
> script? I don't want to have to remember what was used before because I don't 
> want to maintain a list of previously used keys. Should I use some form of 
> the pack function? Should I sum the ASCII values of each character in the 
> author/title combination?
> 
> -- 
> Eric Morgan


Re: [CODE4LIB] OCLC Service Outage Update

2010-05-10 Thread Andrew Hankinson
Writing code in "energy efficient" languages is the funniest thing  
I've heard in a while. It ranks up there with setting my desktop  
wallpaper to black because "it uses less energy."


More servers are required because more people are writing webapps  
because Ruby and PHP make it easier for more people to do it. Is there  
even a C webapp framework available?


-A

On 2010-05-10, at 16:59, stuart yeates  wrote:


Simon Spero wrote:

Of course, the real problem is that too many people are writing unoptimized code in energy-inefficient languages like ruby and PHP, which require far more servers, and far more cooling, to do the same work as properly written code.


No, the real problem is with trolls sending flamebait.

cheers
stuart
--
Stuart Yeates
http://www.nzetc.org/   New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/ Institutional Repository


Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Andrew Hankinson
Couldn't you do MARC -> MARCXML -> JSON?
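For the MARCXML → JSON step, a minimal standard-library sketch might look like this (the JSON field layout is my own invention for illustration, not a settled marc2json format):

```python
import json
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

MARC_SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Prometheus Bound</subfield>
  </datafield>
</record>"""

def marcxml_to_dict(xml_text):
    """Flatten one MARCXML record into a plain dict, ready for json.dumps."""
    root = ET.fromstring(xml_text)
    rec = {"leader": root.findtext(MARC_NS + "leader"), "fields": []}
    for cf in root.findall(MARC_NS + "controlfield"):
        rec["fields"].append({cf.get("tag"): cf.text})
    for df in root.findall(MARC_NS + "datafield"):
        subs = {sf.get("code"): sf.text
                for sf in df.findall(MARC_NS + "subfield")}
        rec["fields"].append({df.get("tag"): {"ind1": df.get("ind1"),
                                              "ind2": df.get("ind2"),
                                              "subfields": subs}})
    return rec

print(json.dumps(marcxml_to_dict(MARC_SAMPLE), indent=2))
```

The MARC → MARCXML leg would come from an existing tool (yaz-marcdump, pymarc, etc.); only the XML → JSON leg is sketched here.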

-Andrew

On 2010-04-12, at 5:00 PM, Benjamin Young wrote:

> On 4/12/10 4:47 PM, Ryan Eby wrote:
>> You could put your logs, marc records broken out by fields or
>> arrays/hashes (types in couchdb) in any of them but the approach each
>> takes would limit you (or empower you) differently.
>>   
> Once there's a good marc2json script (and format) out there, it'd be grand to 
> see marc records dumped into CouchDB to allow them to be replicated between 
> groups of librarians (and even up to OpenLibrary). I'm still up for helping 
> make that possible if anyone's "into" that. :)


Re: [CODE4LIB] newbie

2010-03-25 Thread Andrew Hankinson
Just out of curiosity I tried them in quotes:

"sexy ruby" - 72,200
"sexy python" - 37,900
"sexy php" - 25,100
"sexy java" - 16,100
"sexy asp" - 14,800
"sexy perl" - 8,080
"sexy C++" - 177
"sexy FORTRAN" - 67
"sexy COBOL" - 8

I tried "sexy lisp" but the results were skewed by speech impediment fetishes. 
Which I'd say is even less strange than 8 people thinking you can write sexy 
COBOL.

On 2010-03-25, at 10:20 PM, Tim Spalding wrote:

>> Finally, I never would have put the strings "PHP" and "sexiness" in a 
>> sentence together (though I guess I just did).
> 
> A simple Google search shows how very wrong you are:
> 
> "sexy php" - 56,100,000 results
> "sexy asp" - 8,380,000
> "sexy java" - 6,360,000
> "sexy ruby" - 2,840,000
> "sexy perl" - 532,000
> "sexy C++" - 488,000
> "sexy smalltalk" - 113,000
> "sexy fortran" - 107,000
> "sexy COBOL" - 58,100
> 
> There are also very high results for "sexy logo." Perhaps, since I was
> in fourth grade, someone's figured out something interesting to do
> with that stupid turtle!
> 
> Tim


Re: [CODE4LIB] newbie

2010-03-24 Thread Andrew Hankinson

On 24-Mar-10, at 8:21 PM, Paul Cummins wrote:


On 3/24/2010 7:43 PM, David Kane wrote:
A friend of mine once described PHP as 'brain-dead PERL', but I  
like and use

both languages quite a bit.

David.

On 24 March 2010 23:17, Tim Spalding  wrote:


PHP. I have to agree with others - don't bother with PHP.


Largest website in Perl: Del.icio.us

Largest website in PHP: Facebook

Tim

Ok, I know there are people that use PHP out there.  :)

 I'd recommend PHP, especially to a beginner, but only if they are  
going to learn the whole LAMP system and how to make it work. Oh,  
and learn the changes between versions, like between 5.1 and 5.2.  
And read every comment on their manual pages.
And never install a widely distributed PHP program unless you rename  
it (scanners know all the famous ones).  We use the PHP CLI as a  
replacement for perl and for processing XML and a thousand other  
things without even going through Apache.
 But above all, if you do learn it and use it for years, don't tell  
the programmers in an email list that you did.


-Paul


You could always claim that you write Python instead:


import os
os.system('/usr/local/bin/php utility.php')


Re: [CODE4LIB] Variations/FRBR project releases FRBR XML Schemas

2010-03-16 Thread Andrew Hankinson
This may be one area where FRBR is not exactly clear on the directions its 
relationships take, or how extensive the cataloguing should be.

An album with Beethoven's 7, 8 & 9th Symphonies performed by the London 
Philharmonic would be a manifestation containing three independent expressions 
of these works, but the album wouldn't be a work by itself. You can have 
dependent forward relationships, i.e. "Work is an Expression contained in a 
Manifestation" but, as far as I know, there's no way to specify that a 
manifestation containing independent works as a separate work unto itself, and 
still stay within the FRBR model. (please, correct me if I'm wrong...)

In the textual realm, I would think an analogy would be a collection of poems 
being considered as a collection of independent works, since a poem could be 
contained in multiple anthologies and each poem is often an independent 
intellectual entity. Same with a collection of short stories. However, there 
are pronounced differences in scale between music and text, since the 
possibilities of different expressions of poetry and textual materials (e.g. an 
audio version of William Shatner reading Leonard Nimoy's poetry) are 
considerably smaller and less frequent than the number of different 
expressions possible for a musical work (e.g. the performances of ten different 
orchestras, plus the number of different print editions, performance versions, 
commentaries or DVD versions would all be different expressions of Beethoven's 
7th Symphony.)

It further breaks down when considering things like the Encyclopedia 
Britannica. Is the Encyclopedia the work, or is each individual entry 
(sometimes quite lengthy and exhaustive) an independent work in its own right?

It seems to me that aggregating independent works into a singular container 
expression is certainly expedient, but does not necessarily conform to the 
letter of the FRBR law.  If someone wants to find a given poem and if it isn't 
listed as an independent work, then they'll still need to (somehow) know the 
exact anthologies that contain it, since the granularity stops at the level of 
the container item and not at the level of the true "work". The answer is to 
list it in a Table of Contents field, but then we're back at square one where 
we depend on the indexing of the Table of Contents fields to uncover the 
contents of an entity, rather than the FRBR vision of having an explicitly 
defined and catalogued set of relationships.

-Andrew

On 2010-03-16, at 6:30 PM, Jonathan Rochkind wrote:

> If a text aggregate "is" an expression -- that expression must belong to SOME 
> work though, right?
> 
> And if the individual things inside the aggregate ALSO exist on their own 
> independently (or in OTHER aggregations)... and you want to model that (which 
> you may NOT want to spend time modelling in the individual cases, depending 
> on context)... dont' those individual things inside the aggregate need to be 
> modelled as expressions (which belong to a work) themselves?
> 
> In general, Jenn has spent more time thinking about these things in terms of 
> music-related records than even the long discussions on RDA-L, and I think 
> has even authored a position paper for some body on this subject?  
> I am guessing that in musical cataloging, the individual things inside an 
> aggregate often DO exist on their own independently or in other aggregations, 
> and for the needs of music patrons, that DOES need to be modelled, and I 
> don't see how to do it except to call those things works of their own too?
> If Symphony X is a work, then it's still a work when an expression of it is 
> bound together with Symphony's A, B, and C, right?  
> Jonathan
> 
> Karen Coyle wrote:
>> Jenn, I can't claim to have spent sufficient time looking at this,  but... 
>> are you on the RDA-L list? Because we just went through a very  long 
>> discussion there in which we concluded that a text aggregate  (possibly 
>> analogous to a sound recording aggregate) is an expression,  not a "set" of 
>> separate work/expression entities. Your example implies  the latter, with 
>> the aggregate being described only at the  manifestation level. (And now I'm 
>> confused as to what the work would  be in something like a text collection, 
>> such as an anthology of poems.  Would the anthology be a work?)
>> 
>> kc
>> 
>> 
>> Quoting "Riley, Jenn" :
>> 
>>  
>>> The Variations/FRBR project at Indiana University   (http://vfrbr.info) is 
>>> pleased to announce the release of an initial   set of XML Schemas for the 
>>> encoding of FRBRized bibliographic data.   The Variations/FRBR project aims 
>>> to provide a concrete testbed for   the FRBR conceptual model, and these 
>>> XML Schemas represent one step   towards that goal by prescribing a 
>>> concrete data format that   instantiates the conceptual model. Our project 
>>> has been watching   recent work to represent the FRBR-based Resource 
>>> Description and   Access (RDA) e

Re: [CODE4LIB] Python BagIt Library

2010-02-24 Thread Andrew Hankinson
Thanks Ed - just wanted to make sure I wasn't stepping on any toes. :)

-Andrew

On 2010-02-24, at 6:06 PM, Ed Summers wrote:

> On Wed, Feb 24, 2010 at 12:50 PM, Andrew Hankinson
>  wrote:
>> I'd also like to send a nod out to Ed Summer's Python BagIt library, 
>> (http://github.com/edsu/bagit) which I just found this morning in preparing 
>> to write this email. Sorry for the duplication! When I started this project 
>> the only thing I could find was an older incomplete implementation by the 
>> LoC. If there's an interest in combining the two projects I would be more 
>> than willing to do so.
> 
> Thanks for the nod Andrew. I just quickly released something I had
> been using to bag up directories. Mine [1] doesn't validate or
> anything, although I had thought about adding that. I guess it could
> make sense to try to join forces, but having multiple implementations
> isn't particularly bad. I think it speaks well of the BagIt spec in
> fact :-)
> 
> //Ed
> 
> [1] http://github.com/edsu/bagit


[CODE4LIB] Python BagIt Library

2010-02-24 Thread Andrew Hankinson
I'd like to announce the release of a Python library for dealing with BagIt 
folder structures. This can be used either as a python library or at the 
command-line interface. It conforms to the 0.96 version of the BagIt spec.

http://github.com/ahankinson/pybagit

Documentation can be found here:

http://www.musiclibs.net/pybagit/

Some features of this library:
 - Can validate bag structures with both MD5 and SHA1 checksums
 - Can compress and uncompress bag structures in both .zip and .tgz formats
 - Can create and parse fetch file contents
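For a sense of what the checksum validation involves, here is a generic sketch against a bag's manifest file, using only the standard library (this is not PyBagIt's actual API):

```python
import hashlib
from pathlib import Path

def validate_bag(bag_dir, algorithm="md5"):
    """Recompute each payload file's digest and compare it to its entry
    in manifest-<algorithm>.txt; return the paths that don't match."""
    bag = Path(bag_dir)
    failures = []
    for line in (bag / f"manifest-{algorithm}.txt").read_text().splitlines():
        expected, relpath = line.split(maxsplit=1)
        digest = hashlib.new(algorithm)
        digest.update((bag / relpath).read_bytes())
        if digest.hexdigest() != expected:
            failures.append(relpath)
    return failures
```

The same loop works for SHA-1 by passing `algorithm="sha1"`, since `hashlib.new` takes the algorithm name the manifest filename encodes.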

I've included some code examples and some unit tests in the package to 
illustrate some of its uses.

This is the first release of the software, so if you find any problems please 
let me know. It's being released under the MIT License and a Creative-Commons 
Attribution-only copyright, so please feel free to use it as you see fit.

I'd also like to send a nod out to Ed Summer's Python BagIt library, 
(http://github.com/edsu/bagit) which I just found this morning in preparing to 
write this email. Sorry for the duplication! When I started this project the 
only thing I could find was an older incomplete implementation by the LoC. If 
there's an interest in combining the two projects I would be more than willing 
to do so.

Cheers,
-Andrew


Re: [CODE4LIB] Fwd: [NGC4LIB] Integrating with your ILS through Web services and APIs

2009-07-23 Thread Andrew Hankinson
I think that it's supposed to be the exact opposite. APIs, and  
especially web APIs, exist to provide access to your data outside of a  
specific vendor implementation. That way you can have your OPAC from  
$BIG_VENDOR, but use the bibliographic data from it in  
$OPEN_SOURCE_PROJECT without having to access the database directly  
(causing your DBAs to have sleepless nights.) Theoretically anything  
that comes over HTTP is some form of structured text, so there  
shouldn't be unreadable binary blobs.


In theory, that's how it's supposed to work. There's still crappy  
implementations, but that's not the fault of the API - that's the  
fault of the implementer.


-Andrew

On 23-Jul-09, at 9:22 AM, Eric Lease Morgan wrote:


On Jul 22, 2009, at 10:56 PM, Ross Singer wrote:

...Today almost all ILS products make claims regarding offering more openness through APIs, Web services, and through a service-oriented architecture (SOA)


I heard someplace recently that APIs are the newest form of vendor lock-in.  What's your take?


--
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
Hesburgh Libraries, University of Notre Dame

(574) 631-8604


Re: [CODE4LIB] Open, public standards v. pay per view standards and usage

2009-07-13 Thread Andrew Hankinson
Have a look at the ongoing battles between MPEG4 and Ogg for the  
browser video space. I don't know about your second criterion for b),  
however - not many people are using Ogg (yet)


http://www.roughlydrafted.com/2009/07/06/ogg-theora-h-264-and-the-html-5-browser-squabble/

http://arstechnica.com/open-source/news/2009/07/decoding-the-html-5-video-codec-debate.ars

-Andrew

On 13-Jul-09, at 12:22 PM, Walter Lewis wrote:


Are there any blindingly obvious examples of instances where
   a) a standards group produced a standard published by a body which charged for access to it
and
   b) an alternative standards group produced a competing standard that was openly accessible
and the work of group a) was rendered totally irrelevant because most non-commercial work ignored it in favour of b)?


My instinct is to quote the battle between OSI (ISO) and TCP/IP  
(IETF RFCs).  Does that strike others as appropriate?


Any examples closer to the library world?

Walter Lewis


Re: [CODE4LIB] Digital imaging questions

2009-06-18 Thread Andrew Hankinson
I'm pretty sure you can add extra fields to the dublin_core.xml file and
import it. I think I did something like this a few years ago, but I'm a bit
fuzzy on the details.
For the metadata creation, it might be worth your while to save the Excel
spreadsheet to a CSV file and then write a parser (in Python or Ruby) that
will read the values from that spreadsheet and produce a dublin_core.xml
file. If you gather the photo files in the same location,
you can then use the DSpace bulk importer to import everything into
your collection.

See here:
http://www.tdl.org/wp-content/uploads/2009/04/DSpaceBatchImportFormat.pdf

You may be able to add extra fields to the search index. See here:
http://wiki.dspace.org/index.php/Modify_search_fields
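The CSV-to-dublin_core.xml step might look something like this (the column names and the `qualifier` value are placeholders; a real run would follow your Darwin Core/Dublin Core crosswalk):

```python
import csv
import io
import xml.etree.ElementTree as ET

# Stand-in for the Biology Dept. spreadsheet saved as CSV.
CSV_DATA = """title,creator,date
Astenophylla specimen,Biology Dept.,2009-06-18
"""

def row_to_dublin_core(row):
    """Turn one spreadsheet row into the body of a dublin_core.xml file
    for the DSpace batch importer."""
    root = ET.Element("dublin_core")
    for element, value in row.items():
        dcvalue = ET.SubElement(root, "dcvalue",
                                element=element, qualifier="none")
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")

for row in csv.DictReader(io.StringIO(CSV_DATA)):
    print(row_to_dublin_core(row))
```

Each generated document would then be written out as `dublin_core.xml` in its own item directory alongside the photo file and a `contents` file, per the batch import layout.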

On Thu, Jun 18, 2009 at 1:32 PM, Deng, Sai  wrote:

> Andrew and Yan,
> Thanks for the reply and the information!
>
> About DSpace metadata registry, we can add new schema or new elements to
> it, but the elements won’t be searchable, right? (We can change the
> input-forms.xml to make it display in the submission workflow if we will
> have item by item submission.)
>
> In our case, we already have the herbarium metadata in excel sheet created
> by Biology Dept. They are now in loose Darwin Core and kind of free style.
> If I would like to do data transformation (transform it to a mixture of DC
> and Darwin Core possibly) and batch import the xml to DSpace, how to
> proceed? Where should I add the Darwin Core metadata (in the dublin_core.xml
> as well)? It seems that it only has dcvalue element.
>
> Sai
>
> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Andrew Hankinson
> Sent: Thursday, June 18, 2009 11:03 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Digital imaging questions
>
> Hi Sai,
> "Archival Quality Images" has some meaning, but it might be helpful to look
> up a standard and start your investigation for a new camera based on the
> recommendations of that standard. You might find this page from the Library
> of Congress helpful:
>
> http://www.digitalpreservation.gov/formats/content/still.shtml
>
> I think your hunch that RAW/TIFF is needed is a pretty safe bet, but being
> able to point to an actual standard might make your case for a new camera a
> bit more convincing.  Other factors to take into account (other than
> megapixels and format) are color reproduction, image 'noise'
> specifications,
> DPI, lighting, (and probably many other things).
>
> For DSpace you don't even need to map the elements of Dublin Core to
> DarwinCore. DSpace has the ability to load different schemas into its
> metadata
> registry. You can then modify the "input-forms.xml" file in the DSpace
> config
> directory to add the appropriate fields for the additional metadata fields.
>
> Hope this helps!
> -Andrew
>
> On Thu, Jun 18, 2009 at 10:33 AM, Deng, Sai  wrote:
>
> > Hi, list,
> >
> >
> >
> > A while ago, I read some interesting discussion on how to use camera to
> > produce archival-quality images from this list. Now, I have some imaging
> > questions and I think this might be a good list to turn to. Thank you in
> > advance! We are trying to add some herbarium images to our DSpace. The
> > specimen pictures will be taken at the Biology department and the library
> is
> > responsible for depositing the images and transferring/mapping/adding
> > metadata. On the testing stage, they use Fujifilm FinePix S8000fd digital
> > camera
> >
> > (
> >
> http://www.fujifilmusa.com/support/ServiceSupportSoftwareContent.do?dbid=874716&prodcat=871639&sscucatid=664260
> ).
> > It produces 8 megapixel images, and it doesn't have raw/tiff support. It
> > seems that it cannot produce archival quality images. Before we persuade
> the
> > Biology department to switch their camera, I want to make sure it is
> > absolutely necessary. The pictures they took look fine with human eyes,
> see
> > an example at:
> >
> http://library.wichita.edu/techserv/test/herbarium/Astenophylla1-02710.jpg
> >
> > In order to make master images from a camera, it should be capable of
> > producing raw or tiff images with 12 or above megapixels?
> >
> >
> >
> > A related archiving question, the biology field standard is DarwinCore,
> > however, DSpace doesn't support it. The Biology Dept. already has some
> data
> > in spreadsheet. In this case, when it is impossible to map all the
> elements
> > to Dublin Core, is it a good practice for us to set up several local
> > elements mapped from DarwinCore?
> >
> > Thanks a million,
> >
> > Sai
> >
> >
> > Sai Deng
> > Metadata Catalog Librarian
> > Wichita State University Libraries
> > 1845 Fairmount
> > Wichita, KS 67260-0068
> > Phone: (316) 978-5138
> > Fax:   (316) 978-3048
> > Email: sai.d...@wichita.edu
> > said...@gmail.com
> >
>


Re: [CODE4LIB] Digital imaging questions

2009-06-18 Thread Andrew Hankinson
Hi Sai,
"Archival Quality Images" has some meaning, but it might be helpful to look
up a standard and start your investigation for a new camera based on the
recommendations of that standard. You might find this page from the Library
of Congress helpful:

http://www.digitalpreservation.gov/formats/content/still.shtml

I think your hunch that RAW/TIFF is needed is a pretty safe bet, but being able
to point to an actual standard might make your case for a new camera a bit
more convincing.  Other factors to take into account (other than
megapixels and format) are color reproduction, image 'noise' specifications,
DPI, lighting, (and probably many other things).

For DSpace you don't even need to map the elements of Dublin Core to
DarwinCore. DSpace has the ability to load different schemas into its metadata
registry. You can then modify the "input-forms.xml" file in the DSpace config
directory to add the appropriate fields for the additional metadata fields.

Hope this helps!
-Andrew

On Thu, Jun 18, 2009 at 10:33 AM, Deng, Sai  wrote:

> Hi, list,
>
>
>
> A while ago, I read some interesting discussion on how to use camera to
> produce archival-quality images from this list. Now, I have some imaging
> questions and I think this might be a good list to turn to. Thank you in
> advance! We are trying to add some herbarium images to our DSpace. The
> specimen pictures will be taken at the Biology department and the library is
> responsible for depositing the images and transferring/mapping/adding
> metadata. On the testing stage, they use Fujifilm FinePix S8000fd digital
> camera
>
> (
> http://www.fujifilmusa.com/support/ServiceSupportSoftwareContent.do?dbid=874716&prodcat=871639&sscucatid=664260).
> It produces 8 megapixel images, and it doesn't have raw/tiff support. It
> seems that it cannot produce archival quality images. Before we persuade the
> Biology department to switch their camera, I want to make sure it is
> absolutely necessary. The pictures they took look fine with human eyes, see
> an example at:
> http://library.wichita.edu/techserv/test/herbarium/Astenophylla1-02710.jpg
>
> In order to make master images from a camera, it should be capable of
> producing raw or tiff images with 12 or above megapixels?
>
>
>
> A related archiving question, the biology field standard is DarwinCore,
> however, DSpace doesn't support it. The Biology Dept. already has some data
> in spreadsheet. In this case, when it is impossible to map all the elements
> to Dublin Core, is it a good practice for us to set up several local
> elements mapped from DarwinCore?
>
> Thanks a million,
>
> Sai
>
>
> Sai Deng
> Metadata Catalog Librarian
> Wichita State University Libraries
> 1845 Fairmount
> Wichita, KS 67260-0068
> Phone: (316) 978-5138
> Fax:   (316) 978-3048
> Email: sai.d...@wichita.edu
> said...@gmail.com
>


Re: [CODE4LIB] Code4lib mugs?

2008-11-06 Thread Andrew Hankinson
Long-time lurker, but thought I'd chime in and say I would be  
interested in such a scholarship, if it were available. I have a bit  
of video editing experience, and am interested in coming to the  
conference in RI.



On 3 Nov 2008, at 19:57, K.G. Schneider wrote:


+1 for the idea of funding the audio/video (and I always need more
travel mugs, but I'd rather have the a/v :> )

Karen

On Mon, 3 Nov 2008 16:24:10 -0500, "Jonathan Rochkind"
<[EMAIL PROTECTED]> said:
Aha, funding the audio and video is a great idea. Meets Code4Lib needs, and also meets sponsor advertising needs, because all the videos and audio could go up with a "capture of this content was sponsored by Insert Vendor Here" link. I think Bill's idea is great. Someone would still need to be found to volunteer to recruit and supervise this hypothetical student.

Jonathan

William Denton wrote:

On 3 November 2008, Jonathan Rochkind wrote:


Yeah, I'd rather the money were spent for a scholarship than for a travel mug. I don't need any more travel mugs. Thanks for making this point, Erik.

It'd be nice if there was a box of them for people that need one, but I already have all the travel mugs I want.

Funding someone's attendance--or paying a student to get the audio and video online quickly--would be great.

Bill


--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu


__
Andrew Hankinson, BMus, MLIS
PhD Student
Distributed Digital Music Archives and Libraries Lab
Schulich School of Music
McGill University, Montreal, QC

[EMAIL PROTECTED]
(H) 514.692.6726
(W) 514.398.4535 x0300
(F) 514.398.8061


Re: [CODE4LIB] what's friendlier & less powerful than phpMyAdmin?

2008-08-10 Thread Andrew Hankinson
The Django framework's Administration interface is pretty good for doing
quick database work, and it's highly customizable.  It also does very basic
database introspection on existing databases to help get you set up.
-Andrew

On Wed, Jul 30, 2008 at 11:28 AM, Ken Irwin <[EMAIL PROTECTED]> wrote:

> Shawn Boyette ☠ wrote:
>
>> I don't think he was asking about *programmers* creating or modifying
>> *schema*.
>>
>
> It's true -- I just want a simple little data entry tool (which I've got
> now! That was easy.)
>
> I've been doing all of my development by hand, without the luxury of
> frameworks, not out of any programmerly virtue, but just out of simplicity
> -- ie, I've not taken the time to learn about frameworks. It sure would be
> nice to take the time at some point, and I'll keep Tim's injunctions about
> abstraction in mind when I do.
>
> *thanks and joy*
> Ken
>
>  On Wed, Jul 30, 2008 at 11:07 AM, Tim Spalding <[EMAIL PROTECTED]>
>> wrote:
>>
>>
>>> This gets religious quickly, but, in my experience, programmers who
>>> learn on a framework miss out on their understanding of database
>>> necessities. They may not matter much when you have a low-traffic,
>>> low-content situation, but as your traffic and data grow you're going
>>> to want an understanding of how MySQL optimizes queries, what's
>>> expensive and what's not, and so forth. Although anyone can learn
>>> anything, experience is the best teacher, and, in my experience,
>>> frameworks encourage you to avoid that experience.
>>>
>>> For example, the Ruby programmers I've worked with have been unaware
>>> that MySQL only uses one index per table per select, causing them to
>>> index far more than they need, how joins work across different MySQL
>>> data types, the advantages of ganging your inserts together, etc. This
>>> stuff adds up fast.
>>>
>>> Of course, the same arguments could be leveled against PHP in favor of
>>> C, against C in favor of assembly, etc.. Abstraction always has merits
>>> and demerits.
>>>
>>> Tim
>>>
>>>
>>
>>
> --
> Ken Irwin
> Reference Librarian
> Thomas Library, Wittenberg University
>


Re: [CODE4LIB] marcdb help

2008-02-25 Thread Andrew Hankinson

Hey Ed,

Sure thing!  Although I suspect it's a bit late tonight, and you're
all out partying in Portland!  ;)

I'll pop in tomorrow.

Thanks again,
Andrew

On 25-Feb-08, at 10:33 AM, Ed Summers wrote:


Heya Andrew -- this is pretty odd. Do you use irc much? Could you pop
into irc://chat.freenode.net/code4lib and maybe we can try to trouble
shoot it on there?

//Ed

On Sun, Feb 24, 2008 at 2:17 PM, Andrew Hankinson
<[EMAIL PROTECTED]> wrote:

Hi folks,

I'm wondering if someone can give me a hand with getting MarcDB
running.

I'm trying this with Python 2.5, SQLAlchemy 0.3-svn, and the latest
versions of pymarc & marcdb.  I've tried it on both OS X (Leopard &
Tiger) and FreeBSD 7 RC1, with the same results.  I'm also using
Postgres 8.2.6 as the database backend, but I've also tried this with
MySQL & Sqlite3.

When I run 'marcdb create postgres://username:[EMAIL PROTECTED]/
database', the command does not return an error, but does nothing to
the database. (it exists, with no tables)

Afterwards, I run 'marcdb load-xml marc-xml-file.xml postgres://
username:[EMAIL PROTECTED]/database' and it gives this error:

1
Traceback (most recent call last):
  File "/usr/local/bin/marcdb", line 5, in <module>
pkg_resources.run_script('marcdb==0.6', 'marcdb')
  File "build/bdist.linux-i686/egg/pkg_resources.py", line 448, in
run_script
  File "build/bdist.linux-i686/egg/pkg_resources.py", line 1173, in
run_script
  File "/usr/local/bin/marcdb", line 32, in <module>
  File "build/bdist.freebsd-7.0-RC1-i386/egg/marcdb/loader.py", line
13, in load_xml
  File "build/bdist.freebsd-7.0-RC1-i386/egg/pymarc/marcxml.py", line
75, in parse_xml
  File "/usr/local/lib/python2.5/xml/sax/expatreader.py", line 107,
in parse
xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.5/xml/sax/xmlreader.py", line 123, in
parse
self.feed(buffer)
  File "/usr/local/lib/python2.5/xml/sax/expatreader.py", line 207,
in feed
self._parser.Parse(data, isFinal)
  File "/usr/local/lib/python2.5/xml/sax/expatreader.py", line 349,
in end_element_ns
self._cont_handler.endElementNS(pair, None)
  File "build/bdist.freebsd-7.0-RC1-i386/egg/pymarc/marcxml.py", line
44, in endElementNS
  File "build/bdist.freebsd-7.0-RC1-i386/egg/marcdb/loader.py", line
47, in process_record
AttributeError: 'Record' object has no attribute 'control_fields'

Any thoughts on what might be causing this error?  If it's worth
mentioning, I'm trying to load Simon Spero's name authority records
into a database.

Thanks in advance,
Andrew



[CODE4LIB] marcdb help

2008-02-24 Thread Andrew Hankinson

Hi folks,

I'm wondering if someone can give me a hand with getting MarcDB running.

I'm trying this with Python 2.5, SQLAlchemy 0.3-svn, and the latest
versions of pymarc & marcdb.  I've tried it on both OS X (Leopard &
Tiger) and FreeBSD 7 RC1, with the same results.  I'm also using
Postgres 8.2.6 as the database backend, but I've also tried this with
MySQL & Sqlite3.

When I run 'marcdb create postgres://username:[EMAIL PROTECTED]/
database', the command does not return an error, but does nothing to
the database. (it exists, with no tables)

Afterwards, I run 'marcdb load-xml marc-xml-file.xml postgres://
username:[EMAIL PROTECTED]/database' and it gives this error:

1
Traceback (most recent call last):
  File "/usr/local/bin/marcdb", line 5, in <module>
pkg_resources.run_script('marcdb==0.6', 'marcdb')
  File "build/bdist.linux-i686/egg/pkg_resources.py", line 448, in
run_script
  File "build/bdist.linux-i686/egg/pkg_resources.py", line 1173, in
run_script
  File "/usr/local/bin/marcdb", line 32, in <module>
  File "build/bdist.freebsd-7.0-RC1-i386/egg/marcdb/loader.py", line
13, in load_xml
  File "build/bdist.freebsd-7.0-RC1-i386/egg/pymarc/marcxml.py", line
75, in parse_xml
  File "/usr/local/lib/python2.5/xml/sax/expatreader.py", line 107,
in parse
xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.5/xml/sax/xmlreader.py", line 123, in
parse
self.feed(buffer)
  File "/usr/local/lib/python2.5/xml/sax/expatreader.py", line 207,
in feed
self._parser.Parse(data, isFinal)
  File "/usr/local/lib/python2.5/xml/sax/expatreader.py", line 349,
in end_element_ns
self._cont_handler.endElementNS(pair, None)
  File "build/bdist.freebsd-7.0-RC1-i386/egg/pymarc/marcxml.py", line
44, in endElementNS
  File "build/bdist.freebsd-7.0-RC1-i386/egg/marcdb/loader.py", line
47, in process_record
AttributeError: 'Record' object has no attribute 'control_fields'

Any thoughts on what might be causing this error?  If it's worth
mentioning, I'm trying to load Simon Spero's name authority records 
into a database.

Thanks in advance,
Andrew


Re: [CODE4LIB] LCC classifications in XML

2007-08-28 Thread Andrew Hankinson
I'd certainly be interested in that too.

On 8/28/07, Jonathan Brinley <[EMAIL PROTECTED]> wrote:
>
> Not long ago, I recall Ed Summers sharing the classification outline
> in RDF. I may still have a copy of that around if you're interested.
>
> Have a nice day,
> Jonathan
>
>
> > On 8/28/07 12:16 PM, "Andrew Nagy" <[EMAIL PROTECTED]> wrote:
> >
> > > Does anyone know of a place where the LCC Callnumber classifications
> can be
> > > found in a "parseable" format such as XML?
> > >
>
>
> --
> Jonathan M. Brinley
>
> [EMAIL PROTECTED]
> http://xplus3.net/
>


Re: [CODE4LIB] parse an OAI-PMH response

2007-07-30 Thread Andrew Hankinson
I haven't heard the words "Gopher" and "Killer App" used in the same
sentence for a looong time.
Thanks!

On 7/30/07, Tim Shearer <[EMAIL PROTECTED]> wrote:
>
> Depending on how locked down the php.ini file is (lots of good reasons to
> do this) you might look into curl.
>
> http://curl.haxx.se/
>
> Curl can work in php.
>
> http://us.php.net/curl
>
> It talks lots of protocols (including https, which is how I got on board),
> including gopher for any killer apps you have planned.
>
> -t
>
> +++
> Tim Shearer
>
> Web Development Coordinator
> The University Library
> University of North Carolina at Chapel Hill
> [EMAIL PROTECTED]
> 919-962-1288
> +++
>
>
>
> On Fri, 27 Jul 2007, John McGrath wrote:
>
> > You could use either the PEAR HTTP_Request package, or the built-in
> > fopen/fread commands, which can make http calls in addition to
> > opening local files. The easiest way, though, in my opinion, is
> > file_get_contents, which automatically dumps the response into a
> > String object. And it's fast, apparently:
> >
> > http://us.php.net/manual/en/function.file-get-contents.php
> >
> > http://pear.php.net/package/HTTP_Request
> > http://us.php.net/fopen
> > http://us.php.net/manual/en/function.fread.php
> >
> > Best,
> > John
> >
> > On Jul 27, 2007, at 9:31 PM, Andrew Hankinson wrote:
> >
> >> Hi folks,
> >> I'm wanting to implement a PHP parser for an OAI-PMH response from our
> >> Dspace installation.  I'm a bit stuck on one point: how do I get
> >> the PHP
> >> script to send a request to the OAI-PMH server, and get the XML
> >> response in
> >> return so I can then parse it?
> >>
> >> Any thoughts or pointers would be appreciated!
> >>
> >> Andrew
>
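
For completeness, the fetch-then-parse pattern suggested above, sketched
in Python rather than PHP (file_get_contents and SimpleXML are the PHP
counterparts of the two steps; the endpoint and sample response below
are illustrative, not a real repository):

```python
# Fetch an OAI-PMH response over HTTP, then parse the XML.
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def fetch(base_url, verb="ListIdentifiers", metadata_prefix="oai_dc"):
    """Fetch one OAI-PMH response; file_get_contents() is the PHP analogue."""
    url = "%s?verb=%s&metadataPrefix=%s" % (base_url, verb, metadata_prefix)
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def identifiers(xml_bytes):
    """Parse a ListIdentifiers response; SimpleXML plays this role in PHP."""
    root = ET.fromstring(xml_bytes)
    return [h.findtext(OAI + "identifier")
            for h in root.iter(OAI + "header")]
```

Something like `identifiers(fetch("http://example.org/oai"))` would then
return the identifiers from the first page of results; resumptionToken
handling for paged responses is left out of the sketch.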


[CODE4LIB] parse an OAI-PMH response

2007-07-27 Thread Andrew Hankinson
Hi folks,
I want to implement a PHP parser for an OAI-PMH response from our DSpace
installation.  I'm a bit stuck on one point: how do I get the PHP script
to send a request to the OAI-PMH server and get the XML response in
return so I can then parse it?

Any thoughts or pointers would be appreciated!

Andrew