Re: [CODE4LIB] Assigning DOI for local content

2009-11-23 Thread Ben O'Steen
What happens if the main DOI resolver goes down? I'd be interested to see
how well a local resolver works when blocked from this upstream server. Are
there any other upstream servers?

Ben

On Nov 23, 2009 10:10 PM, "Tom Keays"  wrote:

Interesting stuff. I had never really thought about the fact that DOIs
can be served up by the Handle server. E.g.,

http://dx.doi.org/10.1074/jbc.M004545200 <=> http://hdl.handle.net/10.1074/jbc.M004545200

But even more surprising to me was realizing that Handles can be
resolved by the DOI server, or presumably by any DOI server.

http://hdl.handle.net/2027.42/46087 <=> http://dx.doi.org/2027.42/46087

I suppose I should have understood this point since the Handle service
does sort of obliquely say this.

http://www.handle.net/factsheet.html

Anyway, good to have it made explicit.
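A minimal Python sketch of the equivalence described above: because a DOI is itself a Handle of the form prefix/suffix, the same identifier string can be appended to either proxy. The two base URLs are the ones used in the examples in this message; the function name `resolver_urls` is hypothetical, and nothing here checks that either resolver actually knows the identifier.

```python
from urllib.parse import quote

# Base URLs taken from the examples above; assumed, not an exhaustive list.
RESOLVERS = {
    "doi": "http://dx.doi.org/",
    "handle": "http://hdl.handle.net/",
}


def resolver_urls(identifier: str) -> dict[str, str]:
    """Build equivalent resolver URLs for a DOI or Handle ("prefix/suffix")."""
    prefix, sep, suffix = identifier.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {identifier!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    path = quote(identifier, safe="/")
    return {name: base + path for name, base in RESOLVERS.items()}


if __name__ == "__main__":
    for name, url in resolver_urls("2027.42/46087").items():
        print(name, url)
```

Keeping the base URLs in one table would also make it easy to try a different resolver if one is unreachable, which bears on the question of what happens when the main one goes down.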

Tom

On Mon, Nov 23, 2009 at 4:03 PM, Jonathan Rochkind  wrote:
> The actual "handle" ...


Re: [CODE4LIB] Durability of PDFs

2009-06-15 Thread Ben O'Steen
Jonathan,

Likewise, that paragraph reads with the same accuracy with the
following alterations:

s/UTF-8|Unicode/PDF/
s/encoding/version/

I think the key thing is that garbage in == garbage out, but I feel
happier with garbage that was meant to have been Unicode at some
point than with a PDF made by a Word->PDF printer driver
that craps out on large files, and does so silently.

My experiences with PDF versions:

PDF 1.3 and earlier is evil. 1.4 is not too bad, aside from colour issues
and its ham-fisted way of attempting to shove CMYK info into itself. 1.6
has issues for some people that I am still trying to isolate. 1.7 is as
rare as hen's teeth. PDF/A as a spec seems to be okay, but I've only seen
a few of those in the wild, and only from OpenOffice at that. It would be
interesting to see how OOo's idea of PDF/A stacks up against Adobe's.

And there is PDF/X(-3?) or similar, which I've only ever seen on an
options panel before swiftly ignoring it.
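Since the version matters this much, it helps to know what a given file claims to be. A rough Python sketch, assuming the usual `%PDF-M.m` header appears within the file's first 1024 bytes (an assumed tolerance, since some files carry leading junk). The header is only a claim: from PDF 1.4 on, the document catalog's /Version entry can override it with a later version, and PDF/A conformance is declared in the XMP metadata rather than the header, so neither is checked here. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Union

# Assumption: the header sits somewhere in the first 1024 bytes.
HEADER_WINDOW = 1024
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version claimed by the %PDF-M.m header, or None if not found."""
    match = HEADER_RE.search(data[:HEADER_WINDOW])
    if match is None:
        return None
    major, minor = (int(group) for group in match.groups())
    return f"{major}.{minor}"


def pdf_file_header_version(path: Union[str, Path]) -> Optional[str]:
    """Read only the start of a file and report its header version."""
    with open(path, "rb") as fh:
        return pdf_header_version(fh.read(HEADER_WINDOW))
```

Run over a repository's holdings, something like this would give a quick census of how many 1.3-and-earlier files might need attention, though the catalog /Version caveat above still applies.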

And on a final note, I have come across PDF files that are only 10 years
old and yet useless to me; I can't wheedle anything out of them. However,
I have resurrected a TeX-based thesis from an earlier period without
difficulty, and created a PDF/A from the source.

Bottom line: it's best to preserve the source materials as well
as the final disseminations, since you can't always guarantee a viewer will
work as expected. The trend is that newer PDF versions are better, but
be very, very wary of hidden DRM. If memory serves, an eBook publisher
lost 1/4(?) of their stock due to losing the mechanism to unlock it.
Let's not have that happen to repositories...

Ben

2009/6/15 Jonathan Rochkind :
> Fair enough.  Asking someone to give you a UTF-8 (or other Unicode encoding)
> plain text file, though: you'd better try to heuristically check the encoding
> before ingesting it, and plan on a lot of failures. Typical users using
> typical consumer software (which tends to be somewhat unpredictable with
> character encodings) can't be trusted to give you a UTF-8 encoding just
> because you specify it, or to have any idea what this means or how to do
> it.
> And checking to see if the 'true' encoding of a plain text file is what
> it's advertised as in an automated fashion is heuristic at best, and not
> going to be perfect.
> And you're still going to have trouble with complicated mathematical
> formulas, molecular diagrams, other diagrams, etc.
>
> Jonathan
>
> Doran, Michael D wrote:
>>>
>>> As far as electronic formats go, I think PDF is as good as anything --
>>> except maybe plain ASCII text, which is not
>>> nearly as useable (and doesn't allow diagrams,
>>> mathematical equations, non-English letters, etc).
>>>
>>
>> There is no requirement that plain text be limited to the ASCII character
>> set repertoire.  Although once they were almost synonymous, that is no
>> longer the case [1].  Plain text can encompass anything and everything in
>> the Unicode character set.  That includes non-Roman scripts, mathematical
>> symbols, yada, yada, yada.
>>
>> -- Michael
>>
>> [1] http://en.wikipedia.org/wiki/Plain_text
>>
>> # Michael Doran, Systems Librarian
>> # University of Texas at Arlington
>> # 817-272-5326 office
>> # 817-688-1926 mobile
>> # do...@uta.edu
>> # http://rocky.uta.edu/doran/
>>
>>
>>>
>>> -Original Message-
>>> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
>>> Jonathan Rochkind
>>> Sent: Monday, June 15, 2009 9:13 AM
>>> To: CODE4LIB@LISTSERV.ND.EDU
>>> Subject: Re: [CODE4LIB] Durability of PDFs
>>>
>>> The bet is that PDFs are so popular that _someone_ (the archival
>>> community if no-one else, but probably someone else) will ensure that they
>>> continue to be readable somehow.
>>>
>>> These are real non-trivial issues in electronic archiving though, issues
>>> that the archival community.  It is generally a safe assumption that good
>>> electronic archiving over the decades-and-more term requires some regular
>>> attention by an electronic archivist to make sure that files remain
>>> readable, and are converted to new formats when necessary. As well as
>>> attention to avoiding actual bit-level corruption of files. You can't
>>> necessarily just dump files on a HD and ignore them and expect they'll be
>>> readable in 100 years; that much is true, and true pretty much regardless
>>> of the particular electronic format you choose.
>>>
>>> As far as electronic formats go, I think PDF is as good as anything --
>>> except maybe plain ASCII text, which is not nearly as useable (and doesn't
>>> allow diagrams, mathematical equations, non-English letters, etc). I don't
>>> know why your colleague has decided that "30-40 years" is the horizon
>>> after which PDF specifically will become "unreadable", this seems like just
>>> a wild guess to me, but it would be interesting to see if he has any
>>> particular evidence to back up this claim.
>>> So there are real issues with electronic archiving, but unless they lead
>>>
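The heuristic check Jonathan describes could start with something like the following Python sketch; the function name `check_utf8` and the particular checks are illustrative assumptions. It rejects bytes that are not valid UTF-8 and flags two common mislabelings, a UTF-16 byte order mark and embedded NULs. As he says, it is not perfect: a pass only means the bytes decode as UTF-8, not that the producer meant UTF-8 (plain ASCII, for instance, always passes).

```python
import codecs


def check_utf8(data: bytes) -> tuple[bool, str]:
    """Heuristically check whether bytes advertised as UTF-8 really are UTF-8.

    Returns (ok, reason). A pass means the bytes decode cleanly, not that
    UTF-8 was the producer's intent.
    """
    # A UTF-16 BOM (whose bytes also begin a UTF-32 LE BOM) rules out UTF-8.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return False, "UTF-16 (or UTF-32 LE) byte order mark found"

    # Skip an optional UTF-8 BOM, remembering its length for error offsets.
    offset = len(codecs.BOM_UTF8) if data.startswith(codecs.BOM_UTF8) else 0
    try:
        text = data[offset:].decode("utf-8")  # strict decoding by default
    except UnicodeDecodeError as err:
        return False, f"invalid UTF-8 at byte offset {err.start + offset}"

    # NULs rarely belong in plain text; they often indicate BOM-less UTF-16.
    if "\x00" in text:
        return False, "NUL characters present (possibly UTF-16 without a BOM)"
    return True, "decodes cleanly as UTF-8"
```

In an ingest workflow, a failure would be a cue to go back to the depositor or try detecting another encoding, rather than silently accepting the file.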

Re: [CODE4LIB] Open Source Institutional Repository Software?

2008-08-22 Thread Ben O'Steen
If you are after a quick and easy proof of concept that requires no
programming skill, then I would plump for EPrints.org; there's even a
Debian package for it, which makes it really easy to install.

2008/8/22 Edward M. Corrado <[EMAIL PROTECTED]>:
> Ben O'Steen wrote:
>>
>> 2008/8/22 David Kane <[EMAIL PROTECTED]>:
>>
>>>
>>> I use EPrints, which is great.
>>>
>>> Do look out for Microsoft's offering though, which is in the pipeline. It
>>> will be free. Of course it will need to run on a Windows server and will
>>> be optimised for SQL Server.
>>>
>>
>> Er.. it will *only* run on the most recent SQL Server: their repo is a
>> shim on top of SQL Server's native XPath handling system, as far as I
>> could make out. From talking to them, they seem to have convinced
>> SQL Server that it is really an RDF triplestore, and have added a few
>> GUI things on top.
>>
>> My actual advice to the first poster, however, is that if you have no
>> resources (money and/or time), then don't bother with an IR: it will
>> require time, effort and probably money. If you feel up to putting in
>> that time and effort, then fantastic, but don't be misled into thinking
>> that it won't suck up your time.
>>
>>
>
> I should have clarified... we have time to support and grow it, but not time
> to program it (since we don't have any programmers). Also, we plan on having
> money in the future, but we need to get something up and running now in
> order to convince the people with money to provide it to us for this
> purpose. Kind of a chicken-and-egg thing.
>
> Edward
>
>
>
>> Ben O'Steen
>> Software Engineer,
>> Oxford University
>>
>>
>>>
>>> David.
>>>
>>>
>>> --
>>> David Kane
>>> Systems Librarian
>>> Waterford Institute of Technology
>>> http://library.wit.ie/
>>> T: ++353.51302838
>>> M: ++353.876693212
>>>
>>>
>


Re: [CODE4LIB] Open Source Institutional Repository Software?

2008-08-22 Thread Ben O'Steen
2008/8/22 David Kane <[EMAIL PROTECTED]>:
> I use EPrints, which is great.
>
> Do look out for Microsoft's offering though, which is in the pipeline. It
> will be free. Of course it will need to run on a Windows server and will be
> optimised for SQL Server.

Er.. it will *only* run on the most recent SQL Server: their repo is a
shim on top of SQL Server's native XPath handling system, as far as I
could make out. From talking to them, they seem to have convinced
SQL Server that it is really an RDF triplestore, and have added a few
GUI things on top.

My actual advice to the first poster, however, is that if you have no
resources (money and/or time), then don't bother with an IR: it will
require time, effort and probably money. If you feel up to putting in
that time and effort, then fantastic, but don't be misled into thinking
that it won't suck up your time.

Ben O'Steen
Software Engineer,
Oxford University

>
> David.
>
>
> --
> David Kane
> Systems Librarian
> Waterford Institute of Technology
> http://library.wit.ie/
> T: ++353.51302838
> M: ++353.876693212
>