Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-24 Thread pkeane


This conversation about Atom is, I think, a really important one to have.
As well designed and thought out as protocols and standards such as OAI-PMH,
METS (and the budding OAI-ORE spec) are, they don't have that viral
technology attribute of utter simplicity.  Sure there are trade-offs, but
the tool support and interoperability on a much larger scale that Atom
could provide cannot be denied.  I, too, have pondered the possibility of
Atom (and AtomPub for writing back) as a simpler replacement for all sorts
of similar technologies (METS, OAI-PMH, WebDAV, etc.) --
http://efoundations.typepad.com/efoundations/2007/07/app-moves-to-pr.html.
The simple fact that Google has standardized all of its web services on
GData (a flavor of Atom) cannot be ignored.

I have had some very interesting discussions over on atom-syntax about
thoroughly integrating Atom as a standard piece of infrastructure in a
large digital library project here at UT Austin (daseproject.org), and
while I don't necessarily think it would provide a whole lot of benefit as an
internal data transfer mechanism, I see numerous advantages to
standardizing on Atom for any number of outward-facing
services/end-points. I think it would be sad if Atom and AtomPub were seen
only as technologies used by and for blogs/blogging.

Also, re: blog mirroring, I highly recommend the current discussions
floating around the blogosphere regarding distributed source control (Git,
Mercurial, etc.).  It's a fundamental paradigm shift from centralized
control to distributed control that points the way toward the future of
libraries as they (we) become less and less the gatekeepers for the
"stuff", be it digital or physical, and more and more the facilitators of
the bidirectional replication that assures ubiquitous access and
long-term preservation.  The library becomes (actually, it has already
happened) simply a node on a network of trust and should act accordingly.

See the thoroughly entertaining/thought-provoking Google tech talk by
Linus Torvalds on Git:  http://www.youtube.com/watch?v=4XpnKHJAok8

-peter keane
daseproject.org

On Tue, 23 Oct 2007, Jakob Voss wrote:


Hi Ed,

You wrote:


I completely agree.  When developing software it's really important to
focus on the cleanest/clearest solution, rather than getting bogged
down in edge cases and the comments from naysayers. I hope that my
response didn't come across that way.


:-)


A couple of follow-on questions for you:

In your vision for this software are you expecting that content
providers would have to implement RFC 5005 for your archiving system
to work?


Probably yes - at least for older entries. New posts can also be
collected from the default feeds. Instead of working out exceptions and
special solutions for getting blog archives by other methods, you should
provide RFC 5005 plugins for common blog software like WordPress and
advertise their use ("We are sorry - the blog that you asked to archive
does not support RFC 5005, so we can only archive new postings. Please
ask its provider to implement archived feeds so we can archive the
postings before {TIMESTAMP}. More information and plugins for RFC 5005
can be found {HERE}. Thank you!").
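
As a rough illustration of what the archiver would do with archived feeds, here is a minimal Python sketch that starts at a blog's current feed and follows the rel="prev-archive" links defined by RFC 5005 back through the older archive documents. The fetch callable and any URLs passed in are assumptions made for illustration; this is not taken from an existing plugin or archiving system.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"


def walk_archive(start_url, fetch, max_docs=1000):
    """Yield (url, entry_ids) for the feed at start_url and each older archive.

    fetch is a hypothetical callable that takes a URL and returns the
    feed document as an XML string (for example a thin wrapper around urlopen).
    """
    seen = set()
    url = start_url
    while url and url not in seen and len(seen) < max_docs:
        seen.add(url)
        base = url
        feed = ET.fromstring(fetch(base))
        ids = [entry.findtext(ATOM + "id") for entry in feed.findall(ATOM + "entry")]
        yield base, ids
        # RFC 5005: a "prev-archive" link points at the immediately older archive.
        url = None
        for link in feed.findall(ATOM + "link"):
            if link.get("rel") == "prev-archive" and link.get("href"):
                url = urljoin(base, link.get("href"))
                break
```

A feed without a prev-archive link simply ends the walk, which is also what happens for a blog that does not implement RFC 5005: only the entries in the current feed are seen.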


Are you considering archiving media files associated with a blog entry
(images, sound, video, etc?).


Well, it depends. There are hundreds of ways to associate media files
- I doubt that you can easily archive YouTube and SlideShare widgets
etc., but images included with <img src="..."/> should be doable. However,
I prefer iterative development - if basic archiving works, you can
start to think about media files. By the way, I would value the
comments more - they are also additional content and non-trivial to archive.
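
To make the "images should be doable" case concrete, here is a small sketch that collects the image URLs referenced by img tags in an entry's HTML content, using only the Python standard library. The function and class names are made up for this example, and it deliberately ignores widgets, embeds and CSS backgrounds.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collects absolute URLs of all <img src=...> tags fed to it."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(urljoin(self.base_url, src))


def image_urls(entry_html, base_url):
    """Return image URLs in document order, resolved against base_url."""
    collector = ImgCollector(base_url)
    collector.feed(entry_html)
    collector.close()
    return collector.sources
```

The returned list is what an archiver would then download and store next to the entry; relative paths are resolved against the entry's own URL.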

To begin with, a WordPress plugin is surely the right step. Up to now
RFC 5005 is so new that no one has implemented it yet, although it's not
complicated.

Greetings,
Jakob

--
Jakob Voß [EMAIL PROTECTED], skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-24 Thread Clay Redding

Hi Peter,

I completely agree with everything you just wrote, especially about
Atom + APP being more than just a technology for blogs.  APP is a
great lightweight alternative to WebDAV, and promising for all sorts
of data transfer.  The fact that it has developer groundswell is a
huge plus.  During my Princeton days Kevin Clarke and I briefly
talked about what a METS + APP metadata editing application could
do.  (I can't remember the answer, but I bet it would be snazzy.)

To stay on the OAI theme, I sometimes wish the activity of sharing
metadata used a push technology like APP instead of the OAI pull/
harvest approach that we use today.   One of the reasons is that I
feel it would be easier for the content providers to handle deleted
records via HTTP DELETE, simply because the content providers would
know to whom they had PUT or POSTed their metadata.  Service providers
wouldn't have to support deleted records, they'd just have to reindex.
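
A minimal sketch of that push model, assuming each service provider exposes an AtomPub collection and that the provider remembers the member URL it got back (the Location header) after each POST. The class and method names are hypothetical, and the requests are only built here, not sent.

```python
import urllib.request


class PushProvider:
    """Remembers where each record was pushed so deletes can follow it."""

    def __init__(self):
        # record id -> list of member URLs returned by the service providers
        self.locations = {}

    def record_push(self, record_id, member_url):
        """Note that record_id now lives at member_url on some service."""
        self.locations.setdefault(record_id, []).append(member_url)

    def delete_requests(self, record_id):
        """Build one HTTP DELETE per place the record was pushed to."""
        urls = self.locations.pop(record_id, [])
        return [urllib.request.Request(url, method="DELETE") for url in urls]
```

Each returned request could be handed to urllib.request.urlopen (with whatever authentication the service requires); the point is only that the provider already knows every place a delete has to go.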

I came to this realization out of frustration that most OAI toolkits
(at the time, ca. 2005) didn't support that functionality well -- or
at all.  I don't know if that's still the case.  However, the need to
delete records is a reality for most projects, and OAI has somewhat
awkwardly made us rethink how to delete a record in repositories
and the like, both on the service and data provider end.   You almost
have to build your entire system around handling deleted records
just for OAI exposure.   In reality it seems like you just end up
masking or re-representing a record's outward visibility on your local
systems, which gets onerous.
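
For comparison, this is roughly what the pull side has to do when a data provider does support deletions: in OAI-PMH 2.0 a deleted record is signalled by status="deleted" on the record header. The sketch below splits one ListRecords (or ListIdentifiers) response into live and deleted identifiers; the function name is an assumption and resumption tokens are left out.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def split_records(response_xml):
    """Return (live_ids, deleted_ids) from one OAI-PMH response document."""
    root = ET.fromstring(response_xml)
    live, deleted = [], []
    for header in root.iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    return live, deleted
```

A harvester would reindex the live identifiers and drop the deleted ones, which only works if the repository keeps and exposes those deleted headers in the first place.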

I guess the difference is that the growing number of Atom developers
are heeding the requirement for deletions, whereas the few existing
OAI toolkit developers have deemed that functionality optional.

Long-winded as usual,
Clay

On Oct 24, 2007, at 12:51 AM, pkeane wrote:



This conversation about Atom is, I think, really an important one
to have.
As well designed and thought out as protocols and standards such as
OAI-PMH,
METS (and the budding OAI-ORE spec) are, they don't have that viral
technology attribute of utter simplicity.  [snipped]



I see numerous advantages to
standardizing on Atom for any number of outward-facing
services/end-points. I think it would be sad if Atom and AtomPub
were seen
only as technologies used by and for blogs/blogging.



[CODE4LIB] OpenContent SRU search of OAISter, weirdness?

2007-10-24 Thread Jonathan Rochkind

I'm messing with SRU search of http://indexdata.dk/opencontent/oaister

I have some behavior I can't explain. There's this article that is in
OAISter, called "Resurrection and Appropriation: Reputational
Trajectories, Memory Work, and the Political Use of Historical Figures"
by Robert S. Jensen.

I do an SRU search with query:
dc.title = "Resurrection and Appropriation: Reputational Trajectories,
Memory Work, and the Political Use of Historical Figures"
And I find the record, one hit. Good. You too could try, and see what
the DC returned looks like. It does have a dc:creator of "Robert S. Jensen".

But then I try a search that includes the author:
dc.title = "Resurrection and Appropriation: Reputational Trajectories,
Memory Work, and the Political Use of Historical Figures" and dc.creator
= "Jensen"

0 hits.
and cql.serverChoice = "Jensen" => 0 hits

Same using the full name "Robert S. Jensen" (just as it appears in the
record), with cql.serverChoice or dc.creator.

Is this just a bad index, or is something else going on, or what?  As I
try sample searches on title and author, I keep running into false
negatives for things that ought to be in the OAISter index. Sometimes I
can figure out why (title not quite right; title has curly quotes, index
does not, etc.), but in this case I have no idea. But the net result is
it's hard to actually find your known item this way, via an automated
search on known item metadata.
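
For anyone who wants to reproduce this, here is a sketch of how such a known-item query could be built programmatically. It quotes each CQL term, normalizes curly quotes to straight ones (one of the mismatches mentioned above), and URL-encodes the result as an SRU searchRetrieve request. The base URL is the one from this message; the version parameter value, the maximumRecords default and the function names are assumptions, not something confirmed against the Index Data server.

```python
from urllib.parse import urlencode

BASE = "http://indexdata.dk/opencontent/oaister"

# Map curly quotes to their plain ASCII equivalents.
STRAIGHTEN = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def cql_term(text):
    """Return text as a double-quoted CQL term with quotes escaped."""
    text = text.translate(STRAIGHTEN)
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def sru_url(clauses, base=BASE, maximum_records=10):
    """Build a searchRetrieve URL from (index, value) pairs joined with 'and'."""
    query = " and ".join(f"{index} = {cql_term(value)}" for index, value in clauses)
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed; use whatever the server's explain record says
        "query": query,
        "maximumRecords": maximum_records,
    }
    return base + "?" + urlencode(params)
```

Calling sru_url([("dc.title", title), ("dc.creator", "Jensen")]) gives the combined title-and-author search; dropping the second pair gives the title-only search that does return a hit.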

Jonathan


--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu