Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-25 Thread Jakob Voss

Hi Clay,


I completely agree with everything you just wrote, especially about
Atom + APP being more than just a technology for blogs.  APP is a
great lightweight alternative to WebDAV, and promising for all sorts
of data transfer.  The fact that it has developer groundswell is a
huge plus.  During my Princeton days Kevin Clarke and I briefly
talked about what a METS + APP metadata editing application could
do.  (I can't remember the answer, but I bet it would be snazzy.)


On the one hand you are right: Atom + APP is becoming popular and the
standards are good, so digital libraries should get into it. On the
other hand I was just reminded of the ECDL 2006 paper "Repository
Replication Using NNTP and SMTP": you can use almost any protocol (HTTP,
OAI, ATOM APP, WebDAV, NNTP...) for most of digital libraries' use cases
- but the best standard without appropriate tools and support is pretty
worthless.


I came to this realization out of frustration that most OAI toolkits
(at the time, ca. 2005) didn't support that functionality well -- or
at all.  I don't know if that's still the case.  However, the need to
delete records is a reality for most projects, and OAI has somewhat
awkwardly made us rethink how to delete a record in repositories
and the like, both on the service and data provider end.   You almost
have to build your entire system around handling deleted records
just for OAI exposure.   In reality it seems like you just end up
masquerading or re-representing a record's outward visibility on your local
systems, which gets onerous.

I guess the difference is that the growing number of Atom developers
are heeding the requirement for deletions, whereas the few existing
OAI toolkit developers have deemed that functionality as optional.


Most repositories do not even track deletions, so they cannot syndicate
them. If OAI deletion support were mandatory, maybe OAI-PMH would not have
been used that much? OAI did a good job in promoting and documenting
OAI-PMH, but deletions were always treated as an orphan - I would not
blame the standard but the lack of implementations.

Also, ATOM and RFC 5005 are not much better than other solutions - but
they are much more likely to get implemented in weblog and other software
than OAI, which is not that well known outside the library world.

Greetings,
Jakob

P.S.: Maybe we would all be happy with Z39.50 if we had had those wonderful
Index Data tools right from the beginning - instead there were only
closed specifications and different closed-source partial
implementations. A standard without easy-to-use open source
implementations is condemned to be violated and die.


--
Jakob Voß [EMAIL PROTECTED], skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-25 Thread Jakob Voss

Peter wrote:


Also, re: blog mirroring, I highly recommend the current discussions
floating around the blogosphere regarding distributed source control (Git,
Mercurial, etc.).  It's a fundamental paradigm shift from centralized
control to distributed control that points the way toward the future of
libraries as they (we) become less and less the gatekeepers for the
"stuff", be it digital or physical, and more and more the facilitators of
the bidirectional replication that assures ubiquitous access and
long-term preservation.  The library becomes (actually it has already
happened) simply a node on a network of trust and should act accordingly.

See the thoroughly entertaining/thought-provoking Google tech talk by
Linus Torvalds on Git:  http://www.youtube.com/watch?v=4XpnKHJAok8


Thanks for pointing to this interesting discussion. This goes even
further than the current paradigm shift from the old model
(author - publisher - distributor - reader) to a world of
user-generated content and collaboration! I would be glad if we finally got
to model and archive Weblogs and Wikis - modelling and archiving the
whole process of content copying, changing, remixing and
republication is far beyond libraries' capabilities!

Greetings,
Jakob

--
Jakob Voß [EMAIL PROTECTED], skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


[CODE4LIB] Distributed Models & the Library (was: Re: [CODE4LIB] RFC 5005 ATOM extension and OAI)

2007-10-25 Thread pkeane

Hi Jakob-

Yes, I think you are correct that a distributed archiving model is a bit
much for libraries to even consider now, but I do think there are useful
insights to be gained here.

As it stands now, linux developers using Git can carry around the entire
change history of the linux kernel (well, I think they just included the
2.6 kernel when they moved to Git) on their laptop, make changes, create
patches, etc., and then make that available to others.  Well, undoubtedly
change history is a bit much for the library to think about, but why
not, for instance, an entire library catalog?  If I could check out the
library catalog onto my computer & use whatever tools I wished to search,
organize, annotate, etc., then perhaps mix in data (say holdings data
from other libraries that are near me) OR even create the sort of
relationships between records that the Open Library folks are talking about
(http://www.hyperorg.com/blogger/mtarchive/berkman_lunch_aaron_swartz_on.html)
and then share that added data, we would have quite a powerful distributed
development model.  It may seem a bit far-fetched, but I think that some
of the pieces (or at least a better understanding of how this might all
work) are beginning to take shape.

-Peter

On Thu, 25 Oct 2007, Jakob Voss wrote:


Peter wrote:


Also, re: blog mirroring, I highly recommend the current discussions
floating around the blogosphere regarding distributed source control (Git,
Mercurial, etc.).  It's a fundamental paradigm shift from centralized
control to distributed control that points the way toward the future of
libraries as they (we) become less and less the gatekeepers for the
"stuff", be it digital or physical, and more and more the facilitators of
the bidirectional replication that assures ubiquitous access and
long-term preservation.  The library becomes (actually it has already
happened) simply a node on a network of trust and should act accordingly.

See the thoroughly entertaining/thought-provoking Google tech talk by
Linus Torvalds on Git:  http://www.youtube.com/watch?v=4XpnKHJAok8


Thanks for pointing to this interesting discussion. This goes even
further than the current paradigm shift from the old model
(author - publisher - distributor - reader) to a world of
user-generated content and collaboration! I would be glad if we finally got
to model and archive Weblogs and Wikis - modelling and archiving the
whole process of content copying, changing, remixing and
republication is far beyond libraries' capabilities!

Greetings,
Jakob

--
Jakob Voß [EMAIL PROTECTED], skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


[CODE4LIB] Distributed Models & the Library (was: Re: [CODE4LIB] RFC 5005 ATOM extension and OAI)

2007-10-25 Thread Jason Stirnaman
not, for instance, an entire library catalog?  If I could check out the
library catalog onto my computer & use whatever tools I wished to search,

Peter,

You might be interested in Art Rhyno's experiment.  Here's Jon Udell's summary:

Art Rhyno’s science project
Art Rhyno’s title is Systems Librarian but he should consider adding Mad 
Scientist to his business card because he is full of wild and crazy and — to
me, at least — brilliant ideas. Last year, when I was a judge for the Talis
“Mashing up the Library” competition, one of my favorite entries was this one
from Art. The project mirrors a library catalog to the desktop and integrates 
it with desktop search. The searcher in this case is Google Desktop, but could 
be another, and the integration is accomplished by exposing the catalog as a 
set of Web Folders, which Art correctly describes as “Microsoft’s in-built and 
oft-overlooked WebDAV option.”

http://blog.jonudell.net/2007/03/16/art-rhynos-science-project/

Jason
--

Jason Stirnaman
OME/Biomedical & Digital Projects Librarian
A.R. Dykes Library
The University of Kansas Medical Center
Kansas City, Kansas
Work: 913-588-7319
Email: [EMAIL PROTECTED]


 On 10/25/2007 at 10:47 AM, in message
[EMAIL PROTECTED], pkeane
[EMAIL PROTECTED] wrote:
 Hi Jakob-

Yes, I think you are correct that a distributed archiving model is a bit
much for libraries to even consider now, but I do think there are useful
insights to be gained here.

 As it stands now, linux developers using Git can carry around the entire
 change history of the linux kernel (well, I think they just included the
 2.6 kernel when they moved to Git) on their laptop, make changes, create
patches, etc., and then make that available to others.  Well, undoubtedly
change history is a bit much for the library to think about, but why
not, for instance, an entire library catalog?  If I could check out the
library catalog onto my computer & use whatever tools I wished to search,
organize, annotate, etc., then perhaps mix in data (say holdings data
from other libraries that are near me) OR even create the sort of
relationships between records that the Open Library folks are talking about
(http://www.hyperorg.com/blogger/mtarchive/berkman_lunch_aaron_swartz_on.html)
and then share that added data, we would have quite a powerful distributed
development model.  It may seem a bit far-fetched, but I think that some
 of the pieces (or at least a better understanding of how this might all
 work) are beginning to take shape.

 -Peter

 On Thu, 25 Oct 2007, Jakob Voss wrote:

 Peter wrote:

 Also, re: blog mirroring, I highly recommend the current discussions
floating around the blogosphere regarding distributed source control (Git,
Mercurial, etc.).  It's a fundamental paradigm shift from centralized
control to distributed control that points the way toward the future of
libraries as they (we) become less and less the gatekeepers for the
"stuff", be it digital or physical, and more and more the facilitators of
the bidirectional replication that assures ubiquitous access and
long-term preservation.  The library becomes (actually it has already
happened) simply a node on a network of trust and should act accordingly.

 See the thoroughly entertaining/thought-provoking Google tech talk by
 Linus Torvalds on Git:  http://www.youtube.com/watch?v=4XpnKHJAok8

 Thanks for pointing to this interesting discussion. This goes even
further than the current paradigm shift from the old model
(author - publisher - distributor - reader) to a world of
user-generated content and collaboration! I would be glad if we finally got
to model and archive Weblogs and Wikis - modelling and archiving the
whole process of content copying, changing, remixing and
republication is far beyond libraries' capabilities!

 Greetings,
 Jakob

 --
 Jakob Voß [EMAIL PROTECTED], skype: nichtich
 Verbundzentrale des GBV (VZG) / Common Library Network
 Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
 +49 (0)551 39-10242, http://www.gbv.de



Re: [CODE4LIB] Distributed Models & the Library (was: Re: [CODE4LIB] RFC 5005 ATOM extension and OAI)

2007-10-25 Thread pkeane

Very interesting!  I will check it out

-Peter

On Thu, 25 Oct 2007, Jason Stirnaman wrote:


not, for instance, an entire library catalog?  If I could check out the
library catalog onto my computer & use whatever tools I wished to search,


Peter,

You might be interested in Art Rhyno's experiment.  Here's Jon Udell's summary:

Art Rhyno’s science project
Art Rhyno’s title is Systems Librarian but he should consider adding Mad
Scientist to his business card because he is full of wild and crazy and — to
me, at least — brilliant ideas. Last year, when I was a judge for the Talis
“Mashing up the Library” competition, one of my favorite entries was this one
from Art. The project mirrors a library catalog to the desktop and integrates
it with desktop search. The searcher in this case is Google Desktop, but could
be another, and the integration is accomplished by exposing the catalog as a
set of Web Folders, which Art correctly describes as “Microsoft’s in-built and
oft-overlooked WebDAV option.”

http://blog.jonudell.net/2007/03/16/art-rhynos-science-project/

Jason
--

Jason Stirnaman
OME/Biomedical & Digital Projects Librarian
A.R. Dykes Library
The University of Kansas Medical Center
Kansas City, Kansas
Work: 913-588-7319
Email: [EMAIL PROTECTED]



On 10/25/2007 at 10:47 AM, in message

[EMAIL PROTECTED], pkeane
[EMAIL PROTECTED] wrote:

Hi Jakob-

Yes, I think you are correct that a distributed archiving model is a bit
much for libraries to even consider now, but I do think there are useful
insights to be gained here.

As it stands now, linux developers using Git can carry around the entire
change history of the linux kernel (well, I think they just included the
2.6 kernel when they moved to Git) on their laptop, make changes, create
patches, etc., and then make that available to others.  Well, undoubtedly
change history is a bit much for the library to think about, but why
not, for instance, an entire library catalog?  If I could check out the
library catalog onto my computer & use whatever tools I wished to search,
organize, annotate, etc., then perhaps mix in data (say holdings data
from other libraries that are near me) OR even create the sort of
relationships between records that the Open Library folks are talking about
(http://www.hyperorg.com/blogger/mtarchive/berkman_lunch_aaron_swartz_on.html)
and then share that added data, we would have quite a powerful distributed
development model.  It may seem a bit far-fetched, but I think that some
of the pieces (or at least a better understanding of how this might all
work) are beginning to take shape.

-Peter

On Thu, 25 Oct 2007, Jakob Voss wrote:


Peter wrote:


Also, re: blog mirroring, I highly recommend the current discussions
floating around the blogosphere regarding distributed source control (Git,
Mercurial, etc.).  It's a fundamental paradigm shift from centralized
control to distributed control that points the way toward the future of
libraries as they (we) become less and less the gatekeepers for the
"stuff", be it digital or physical, and more and more the facilitators of
the bidirectional replication that assures ubiquitous access and
long-term preservation.  The library becomes (actually it has already
happened) simply a node on a network of trust and should act accordingly.

See the thoroughly entertaining/thought-provoking Google tech talk by
Linus Torvalds on Git:  http://www.youtube.com/watch?v=4XpnKHJAok8


Thanks for pointing to this interesting discussion. This goes even
further than the current paradigm shift from the old model
(author - publisher - distributor - reader) to a world of
user-generated content and collaboration! I would be glad if we finally got
to model and archive Weblogs and Wikis - modelling and archiving the
whole process of content copying, changing, remixing and
republication is far beyond libraries' capabilities!

Greetings,
Jakob

--
Jakob Voß [EMAIL PROTECTED], skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de



Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-24 Thread pkeane


This conversation about Atom is, I think, really an important one to have.
As well designed and thought out as protocols & standards such as OAI-PMH,
METS (and the budding OAI-ORE spec) are, they don't have that "viral
technology" attribute of utter simplicity.  Sure there are trade-offs, but
the tool support and interoperability on a much larger scale that Atom
could provide cannot be denied.  I, too, have pondered the possibility of
Atom (& AtomPub for writing back) as a simpler replacement for all sorts
of similar technologies (METS, OAI-PMH, WebDAV, etc.) --
http://efoundations.typepad.com/efoundations/2007/07/app-moves-to-pr.html.
The simple fact that Google has standardized all of its web services on
GData (a flavor of Atom) cannot be ignored.

I have had some very interesting discussions over on atom-syntax about
thoroughly integrating Atom as a standard piece of infrastructure in a
large digital library project here at UT Austin (daseproject.org), and
while I don't necessarily think it provides a whole lot of benefit as an
internal data transfer mechanism, I see numerous advantages to
standardizing on Atom for any number of outward-facing
services/end-points. I think it would be sad if Atom and AtomPub were seen
only as technologies used by and for blogs/blogging.

Also, re: blog mirroring, I highly recommend the current discussions
floating around the blogosphere regarding distributed source control (Git,
Mercurial, etc.).  It's a fundamental paradigm shift from centralized
control to distributed control that points the way toward the future of
libraries as they (we) become less and less the gatekeepers for the
"stuff", be it digital or physical, and more and more the facilitators of
the bidirectional replication that assures ubiquitous access and
long-term preservation.  The library becomes (actually it has already
happened) simply a node on a network of trust and should act accordingly.

See the thoroughly entertaining/thought-provoking Google tech talk by
Linus Torvalds on Git:  http://www.youtube.com/watch?v=4XpnKHJAok8

-peter keane
daseproject.org

On Tue, 23 Oct 2007, Jakob Voss wrote:


Hi Ed,

You wrote:


I completely agree.  When developing software it's really important to
focus on the cleanest/clearest solution, rather than getting bogged
down in edge cases and the comments from naysayers. I hope that my
response didn't come across that way.


:-)


A couple follow on questions for you:

In your vision for this software are you expecting that content
providers would have to implement RFC 5005 for your archiving system
to work?


Probably yes - at least for older entries. New posts can also be
collected with the default feeds. Instead of working out exceptions and
special solutions for how to get blog archives with other methods, you should
provide RFC 5005 plugins for common blog software like WordPress and
advertise their use ("We are sorry - the blog that you asked to archive
does not support RFC 5005, so we can only archive new postings. Please
ask its provider to implement archived feeds so we can archive the
postings before {TIMESTAMP}. More information and plugins for RFC 5005
can be found {HERE}. Thank you!").


Are you considering archiving media files associated with a blog entry
(images, sound, video, etc?).


Well, it depends. There are hundreds of ways to associate media files
- I doubt that you can easily archive YouTube and SlideShare widgets
etc., but images included with <img src="..."/> should be doable. However
I prefer iterative development - if basic archiving works, you can
start to think about media files. By the way, I would value the comments
more - they are also additional content and non-trivial to archive.

To begin with, a WordPress plugin is surely the right step. RFC 5005
is so new that no one has implemented it yet, although it's not
complicated.

Greetings,
Jakob

--
Jakob Voß [EMAIL PROTECTED], skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-24 Thread Clay Redding

Hi Peter,

I completely agree with everything you just wrote, especially about
Atom + APP being more than just a technology for blogs.  APP is a
great lightweight alternative to WebDAV, and promising for all sorts
of data transfer.  The fact that it has developer groundswell is a
huge plus.  During my Princeton days Kevin Clarke and I briefly
talked about what a METS + APP metadata editing application could
do.  (I can't remember the answer, but I bet it would be snazzy.)

To stay on the OAI theme, I sometimes wish the activity of sharing
metadata used a push technology like APP instead of the OAI pull/
harvest approach that we use today.   One of the reasons is that I
feel it would be easier for the content providers to achieve deleted-record
behavior via HTTP DELETE, simply because the
content providers would know to whom they PUT or POSTed their
metadata.  Service providers wouldn't have to support deleted
records, they'd just have to reindex.
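
(A rough sketch of what that push model could look like: the snippet below, plain
Python standard library only, POSTs an Atom entry to a hypothetical AtomPub
collection and later issues HTTP DELETE against the member URI the server returns,
which is essentially the create/delete cycle RFC 5023 describes. The collection
URL and the entry itself are invented for illustration.)

    # Hedged sketch of the push model described above: the data provider POSTs
    # an Atom entry to an AtomPub collection, then later deletes the member
    # resource it got back. Collection URL and entry content are invented.
    import urllib.request

    COLLECTION = "http://aggregator.example.org/app/records"  # hypothetical

    ENTRY = """<?xml version="1.0" encoding="utf-8"?>
    <entry xmlns="http://www.w3.org/2005/Atom">
      <title>Example record</title>
      <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
      <updated>2007-10-24T12:00:00Z</updated>
      <content type="text">Metadata payload would go here.</content>
    </entry>
    """

    def push_entry(collection, entry_xml):
        """POST an entry to the collection; return the member URI (Location)."""
        req = urllib.request.Request(
            collection,
            data=entry_xml.encode("utf-8"),
            headers={"Content-Type": "application/atom+xml;type=entry"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return resp.headers["Location"]

    def delete_entry(member_uri):
        """Tell the service provider the record is gone -- no tombstone needed."""
        req = urllib.request.Request(member_uri, method="DELETE")
        urllib.request.urlopen(req)

    if __name__ == "__main__":
        member = push_entry(COLLECTION, ENTRY)
        delete_entry(member)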

I came to this realization out of frustration that most OAI toolkits
(at the time, ca. 2005) didn't support that functionality well -- or
at all.  I don't know if that's still the case.  However, the need to
delete records is a reality for most projects, and OAI has somewhat
awkwardly made us rethink how to delete a record in repositories
and the like, both on the service and data provider end.   You almost
have to build your entire system around handling deleted records
just for OAI exposure.   In reality it seems like you just end up
masquerading or re-representing a record's outward visibility on your local
systems, which gets onerous.

I guess the difference is that the growing number of Atom developers
are heeding the requirement for deletions, whereas the few existing
OAI toolkit developers have deemed that functionality as optional.

Long winded as usual,
Clay

On Oct 24, 2007, at 12:51 AM, pkeane wrote:



This conversation about Atom is, I think, really an important one
to have.
As well designed and thought out as protocols & standards such as
OAI-PMH,
METS (and the budding OAI-ORE spec) are, they don't have that "viral
technology" attribute of utter simplicity.  [snipped]



I see numerous advantages to
standardizing on Atom for any number of outward-facing
services/end-points. I think it would be sad if Atom and AtomPub
were seen
only as technologies used by and for blogs/blogging.



Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-23 Thread Jakob Voss

Hi Ed,

You wrote:


I completely agree.  When developing software it's really important to
focus on the cleanest/clearest solution, rather than getting bogged
down in edge cases and the comments from naysayers. I hope that my
response didn't come across that way.


:-)


A couple follow on questions for you:

In your vision for this software are you expecting that content
providers would have to implement RFC 5005 for your archiving system
to work?


Probably yes - at least for older entries. New posts can also be
collected with the default feeds. Instead of working out exceptions and
special solutions for how to get blog archives with other methods, you should
provide RFC 5005 plugins for common blog software like WordPress and
advertise their use ("We are sorry - the blog that you asked to archive
does not support RFC 5005, so we can only archive new postings. Please
ask its provider to implement archived feeds so we can archive the
postings before {TIMESTAMP}. More information and plugins for RFC 5005
can be found {HERE}. Thank you!").
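
(For readers who have not looked at RFC 5005 yet: archive documents are chained
together with rel="prev-archive" links, so a harvester simply walks that chain
backwards from the current feed until no further archive is advertised. A minimal
sketch, assuming a hypothetical blog URL and using only Python's standard library:)

    # Hedged sketch of an RFC 5005-aware harvester: start at the current feed
    # and follow rel="prev-archive" links back through the archive documents.
    # The feed URL is invented; error handling is omitted.
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"
    START_FEED = "http://blog.example.org/feed/atom"  # hypothetical

    def walk_archives(feed_url):
        """Yield (url, parsed feed) for the current feed and every archive doc."""
        while feed_url:
            feed = ET.parse(urlopen(feed_url)).getroot()
            yield feed_url, feed
            feed_url = None
            for link in feed.findall(ATOM + "link"):
                if link.get("rel") == "prev-archive":  # RFC 5005, section 2
                    feed_url = link.get("href")
                    break

    if __name__ == "__main__":
        for url, feed in walk_archives(START_FEED):
            print(url, len(feed.findall(ATOM + "entry")), "entries")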


Are you considering archiving media files associated with a blog entry
(images, sound, video, etc?).


Well, it depends. There are hundreds of ways to associate media files
- I doubt that you can easily archive YouTube and SlideShare widgets
etc., but images included with <img src="..."/> should be doable. However
I prefer iterative development - if basic archiving works, you can
start to think about media files. By the way, I would value the comments
more - they are also additional content and non-trivial to archive.
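
(A minimal sketch of the "images should be doable" part: collect the src of every
<img> tag in an entry's HTML content so the archiver can fetch those files alongside
the entry. The sample content is invented; embedded widgets and scripts are
deliberately ignored here.)

    # Hedged sketch: extract <img src="..."> URLs from an entry's HTML content.
    from html.parser import HTMLParser

    class ImgCollector(HTMLParser):
        """Collect the src attribute of every <img> tag in an HTML fragment."""
        def __init__(self):
            super().__init__()
            self.sources = []

        def handle_starttag(self, tag, attrs):
            if tag == "img":
                src = dict(attrs).get("src")
                if src:
                    self.sources.append(src)

    def image_urls(entry_html):
        parser = ImgCollector()
        parser.feed(entry_html)
        return parser.sources

    if __name__ == "__main__":
        sample = '<p>A chart: <img src="http://blog.example.org/chart.png"/></p>'
        print(image_urls(sample))  # ['http://blog.example.org/chart.png']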

To begin with, a WordPress plugin is surely the right step. RFC 5005
is so new that no one has implemented it yet, although it's not
complicated.

Greetings,
Jakob

--
Jakob Voß [EMAIL PROTECTED], skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-22 Thread Jakob Voss

Ed Summers wrote:


Thanks for posting this Jakob. I was just reading RFC 5005 on the
train yesterday (literally) and the parallels between it and OAI-PMH
struck me as well. It's not quite clear to me how deleted records
would be handled with an atom archive feed. But I guess one could
assume that if the identifier is no longer present it has been deleted.
But that would require pulling the entire archive... I'm not really
sure how much deletes are really used in OAI-PMH repositories anyhow.


OAI-PMH 1.1 was not clear enough on deletions, but in 2.0 the
specification contains an example. I think the missing support for
deletions in data providers has to do with the missing explicit support
in service providers and vice versa (chicken-and-egg problem).
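
(For reference, a deleted record in an OAI-PMH 2.0 response is just a header carrying
status="deleted" and no metadata part. The sketch below, against a hypothetical data
provider and using only Python's standard library, shows how a harvester might pick
such records out of a ListRecords response; resumption tokens are ignored to keep it
short.)

    # Hedged sketch: detect OAI-PMH 2.0 deleted records in a ListRecords response.
    # Endpoint is hypothetical; resumptionToken handling is omitted.
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    BASE_URL = "http://example.org/oai"  # hypothetical data provider

    def deleted_identifiers(base_url, metadata_prefix="oai_dc"):
        """Yield identifiers of records the provider flags as deleted."""
        url = base_url + "?verb=ListRecords&metadataPrefix=" + metadata_prefix
        root = ET.parse(urlopen(url)).getroot()
        for record in root.iter(OAI + "record"):
            header = record.find(OAI + "header")
            # OAI-PMH 2.0 marks deletions on the header; there is no metadata part.
            if header is not None and header.get("status") == "deleted":
                yield header.findtext(OAI + "identifier")

    if __name__ == "__main__":
        for identifier in deleted_identifiers(BASE_URL):
            print("deleted:", identifier)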


Stuart Weibel has written [1] about the subject of blog archiving in
the past. And I remember hearing Jon Udell and Dan Chudnov talk about
it [2]. Who knows what technorati, bloglines and googlereader are
doing in this area. I guess the reality is that blogs are on the web
and as such will be archived by InternetArchive [3]. But perhaps that
doesn't really fit quite right? That's my feeling.


Thanks. BlogML was new to me - sounds interesting but looks very shaggy
and over-engineered - you do not even get the spec in HTML but have to
download an archive that contains tons of nasty .NET files and an XML
schema instead of a textual description with examples and discussion. I
copied the XML schema here: http://www.gbv.de/wikis/cls/BlogML. I think
extending ATOM is the better way.


I think your general point is correct. Libraries need to be
integrating themselves into the web these days rather than expecting
the web to integrate into them.


I doubt that archiving weblogs is that complicated [1]! You need a
harvester (partly implemented in many Feed-Readers), an archive (you
could start with just saving validated ATOM files), an index (Solr?) and
a reader (also already implemented in many Feed-Readers). I bet you
don't need more than a medium-sized project with one or two developers
and one or two years to create sustainable tools for basic weblog
archiving. Such a project could be done by any larger library or archive
that is able to get funding. It's not a lack of resources, it's a lack
of vision.
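
(A minimal sketch of the harvester/archive step, assuming a hypothetical feed URL
and only Python's standard library: fetch the feed, keep it only if it is well-formed
XML, and file it away under a timestamp. The index - Solr or otherwise - and the
reader are left out.)

    # Hedged sketch of the harvest-and-archive loop: download each feed, check
    # that it is well-formed XML, and save the raw bytes under a timestamped name.
    import os
    import time
    import xml.etree.ElementTree as ET
    from urllib.request import urlopen

    FEEDS = ["http://blog.example.org/feed/atom"]  # hypothetical list of blogs
    ARCHIVE_DIR = "archive"

    def harvest(feed_url, archive_dir=ARCHIVE_DIR):
        """Download one feed, reject it if it is not XML, return the saved path."""
        raw = urlopen(feed_url).read()
        ET.fromstring(raw)  # raises ParseError if the feed is not well-formed
        os.makedirs(archive_dir, exist_ok=True)
        name = time.strftime("%Y%m%dT%H%M%S") + ".atom.xml"
        path = os.path.join(archive_dir, name)
        with open(path, "wb") as out:
            out.write(raw)
        return path

    if __name__ == "__main__":
        for feed in FEEDS:
            print("archived", harvest(feed))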


Oh, and would it be alright to add your blog to
http://planet.code4lib.org -- we need more of an international
presence on there IMHO.


The subfeed http://jakoblog.de/category/en/feed/atom/ contains all
English language postings which are probably of higher relevance.

Jakob

[1] Ok, real long-term preservation *is* complicated, but if you only
archive well-formed XML that conforms to a given schema (ATOM, HTML) you
should be in a good position for the next decades.

--
Jakob Voß [EMAIL PROTECTED], skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-22 Thread Ed Summers
On 10/22/07, Jakob Voss [EMAIL PROTECTED] wrote:
I doubt that archiving weblogs is that complicated [1]! You need a
harvester (partly implemented in many Feed-Readers), an archive (you
could start with just saving validated ATOM files), an index (Solr?) and
a reader (also already implemented in many Feed-Readers). I bet you
don't need more than a medium-sized project with one or two developers
and one or two years to create sustainable tools for basic weblog
archiving. Such a project could be done by any larger library or archive
that is able to get funding. It's not a lack of resources, it's a lack
of vision.

I completely agree.  When developing software it's really important to
focus on the cleanest/clearest solution, rather than getting bogged
down in edge cases and the comments from naysayers. I hope that my
response didn't come across that way.

A couple follow on questions for you:

In your vision for this software are you expecting that content
providers would have to implement RFC 5005 for your archiving system
to work?

Are you considering archiving media files associated with a blog entry
(images, sound, video, etc?).

//Ed


Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-20 Thread Michael J. Giarlo
On 10/19/07, Ed Summers [EMAIL PROTECTED] wrote:


 Stuart Weibel has written [1] about the subject of blog archiving in
 the past. And I remember hearing Jon Udell and Dan Chudnov talk about
 it [2].


Dan also wrote about blog mirroring, which may be applicable, here:

http://onebiglibrary.net/story/simple-old-design-for-widespread-blog-mirroring

-Mike


Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-19 Thread Ed Summers
Thanks for posting this Jakob. I was just reading RFC 5005 on the
train yesterday (literally) and the parallels between it and OAI-PMH
struck me as well. It's not quite clear to me how deleted records
would be handled with an atom archive feed. But I guess one could
assume that if the identifier is no longer present it has been deleted.
But that would require pulling the entire archive... I'm not really
sure how much deletes are really used in OAI-PMH repositories anyhow.

Stuart Weibel has written [1] about the subject of blog archiving in
the past. And I remember hearing Jon Udell and Dan Chudnov talk about
it [2]. Who knows what technorati, bloglines and googlereader are
doing in this area. I guess the reality is that blogs are on the web
and as such will be archived by InternetArchive [3]. But perhaps that
doesn't really fit quite right? That's my feeling.

I think your general point is correct. Libraries need to be
integrating themselves into the web these days rather than expecting
the web to integrate into them.

Oh, and would it be alright to add your blog to
http://planet.code4lib.org -- we need more of an international
presence on there IMHO.

//Ed

[1] http://weibel-lines.typepad.com/weibelines/2007/08/blog-curation-e.html
[2] 
http://blog.jonudell.net/2007/02/16/a-conversation-with-dan-chudnov-about-openurl-context-sensitive-linking-and-digital-archiving/
[3] http://web.archive.org/web/*/http://jakoblog.de/