Nutch + RDF for scholarly archives

Erik Hatcher Wed, 29 Jun 2005 13:45:49 -0700

Is anyone here using Nutch for crawling digital scholarly archives? If so, are you also harvesting and indexing additional metadata?

My group (http://www.patacriticism.org) is considering using Nutch to crawl a specific set of sites, index their HTML as full text, and also retrieve any associated RDF data (specified perhaps with a hyperlink in a <meta> tag, as on this page: http://www.rossettiarchive.org/docs/1-1847.s244.raw.html). The RDF could most likely just be indexed as additional fields, but it might also be loaded into an RDF engine (such as Kowari) and queried from the search interface in conjunction with full-text searching.
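A minimal standalone sketch of the two per-page steps described above: find the RDF link in a fetched HTML page, then flatten the RDF into extra (field, value) pairs for the index. This is plain Python, not Nutch's Java plugin API. It assumes the RDF is advertised either as a <link type="application/rdf+xml" href="..."> tag or as a <meta> tag whose content attribute ends in ".rdf"; the class and function names, the field-naming scheme (property local name), and the example.org URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class RdfLinkFinder(HTMLParser):
    """Collects absolute URLs of RDF documents referenced from a page.

    Assumption (hypothetical convention): RDF is linked either via
    <link type="application/rdf+xml" href="..."> or via a <meta> tag
    whose content attribute ends in ".rdf".
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rdf_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; valueless
        # attributes arrive as None, so normalise them to "".
        a = {k: (v or "") for k, v in attrs}
        if tag == "link" and a.get("type", "").lower() == "application/rdf+xml" and a.get("href"):
            self.rdf_urls.append(urljoin(self.base_url, a["href"]))
        elif tag == "meta" and a.get("content", "").lower().endswith(".rdf"):
            self.rdf_urls.append(urljoin(self.base_url, a["content"]))


def rdf_to_fields(rdf_xml):
    """Flattens literal-valued properties of every rdf:Description into
    (field_name, value) pairs, using the property's local name as the
    field name (a hypothetical naming scheme)."""
    root = ET.fromstring(rdf_xml)
    fields = []
    for desc in root.iter("{%s}Description" % RDF_NS):
        for prop in desc:
            text = (prop.text or "").strip()
            if text:  # skip resource-valued or empty properties
                fields.append((prop.tag.split("}", 1)[-1], text))
    return fields
```

The fields returned by rdf_to_fields would sit alongside the page's full text in the index; keeping the raw triples for an RDF store such as Kowari would be a separate step that this sketch does not cover.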

The Ontology and Creative Commons plugins are great starting places, for sure. I'm wondering what others have done along these lines.


Thanks,
    Erik
