Dear list,
I would like to process metadata from publication repositories into a
Nutch index.
The metadata comes as XML (OAI-PMH, to be more precise).
The starting URLs look like
http://oai_host/servlet?method=getRecords&set=someSet
These requests return lists,
which basically look like
<list>
<item>
<id>32423</id>
<content>very long description1, e.g. an abstract</content>
<url>http://somewhere.com/somedoc1.pdf</url>
</item>
<item>
<id>12441</id>
<content>very long description2, e.g. an abstract</content>
<url>http://somewhereelse.it/somedoc2.pdf</url>
</item>
</list>
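
Just to make the structure concrete, I imagine reading such a list
with plain JAXP roughly like this (only a sketch against the sample
above; the OaiItem/OaiListReader names are made up, not from any
existing plugin):

import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// One <item> from the list (made-up holder class).
class OaiItem {
    String id, content, url;
}

class OaiListReader {
    // Parse the <list> document and collect id/content/url per <item>.
    static List<OaiItem> read(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(in);
        NodeList items = doc.getElementsByTagName("item");
        List<OaiItem> result = new ArrayList<OaiItem>();
        for (int i = 0; i < items.getLength(); i++) {
            Element e = (Element) items.item(i);
            OaiItem item = new OaiItem();
            item.id = text(e, "id");
            item.content = text(e, "content");
            item.url = text(e, "url");
            result.add(item);
        }
        return result;
    }

    // First matching child element's text, or null if absent.
    private static String text(Element parent, String tag) {
        NodeList n = parent.getElementsByTagName(tag);
        return n.getLength() > 0 ? n.item(0).getTextContent() : null;
    }
}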
My initial idea was to use the Parser extension point
and provide a plugin that works the same way the rss-parser does:
return all outlinks to the detail views
- e.g.
http://oai_host/servlet?method=getSingleRecord&id=_value_of_id-element_ -
and skip the content of the list itself.
Following these links would return documents containing one item each.
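
In such a plugin, the outlink generation would then boil down to
something like the following (again a sketch reusing the made-up
OaiListReader from above; in a real Parser implementation these
strings would be wrapped into Nutch Outlink objects and returned via
the ParseData, much as the rss-parser does with its entries):

// One "detail view" outlink per <item>.
List<String> outlinks = new ArrayList<String>();
for (OaiItem item : OaiListReader.read(in)) {
    outlinks.add("http://oai_host/servlet?method=getSingleRecord&id="
            + item.id);
}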
Is it possible to store these documents under the URL from the
<url> element instead of the "real" URL (i.e. the servlet URI used for
the request)?
Would this work out? Can you suggest a better approach?
Anyway, refetching every single hit is pretty much a waste,
as all the information is already included in the list.
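If skipping the refetch is possible, the parse step could emit one
record per item straight from the list, keyed by the <url> value
rather than the servlet URI, roughly like this (same made-up classes
as above):

// One document per <item>, keyed by the URL from the list itself,
// so no second getSingleRecord fetch would be needed.
Map<String, String> docs = new LinkedHashMap<String, String>();
for (OaiItem item : OaiListReader.read(in)) {
    docs.put(item.url, item.content);
}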
Any comments on that?
Help would be very much appreciated,
Best regards
Karsten