Hi
Is there a standardized way for Nutch to retrieve a semantic version
of a web page? For example, suppose the HTML page is as follows:
<html>
<head>
<link rel="semantic-content" href="index-semantic.xml"/>
</head>
<body>
blablabal ..
</body>
</html>
and the semantic XML (index-semantic.xml) would be something more useful
than the HTML itself, for example:
<?xml version="1.0"?>
<semantic-of href="index.html">
...
</semantic-of>
or alternatively some RDF or a similar format.
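
To make the idea concrete, here is a minimal sketch of the lookup I have in
mind, written in plain Python rather than as a Nutch plugin. It only shows how
a crawler could discover the semantic document from the HTML head. Note that
the rel value "semantic-content" is just my own example and not a registered
link relation, and the class name, function name, and example URL are
hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SemanticLinkParser(HTMLParser):
    """Collects href values of <link rel="semantic-content"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here.
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if "semantic-content" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_semantic_urls(page_url, html):
    """Return absolute URLs of the semantic documents advertised by a page."""
    parser = SemanticLinkParser()
    parser.feed(html)
    # Resolve relative hrefs against the URL of the page itself.
    return [urljoin(page_url, href) for href in parser.hrefs]


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<link rel="semantic-content" href="index-semantic.xml"/>'
        '</head><body></body></html>'
    )
    print(find_semantic_urls("http://www.example.com/docs/index.html", page))
```

The crawler would then fetch and index the returned URL instead of (or in
addition to) the HTML page.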
Any pointers are very welcome.
Thanks
Michi
--
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
[EMAIL PROTECTED] [EMAIL PROTECTED]
+41 44 272 91 61