We would like to parse the content of a FOAF (http://www.foaf-project.org) file, which is an RDF file. We get the URI of the file from the HTML, in the <link> elements of the <head>, defined as:
<link rel="meta" type="application/rdf+xml" title="FOAF" href="URI/to/foaf.rdf" />

Does Nutch automatically schedule the href attribute value for fetching? If not, what could we do to fetch and parse it?

Here is our tentative solution for now:

a) Create an HTMLParseFilter plugin that parses the HTML document to find any <link> element and adds the href attribute value to the list of documents to be fetched.
b) Create a parse plugin that is associated with the "application/rdf+xml" content type.

What do you guys think?
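To make step a) more concrete, here is a rough sketch of the extraction logic such a filter might run. It deliberately uses only the JDK and does not touch any Nutch API, since we are not sure what the plugin interface expects. The class name FoafLinkExtractor, the extract method, and the example.org URLs are made up for illustration. The regex-based matching is an assumption too: a real plugin would more likely walk the DOM that the HTML parser has already built.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Nutch: finds FOAF <link> elements in raw HTML.
class FoafLinkExtractor {
    // Matches a whole <link ...> tag (assumption: no '>' inside attribute values).
    private static final Pattern LINK_TAG =
        Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches name="value" or name='value' pairs inside a tag.
    private static final Pattern ATTRIBUTE =
        Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Returns the value of the named attribute, or null if it is absent.
    static String attribute(String tag, String name) {
        Matcher m = ATTRIBUTE.matcher(tag);
        while (m.find()) {
            if (m.group(1).equalsIgnoreCase(name)) {
                return m.group(3) != null ? m.group(3) : m.group(4);
            }
        }
        return null;
    }

    // Collects the absolute URLs of all <link rel="meta" type="application/rdf+xml"> elements.
    static List<String> extract(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> urls = new ArrayList<>();
        Matcher m = LINK_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String href = attribute(tag, "href");
            if (href != null
                    && "meta".equalsIgnoreCase(attribute(tag, "rel"))
                    && "application/rdf+xml".equalsIgnoreCase(attribute(tag, "type"))) {
                // Resolve relative hrefs against the URL of the page itself.
                urls.add(base.resolve(href).toString());
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<link rel=\"meta\" type=\"application/rdf+xml\" title=\"FOAF\" href=\"people/foaf.rdf\" />"
            + "</head><body></body></html>";
        System.out.println(extract(html, "http://example.org/home/index.html"));
    }
}
```

The URLs returned by extract would then have to be handed to whatever mechanism Nutch uses to queue new documents for fetching, which is exactly the part we are unsure about.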
