Re: [Nutch-general] Re: RDF plugin questions

Stefan Groschupf Thu, 21 Jul 2005 08:55:14 -0700

Hi Erik,

Stefan - thanks for the reply. I'm still digesting Nutch and how to work with it at a basic level, but it does make sense to allow metadata to tag along with fetches. I certainly don't know enough yet to say whether your patch fits into the long-term vision of Nutch.

Well, I would be interested to hear about the long-term vision of Nutch as well. :)

I've started writing a custom RDF parser plugin that will take the URL and simply add it to Kowari (letting Kowari actually parse it and ingest it). But I'm feeling like this might not be the best approach.
At what stage would it make the most sense to ingest RDF into an external system? Is parsing the most logical stage?
Further on this topic, I'm curious about indexing multiple "documents" per .rdf file fetched - for instance, one document per RDF "resource".

You hit another problem I have been seeing for some time. For example, I see it with RSS parsing, image search, or any other case where you have multiple logical documents per physical document (XML feed, HTML page).

Is this currently possible with a plugin?

NO! As far as I know, you can only have one document per URL.

If not, what would it take to do something like this? Maybe this approach doesn't even make sense in the Nutch sense - I'm just exploring my architectural options.

I solved a similar problem with the following steps.
I fetch, but I do not parse at fetch time.

In the next step I read the unparsed content from the segment, used my own parser, and directly indexed the content I had parsed. Besides this, I wrote a text file with the extracted URLs. These URLs were merged back into the webdb at the end.

It worked, but it was no more than a prototype, and in the end I was asking myself whether it makes sense to use Nutch for such a task.

Anyway, I would be very happy to see a patch that allows extracting multiple documents from one source (this would help to implement a better RSS or image search); however, I think that is a very tricky issue.
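
To make the "several logical documents from one fetched source" idea concrete, here is a minimal sketch in plain Python. It is not Nutch code and uses no Nutch API. It assumes the raw content is an RSS 2.0 feed; the names split_feed and write_url_list and the "#n" id scheme are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def split_feed(source_url, raw_xml):
    """Split one fetched RSS feed into one logical document per <item>.

    Returns (docs, links): docs are plain dicts ready to hand to an indexer,
    links are the item URLs that could be merged back into a crawl database.
    """
    root = ET.fromstring(raw_xml)
    docs, links = [], []
    for n, item in enumerate(root.iter("item")):
        link = (item.findtext("link") or "").strip()
        docs.append({
            # Hypothetical id scheme: source URL plus the item's position.
            "id": f"{source_url}#{n}",
            "source": source_url,
            "title": (item.findtext("title") or "").strip(),
            "text": (item.findtext("description") or "").strip(),
        })
        if link:
            links.append(link)
    return docs, links


def write_url_list(links, path):
    """Write the extracted URLs to a text file, one URL per line."""
    with open(path, "w", encoding="utf-8") as out:
        for link in links:
            out.write(link + "\n")


# Small made-up feed used only to exercise the sketch.
SAMPLE = """<rss version="2.0"><channel><title>Feed</title>
<item><title>First</title><link>http://example.org/a</link><description>Alpha</description></item>
<item><title>Second</title><link>http://example.org/b</link><description>Beta</description></item>
</channel></rss>"""

docs, links = split_feed("http://example.org/feed.xml", SAMPLE)
```

Each entry in docs stands for one logical document that would be indexed on its own, while links is what would go into the text file of extracted URLs. How those pieces are fed into an index or back into the webdb is left out here on purpose.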

HTH

Stefan
