[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-443:
Attachment: parse-map-core-untested.patch
allow parsers to return multiple Parse object, this will
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471260
]
Dogacan Güney commented on NUTCH-443:
OK, this is the second attempt (sorry that I am sending patches in a
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-443:
Attachment: parse-map-core-draft-v1.patch
allow parsers to return multiple Parse object, this will
2. It sounds like a pretty fundamental API shift in Nutch, to support a
single type of content, RSS. Even if there are more content types that
follow this model, as Doug and Renaud both pointed out, there aren't a
multitude of them (perhaps archive files, but can you think of any
others)?
Hi Doug,
Okay, I see your points. It seems like this would be really useful for
some current folks, and for Nutch going forward. I see that there has been
some initial work today on preparing patches. I'd be happy to shepherd this
into the sources. I will begin reviewing what's required, and
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Renaud Richardet updated NUTCH-443:
Attachment: parsers.diff
Great, here's my work-in-progress (not finished, not tested) for
I am sending this message again, as it apparently didn't go through.
(I keep mixing up my email addresses on the mailing list...)
-----Original Message-----
Sent: Friday, February 02, 2007 10:29 AM
Using Nutch 0.8, we modified the code from the fetching/parsing steps
onward.
We
HUYLEBROECK Jeremy RD-ILAB-SSF wrote:
I am sending this message again, as it apparently didn't go through.
(I keep mixing up my email addresses on the mailing list...)
-----Original Message-----
Sent: Friday, February 02, 2007 10:29 AM
Using Nutch 0.8, we modified the code starting at the
So, here is what I do for RSS feeds.
I parse the RSS, and for each outlink, I create the outlink object and set
the anchor text of each outlink to a well-formed XML string. It
contains the pub date, description, etc. Now, this is only because I was
hacking the outlink to just use its