My bad, I thought that the FeedParser referred to the Nutch 1.x plugin found in the SVN trunk :-)

I'm asking because I'm trying to parse a bunch of PDF files but store each page individually in Solr, i.e. a 1-to-N relation between one URL and many documents. Any advice on this subject? Right now I'm thinking of something like http://sujitpal.blogspot.com/2012/02/nutchgora-indexing-sections-and.html.

What do you think?

----- Original Message -----
From: "Lewis John Mcgibbney" <[email protected]>
To: [email protected]
Sent: Saturday, March 2, 2013 23:16:23
Subject: Re: Problem compiling FeedParser plugin with Nutch 2.1 source

Hi Jorge,

Afaik it isn't. We're talking about 2.x here

On Saturday, March 2, 2013, Jorge Luis Betancourt Gonzalez <[email protected]> wrote:
> How do the subdocuments get indexed into Solr? I thought that 1-to-N wasn't possible with Nutch 1.x.
>
> ----- Original Message -----
> From: "Julien Nioche" <[email protected]>
> To: [email protected]
> Sent: Saturday, March 2, 2013 3:27:35
> Subject: Re: Problem compiling FeedParser plugin with Nutch 2.1 source
>
> IIRC the FeedParser creates sub documents from the main feed document
> parsed (1 to N) whereas Tika just treats them as new links and does the
> fetch + parse in a subsequent step.
>
> It is because Nutch 2.x does not support 1-to-N parse outputs that this
> plugin hasn't been ported. I don't remember the exact history of this
> plugin as it was in the code long before I got involved but it would be
> good to get to the bottom of how it differs from parsing feeds with Tika
> and decide whether it still makes sense to have it or not.
>
> J.
>
> On 1 March 2013 04:51, Anand Bhagwat <[email protected]> wrote:
>
>> Thanks for the quick reply.
>>
>> Actually I needed some plugin for ATOM feed parsing, so while searching in
>> the source I found FeedParser, but it was giving compilation errors. Later I
>> tried the Tika parser and was able to parse the ATOM feed. I am not sure if I am
>> missing something. Basically the Tika parser extracted URLs and created new
>> entries in the database, and later when I ran the fetch job again I was able to
>> fetch those URLs.
>>
>> So the question is: does FeedParser provide some additional functionality
>> which is missing in the Tika parser? As far as I know the Tika parser uses ROME,
>> which is a well-known library for parsing feeds.
>>
>> Regards,
>> Anand.
>>
>> On 1 March 2013 03:38, kiran chitturi <[email protected]> wrote:
>>
>> > Lewis,
>> >
>> > On the same note, the following plugins need to be ported; I found this
>> > when I tried to build 2.x with Eclipse
>> >
>> > i) Feed
>> > ii) parse-swf
>> > iii) parse-ext
>> > iv) parse-zip
>> > v) parse-metatags (I wrote a patch for this earlier, NUTCH-1478)
>> >
>> > The above plugins need to be ported to build 2.x successfully with
>> > plugins.
>> >
>> > On Thu, Feb 28, 2013 at 4:58 PM, Lewis John Mcgibbney <[email protected]> wrote:
>> >
>> > > Honestly, I think we should get this fixed.
>> > > Can someone please explain to me why we don't build every plugin within
>> > > Nutch 2.x?
>> > > I think we should.
>> > >
>> > > On Thu, Feb 28, 2013 at 12:58 PM, kiran chitturi <[email protected]> wrote:
>> > >
>> > > > This is a problem with the feed plugin. It is not yet ported to 2.x.
>> > > >
>> > > > The FeedIndexingFilter class extends the IndexingFilter, whose interface
>> > > > and methods changed from 1.x to 2.x.
>> > > >
>> > > > I fixed a similar one in Parse-metaTags, which extends the ParseFilter
>> > > > interface.
>> > > >
>> > > > [NUTCH-874] was opened related to these issues, but we still do not know
>> > > > which plugins need to be ported due to the API changes.
>> > > >
>> > > > https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> > > >
>> > > > On Thu, Feb 28, 2013 at 3:26 PM, Lewis John Mcgibbney <[email protected]> wrote:
>> > > >
>> > > > > This shouldn't be happening, but we are aware (the Jira instance
>> > > > > reflects this) that there are some existing compatibility issues
>> > > > > with Nutch 2.x HEAD.
>> > > > > IIRC Kiran had a patch integrated which dealt with some of these
>> > > > > issues.
>> > > > > What I have to ask is: what JDK are you using? I use 1.6.0_25 (I
>> > > > > really need to upgrade) on my laptop

--
*Lewis*
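
For the 1-to-N question raised at the top of this thread (one URL, many indexed documents, for example one per PDF page), the core idea can be sketched independently of Nutch. The snippet below is only an illustration under stated assumptions: the `SubDocument` record, its field names, and the `<url>#page=<n>` id scheme are hypothetical and are not part of the Nutch or Solr APIs, and the per-page text is assumed to have been extracted already by whatever parser is in use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubDocument:
    # All field names here are hypothetical, chosen for illustration only.
    id: str          # unique index key, built as "<url>#page=<n>"
    parent_url: str  # the single fetched URL that every page shares
    page: int        # 1-based page number within the parent document
    content: str     # text already extracted from that page


def split_into_subdocuments(url: str, page_texts: List[str]) -> List[SubDocument]:
    """Turn one fetched URL plus its per-page text into N indexable records."""
    return [
        SubDocument(id=f"{url}#page={n}", parent_url=url, page=n, content=text)
        for n, text in enumerate(page_texts, start=1)
    ]


# Example: one PDF URL whose three pages become three separate records.
docs = split_into_subdocuments(
    "http://example.com/report.pdf",
    ["text of page one", "text of page two", "text of page three"],
)
for d in docs:
    print(d.id, d.page)
```

Giving each record its own unique id while keeping the parent URL in a separate field is what lets the index hold several documents for one fetched URL and still group them back together at query time.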

