HUYLEBROECK Jeremy RD-ILAB-SSF wrote:
> I am sending this message again as it apparently didn't go through.
> (I am mixing up my email addresses on the mailing list...)
>
> -----Original Message-----
> Sent: Friday, February 02, 2007 10:29 AM
>
> Using Nutch 0.8, we modified the code starting at the fetching/parsing steps
> and the steps that follow.
> We have a different implementation of the Parse object and OutputFormat,
> including an additional list of ParseData objects saved in an additional
> subfolder in the DFS.
> We changed the indexing step a lot too, so we don't use the Nutch code there.

Is your implementation similar to what we started at https://issues.apache.org/jira/browse/NUTCH-443? If you think some of your changes could be integrated, please post a patch there.

Thanks for sharing,
Renaud

> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 02, 2007 10:19 AM
> To: nutch-dev@lucene.apache.org
> Subject: Re: RSS-fecter and index individul-how can i realize this function
>
> Caution: your correspondent is still writing to your orange-ft.com address,
> which will be disabled at the beginning of April. Please ask him/her to update
> his/her address book to orange-ftgroup.com.
> ..................................................
>
> Gal Nitzan wrote:
>
>> IMHO the data that is needed, i.e. the data that will be fetched in the next
>> fetch process, is already available in the <item> element. Each <item>
>> element represents one web resource. And there is no reason to go to the
>> server and re-fetch that resource.
>
> Perhaps ProtocolOutput should change. The method:
>
>   Content getContent();
>
> could be deprecated and replaced with:
>
>   Content[] getContents();
>
> This would require changes to the indexing pipeline. I can't think of
> any severe complications, but I haven't looked closely.
>
> Could something like that work?
>
> Doug

--
Renaud Richardet
+1 617 230 9112
my email is my first name at apache.org
http://www.oslutions.com
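
For readers following the thread, Doug's suggestion of deprecating getContent() in favor of getContents() could look roughly like the sketch below. This is only an illustration under stated assumptions: the Content and ProtocolOutput classes here are simplified stand-ins, not the real Nutch classes, and the index() helper, the field names, and the example.com URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiContentSketch {

    // Simplified stand-in for Nutch's Content class (fields are hypothetical).
    static final class Content {
        final String url;
        final String text;

        Content(String url, String text) {
            this.url = url;
            this.text = text;
        }
    }

    // Simplified stand-in for ProtocolOutput, not the real Nutch class.
    static final class ProtocolOutput {
        private final Content[] contents;

        ProtocolOutput(Content... contents) {
            this.contents = contents.clone();
        }

        /** Old single-result accessor, kept so existing callers still compile. */
        @Deprecated
        Content getContent() {
            return contents.length > 0 ? contents[0] : null;
        }

        /** Proposed accessor: one Content per resource, e.g. per RSS <item>. */
        Content[] getContents() {
            return contents.clone();
        }
    }

    // Hypothetical indexing step: produces one entry per Content.
    static List<String> index(ProtocolOutput output) {
        List<String> docs = new ArrayList<>();
        for (Content c : output.getContents()) {
            docs.add(c.url + " -> " + c.text);
        }
        return docs;
    }

    public static void main(String[] args) {
        ProtocolOutput feed = new ProtocolOutput(
            new Content("http://example.com/item1", "first item"),
            new Content("http://example.com/item2", "second item"));
        for (String doc : index(feed)) {
            System.out.println(doc);
        }
    }
}
```

Keeping the deprecated accessor as a view onto the first element is one possible migration path; the thread itself does not say how existing callers would be handled.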
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers