FW: RSS-fecter and index individul-how can i realize this function

2007-02-08 Thread HUYLEBROECK Jeremy RD-ILAB-SSF

I am resending this message, as it apparently didn't go through.
(I keep mixing up my email addresses on the mailing list...)

-Original Message-
Sent: Friday, February 02, 2007 10:29 AM

Using Nutch 0.8, we modified the code from the fetching/parsing steps onward.
We have a different implementation of the Parse object and OutputFormat, 
which includes an additional list of ParseData objects saved in an additional 
subfolder in the DFS.
We also changed the indexing step a lot, so we don't use the Nutch code there.


-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Friday, February 02, 2007 10:19 AM
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function

Caution: your correspondent is still writing to your orange-ft.com address, 
which will be disabled at the beginning of April. Please ask him/her to update 
his/her address book with your new orange-ftgroup.com address.

Gal Nitzan wrote:
 IMHO the data that is needed i.e. the data that will be fetched in the next 
 fetch process is already available in the item element. Each item element 
 represents one web resource. And there is no reason to go to the server and 
 re-fetch that resource.

Perhaps ProtocolOutput should change.  The method:

   Content getContent();

could be deprecated and replaced with:

   Content[] getContents();

This would require changes to the indexing pipeline.  I can't think of
any severe complications, but I haven't looked closely.

Could something like that work?
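
To make the idea concrete, here is a minimal sketch of what a multi-content 
ProtocolOutput might look like. The classes below are simplified, hypothetical 
stand-ins and not the real org.apache.nutch.protocol classes; the fromFeed 
helper, its parameters, and the example URLs are assumptions made purely for 
illustration. The point is that a single fetch of a feed could hand back the 
feed itself plus one Content per item, so the items need not be re-fetched, 
while getContent() remains available for existing callers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiContentSketch {

    /** Hypothetical stand-in for a fetched resource: a URL plus its raw bytes. */
    public static final class Content {
        private final String url;
        private final byte[] data;

        public Content(String url, byte[] data) {
            this.url = url;
            this.data = data;
        }

        public String getUrl() { return url; }
        public byte[] getData() { return data; }
    }

    /** Hypothetical stand-in for ProtocolOutput, holding one or more Content objects. */
    public static final class ProtocolOutput {
        private final List<Content> contents;

        public ProtocolOutput(List<Content> contents) {
            if (contents.isEmpty()) {
                throw new IllegalArgumentException("at least one Content is required");
            }
            this.contents = new ArrayList<>(contents);
        }

        /** Old accessor, kept for existing callers; returns the first entry only. */
        @Deprecated
        public Content getContent() { return contents.get(0); }

        /** Proposed accessor: everything obtained by this one fetch. */
        public Content[] getContents() { return contents.toArray(new Content[0]); }
    }

    /**
     * Assumed helper: wraps the feed and its already-downloaded items in one output.
     * The map goes from each item's link to the text carried in its item element.
     */
    public static ProtocolOutput fromFeed(String feedUrl, String feedXml,
                                          Map<String, String> itemTextByLink) {
        List<Content> all = new ArrayList<>();
        all.add(new Content(feedUrl, feedXml.getBytes(StandardCharsets.UTF_8)));
        for (Map.Entry<String, String> item : itemTextByLink.entrySet()) {
            all.add(new Content(item.getKey(),
                                item.getValue().getBytes(StandardCharsets.UTF_8)));
        }
        return new ProtocolOutput(all);
    }

    public static void main(String[] args) {
        Map<String, String> items = new LinkedHashMap<>();
        items.put("http://example.com/a", "First item text");
        items.put("http://example.com/b", "Second item text");

        ProtocolOutput out = fromFeed("http://example.com/feed.rss", "<rss>...</rss>", items);
        for (Content c : out.getContents()) {
            System.out.println(c.getUrl() + " (" + c.getData().length + " bytes)");
        }
    }
}
```

In this sketch the indexing side would loop over getContents() rather than 
calling getContent() once, which is where the pipeline changes mentioned above 
would come in.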

Doug



Re: FW: RSS-fecter and index individul-how can i realize this function

2007-02-08 Thread Renaud Richardet

HUYLEBROECK Jeremy RD-ILAB-SSF wrote:

I am resending this message, as it apparently didn't go through.
(I keep mixing up my email addresses on the mailing list...)


-Original Message-
Sent: Friday, February 02, 2007 10:29 AM

Using Nutch 0.8, we modified the code from the fetching/parsing steps onward.
We have a different implementation of the Parse object and OutputFormat, 
which includes an additional list of ParseData objects saved in an additional 
subfolder in the DFS.
We also changed the indexing step a lot, so we don't use the Nutch code there.

Is your implementation similar to what we started at 
https://issues.apache.org/jira/browse/NUTCH-443? If you think some of 
your changes could be integrated, please post a patch there.


Thanks for sharing,
Renaud


-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Friday, February 02, 2007 10:19 AM
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function

Gal Nitzan wrote:
  

IMHO the data that is needed i.e. the data that will be fetched in the next fetch process 
is already available in the item element. Each item element represents one 
web resource. And there is no reason to go to the server and re-fetch that resource.



Perhaps ProtocolOutput should change.  The method:

   Content getContent();

could be deprecated and replaced with:

   Content[] getContents();

This would require changes to the indexing pipeline.  I can't think of
any severe complications, but I haven't looked closely.

Could something like that work?

Doug

--
Renaud Richardet  +1 617 230 9112
my email is my first name at apache.org  http://www.oslutions.com