Re: Targeting Specific Links for Crawling

2009-10-05 Thread Andrzej Bialecki

Eric wrote:
Does anyone know if it is possible to target only certain links for
crawling dynamically during a crawl? My goal would be to write a plugin 
for this functionality but I don't know where to start.


URLFilter plugins may be what you want.


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



RE: Targeting Specific Links for Crawling

2009-10-05 Thread BELLINI ADAM



How do you want to target certain links? Do you know how the links are
formed, i.e. their format?
You can just set a regular expression to accept only that kind of link.
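
As a rough illustration of the regular-expression approach, here is a standalone Python sketch (not Nutch code). It assumes the contract a Nutch URLFilter follows, namely returning the URL to keep it and nothing to drop it, and it assumes "+"/"-" prefixed rules where the first matching rule decides. The host name, the section names and the rule list are hypothetical.

```python
import re

# Hypothetical rules: "+" accepts, "-" rejects, the first matching rule decides.
RULES = [
    ("-", re.compile(r"\.(gif|jpg|png|css|js)$")),  # reject static assets
    ("+", re.compile(r"^https?://www\.example\.com/(products|reviews)/")),  # accept target sections
]


def url_filter(url):
    """Return the URL if it is accepted, or None if it is rejected."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return url if sign == "+" else None
    return None  # no rule matched, so the URL is dropped


if __name__ == "__main__":
    for candidate in (
        "http://www.example.com/products/42",
        "http://www.example.com/about",
        "http://www.example.com/products/logo.png",
    ):
        print(candidate, "->", url_filter(candidate))
```

Putting the reject rule first means an image under an accepted section is still dropped; reordering the rules would change that outcome.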



 Date: Mon, 5 Oct 2009 21:39:52 +0200
 From: a...@getopt.org
 To: nutch-user@lucene.apache.org
 Subject: Re: Targeting Specific Links for Crawling
 
 Eric wrote:
  Does anyone know if it is possible to target only certain links for
  crawling dynamically during a crawl? My goal would be to write a plugin 
  for this functionality but I don't know where to start.
 
 URLFilter plugins may be what you want.
 
 

Re: Targeting Specific Links for Crawling

2009-10-05 Thread Eric

Adam,

Yes, I have a list of strings I would look for in each link. My plan is
to collect X links from the site: first look for the links I want and,
if they exist, add them; if they don't exist, add X links from the site.
I am planning to start with the URL Filter plugin.


Eric

On Oct 5, 2009, at 12:58 PM, BELLINI ADAM wrote:





How do you want to target certain links? Do you know how the links are
formed, i.e. their format?
You can just set a regular expression to accept only that kind of link.









RE: Targeting Specific Links for Crawling

2009-10-05 Thread BELLINI ADAM

But when you start by injecting your starting point from your seed list,
Nutch will fetch URLs and skip those rejected by the URL filter (regular
expression). So to count the number X of those URLs you would have to
crawl your whole site!
Without any regular expression you will certainly get all the links of
your site (including the X links you need), but I guess you won't do that
because it's a waste of time.
The only solution I can see is to set up urlfilter.txt properly (with the
right regular expression).
Does anybody have other ideas?
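
The trade-off described above can be seen in a toy model. The sketch below is a standalone Python simulation, not Nutch: `SITE` is a hypothetical in-memory link graph, and `crawl` is a simple breadth-first walk in which a rejected link is never fetched, which is assumed here to approximate how a filtered crawl behaves.

```python
from collections import deque

# Hypothetical in-memory site: page -> links found on that page.
SITE = {
    "/": ["/about", "/news/1"],
    "/about": ["/news/2"],
    "/news/1": [],
    "/news/2": [],
}


def crawl(seed, accept):
    """Breadth-first crawl; links rejected by accept() are never fetched."""
    fetched, queue, seen = [], deque([seed]), {seed}
    while queue:
        page = queue.popleft()
        fetched.append(page)
        for link in SITE.get(page, []):
            if link not in seen and accept(link):
                seen.add(link)
                queue.append(link)
    return fetched


# Unfiltered crawl: every page is fetched, so every /news/ link can be counted.
everything = crawl("/", lambda url: True)
# Filtered crawl: /about is skipped, so /news/2 (linked only from it) is never seen.
only_news = crawl("/", lambda url: url.startswith("/news/"))

if __name__ == "__main__":
    print(everything)
    print(only_news)
```

In this model the exact count of matching links is only available from the unfiltered crawl, while the filtered crawl is cheaper but can miss targets that are reachable only through rejected pages.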







 Subject: Re: Targeting Specific Links for Crawling
 From: e...@lakemeadonline.com
 Date: Mon, 5 Oct 2009 13:07:25 -0700
 To: nutch-user@lucene.apache.org
 
 Adam,
 
 Yes, I have a list of strings I would look for in each link. My plan is
 to collect X links from the site: first look for the links I want and,
 if they exist, add them; if they don't exist, add X links from the site.
 I am planning to start with the URL Filter plugin.
 
 Eric
 