Hi Jack,

The approach you mentioned is one way to solve the problem,
but should we check each URL against a regular expression during crawling,
or while indexing? Is that possible?

Any help is appreciated.
Thanks in advance
regards
----- Original Message -----
From: "Jack Tang" <[EMAIL PROTECTED]>
To: <[email protected]>; "K.A.Hussain Ali" <[EMAIL PROTECTED]>
Sent: Thursday, December 08, 2005 8:05 PM
Subject: Re: Crawling listing (pagination) pages.


Hi

I am facing the same problem. However, my crawl only focuses on a few
websites, so I recognize the pagination URLs using a regexp and inject
them in every fetch cycle.
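
The regexp step could look roughly like the sketch below. It is only an
illustration in plain Java, not Nutch code: the class name
PaginationUrlSelector, the example.com URLs, and the assumption that
pagination links carry a numeric "page" query parameter are all made up
for the example, so the pattern has to be adapted to the target site.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): picks pagination URLs out of a
// list of links so they can be fed back into the next fetch cycle.
final class PaginationUrlSelector {

    // Assumed URL shape: a numeric "page" query parameter,
    // e.g. http://example.com/list?page=3. Adjust for the real site.
    private static final Pattern PAGINATION =
            Pattern.compile("^https?://[^/]+/.*[?&]page=\\d+(&.*)?$");

    // True if the whole URL matches the assumed pagination pattern.
    static boolean isPagination(String url) {
        return PAGINATION.matcher(url).matches();
    }

    // Keeps only the URLs that look like pagination links, in input order.
    static List<String> select(List<String> urls) {
        List<String> selected = new ArrayList<>();
        for (String url : urls) {
            if (isPagination(url)) {
                selected.add(url);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
                "http://example.com/list?page=2",
                "http://example.com/about",
                "http://example.com/list?sort=asc&page=10");
        // Print one URL per line, e.g. to write into a seed list for injection.
        for (String url : select(links)) {
            System.out.println(url);
        }
    }
}
```

Printing one URL per line keeps the output easy to save as a seed list;
how that list is injected depends on the crawl setup.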

/Jack

On 12/8/05, K.A.Hussain Ali <[EMAIL PROTECTED]> wrote:
Hi all,

Does Nutch crawl listing pages (pages with pagination, as in search engines)?

While crawling with Nutch, I need to fetch the pages reached through the pagination links without increasing the depth of the whole crawl.
    Does Nutch provide any plugin for this?
    Is there any other way to solve it?

Any help is greatly appreciated
Thanks in advance
regards
-Hussain



--
Keep Discovering ... ...
http://www.jroller.com/page/jmars


