Hi, 

You need to remove the '?' and the '=' from the following pattern:
[EMAIL PROTECTED]
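
To see why that rule blocks the dynamic pages, here is a small Python sketch of the first-match logic described in the comments of crawl-urlfilter.txt. It is only an illustration, not Nutch code. The archive has redacted the actual pattern, so the sketch ASSUMES the skip rule is the stock `-[?*!@=]`. The helper name `first_match_accepts` is made up for this example, and Python's `re` module stands in for the regex engine Nutch uses.

```python
import re

def first_match_accepts(rules, url):
    """Apply '+'/'-' prefixed regex rules in order; the first match decides."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no pattern matched: the URL is ignored

# ASSUMPTION: the redacted skip rule is the stock "-[?*!@=]".
before = [
    r"-^(file|ftp|mailto):",
    r"-[?*!@=]",              # rejects any URL containing '?' or '='
    r"+^http://mysite.com/",
    r"-.",
]

# Same rules with '?' and '=' removed from the character class.
after = [r"-[*!@]" if rule == r"-[?*!@=]" else rule for rule in before]

url = "http://mysite.com/news.asp?id=12345"
print(first_match_accepts(before, url))  # False: the '?' hits the skip rule first
print(first_match_accepts(after, url))   # True: falls through to the '+' host rule
```

Because the first matching rule wins, the skip rule has to be loosened (or moved below the '+' line) before query URLs can reach the accept rule for your host.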

Regards,
Sebastien.


--- mu xiaofeng <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I'm using the Nutch 0.7 crawler to fetch my site,
> but it only fetches static pages such as:
> xxx.htm, xxx.html, xxx.asp, xxx.php, xxx.js
> 
> How can I make it fetch the dynamic news pages,
> e.g. http://mysite.com/news.asp?id=12345 ?
> My crawl-urlfilter.txt contains:
> -----------------------------------------
> # The url filter file used by the crawl command.
> 
> # Better for intranet crawling.
> # Be sure to change MY.DOMAIN.NAME to your domain name.
> 
> # Each non-comment, non-blank line contains a regular expression
> # prefixed by '+' or '-'.  The first matching pattern in the file
> # determines whether a URL is included or ignored.  If no pattern
> # matches, the URL is ignored.
> 
> # skip file:, ftp:, & mailto: urls
> -^(file|ftp|mailto):
> 
> # skip image and other suffixes we can't yet parse
> -\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG)$
> 
> # skip URLs containing certain characters as probable queries, etc.
> [EMAIL PROTECTED]
> 
> # accept hosts in MY.DOMAIN.NAME
> +^http://mysite.com/
> 
> # skip everything else
> -.
> -----------------------------------------
> 
> Thx all,
> 

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
