Hi,

I tried this, but in my case one site ends with
.net and the other with .org,
so I modified the pattern to:

+^http://([a-z0-9]*\.)*(abc.net|def.org)/

I ran another test, but it doesn't seem to work: I
saw sites other than abc and def being fetched.

Any hints?
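One way to check a pattern outside the crawler is a small Java harness like the one below. It is only a sketch, not Nutch's own filter code. It assumes three things: the rules are tried from top to bottom with a regex search, the first matching rule decides ('+' accepts, '-' rejects), and a URL that no rule matches is rejected. The class name UrlFilterCheck, its helpers, the trailing "-." catch-all and the sample URLs are all made up for illustration. The dots in abc.net and def.org are escaped here, because an unescaped "." matches any character.

```java
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterCheck {

    // One line of a crawl-urlfilter.txt-style rule list:
    // a leading '+' means accept, '-' means reject, the rest is the regex.
    record Rule(boolean accept, Pattern pattern) {
        static Rule parse(String line) {
            return new Rule(line.charAt(0) == '+', Pattern.compile(line.substring(1)));
        }
    }

    // Assumed semantics (hypothetical, not Nutch's code): the first rule whose
    // pattern is found in the URL decides; if no rule matches, reject the URL.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule r : rules) {
            if (r.pattern().matcher(url).find()) {
                return r.accept();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            // Same shape as the pattern above, but with the dots escaped.
            Rule.parse("+^http://([a-z0-9]*\\.)*(abc\\.net|def\\.org)/"),
            // Explicit catch-all: skip everything else.
            Rule.parse("-."));

        List<String> urls = List.of(
            "http://www.abc.net/index.html",
            "http://def.org/",
            "http://www.xyz.com/",
            "http://abcxnet/");

        for (String url : urls) {
            System.out.println(url + " -> " + (accepts(rules, url) ? "accept" : "reject"));
        }
    }
}
```

Compiled and run with Java 16 or later (the record needs it), this should print "accept" for the abc.net and def.org URLs and "reject" for www.xyz.com and abcxnet. With the dots left unescaped, http://abcxnet/ would be accepted as well.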

Thanks,

Michael

--- sudhendra seshachala <[EMAIL PROTECTED]> wrote:

> 
> Hi,
>   Try the following pattern:
>   +^http://([a-z0-9]*\.)*(abc|def).com/
>    
>   I was able to search a couple of sites using a similar
> pattern.
>   Is this what you are asking?
>   
> Michael Ji <[EMAIL PROTECTED]> wrote:
>   Hi,
> 
> I searched the mailing list archives, but I still have
> trouble running my test.
> 
> Actually, I want my crawl to be limited to just two sites,
> 
> such as *.abc.com/*
> and *.def.com/*
> 
> so I put two lines in crawl-urlfilter.txt:
> +^http://([a-z0-9]*\.)*.abc.com/
> +^http://([a-z0-9]*\.)*.def.com/
> 
> But after running the test, the crawl is not limited
> to these two sites.
> 
> In the log, I found "not found ...urlfilter-prefix"
> 
> I wonder whether the failure is because I did not include
> crawl-urlfilter.txt in my configuration XML, or because
> there is a syntax error in the patterns above.
> 
> Thanks,
> 
> Michael
> 
> 
>   Sudhi Seshachala
>   http://sudhilogs.blogspot.com/


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
