You can put the inclusion and exclusion URL regexes on separate
lines or combine them into one with | (OR). It does not make much
difference. Just make sure that this line comes last in the file:

-.

This will make sure all other sites are not crawled.
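
To see why the last line matters, here is a rough Python sketch of how I
understand the filter to evaluate rules: top to bottom, with the first
matching pattern deciding. It is not Nutch code. parse_rules and accepts
are made-up helper names, the rule set uses the placeholder hosts from
this thread, and treating an unmatched URL as rejected is my assumption.

```python
import re

# Hypothetical rule set in crawl-urlfilter.txt style; abc.net and def.org
# are the placeholder hosts used in this thread.
RULES = r"""
+^http://([a-z0-9]*\.)*(abc\.net|def\.org)/
-.
"""

def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules

def accepts(url, rules):
    """Return the verdict of the first rule whose pattern is found in url."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: a URL that matches no rule is skipped

rules = parse_rules(RULES)
for url in ("http://www.abc.net/index.html",
            "http://def.org/",
            "http://www.example.com/"):
    print(url, accepts(url, rules))
```

The first two URLs hit the + rule and are accepted. The third one falls
through to "-." and is rejected. Note that with the dots left unescaped
(abc.net instead of abc\.net), a host such as abc-net would also pass the
+ rule, since an unescaped dot matches any character.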

- Ravi

On 3/3/06, Jack Tang <[EMAIL PROTECTED]> wrote:
> On 3/3/06, Michael Ji <[EMAIL PROTECTED]> wrote:
> > hi,
> >
> > I tried this, actually in my case, one site ends with
> > .net and the other is .org
> >
> > so I modified it to
> >
> > +^http://([a-z0-9]*\.)*(abc.net|def.org)/
> I guess '.' is a metacharacter in regexp, so pls try
> +^http://([a-z0-9]*\.)*(abc\.net|def\.org)/
>
> Good luck!
>
> > and I ran another test, but it doesn't seem to work, coz I
> > saw a site other than abc and def being fetched,
> >
> > any hints?
> >
> > thanks,
> >
> > Michael,
> >
> > --- sudhendra seshachala <[EMAIL PROTECTED]> wrote:
> >
> > >
> > > Hi,
> > >   Try the following pattern
> > >   +^http://([a-z0-9]*\.)*(abc|def).com/
> > >
> > >   I was able to search a couple of sites using a similar
> > > pattern.
> > >   Is this what you are asking?
> > >
> > > Michael Ji <[EMAIL PROTECTED]> wrote:
> > >   Hi,
> > >
> > > I searched the mailing list posts, but I still have a problem
> > > running my test.
> > >
> > > I want my crawl to be limited to two sites only,
> > >
> > > such as, *.abc.com/*
> > > and *.def.com/*
> > >
> > > so I put two lines in crawl-urlfilter.txt:
> > > +^http://([a-z0-9]*\.)*.abc.com/
> > > +^http://([a-z0-9]*\.)*.def.com/
> > >
> > > But after running the test, the crawl is not limited
> > > to the above two sites.
> > >
> > > In the log, I found "not found ...urlfilter-prefix"
> > >
> > > I wonder if the failure is because crawl-urlfilter.txt is not
> > > included in my configuration XML, or because there is a
> > > syntax error in the rules above.
> > >
> > > thanks,
> > >
> > > Michael
> > >
> > >
> > >   Sudhi Seshachala
> > >   http://sudhilogs.blogspot.com/
> > >
> >
> >
> >
>
>
> --
> Keep Discovering ... ...
> http://www.jroller.com/page/jmars
>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
