On 3/3/06, Michael Ji <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I tried this, but in my case one site ends with
> .net and the other with .org,
>
> so I modified it to
>
> +^http://([a-z0-9]*\.)*(abc.net|def.org)/
I guess '.' is a metacharacter in regexps (it matches any character), so please try escaping it:
+^http://([a-z0-9]*\.)*(abc\.net|def\.org)/
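
To see the difference, here is a minimal Python sketch (not Nutch code). It assumes Python's re module treats these two patterns the same way the Java regex engine behind crawl-urlfilter.txt does, and the test URLs are made up for illustration. The leading '+' is dropped because it is the filter file's "accept" marker, not part of the regex.

```python
import re

# Patterns from this thread, without the leading '+' accept marker.
UNESCAPED = r"^http://([a-z0-9]*\.)*(abc.net|def.org)/"
ESCAPED = r"^http://([a-z0-9]*\.)*(abc\.net|def\.org)/"

# Hypothetical test URLs; none of these come from the original posts.
URLS = [
    "http://www.abc.net/index.html",  # intended match
    "http://def.org/",                # intended match
    "http://abcxnet/page",            # matches only if '.' acts as a wildcard
    "http://www.example.com/",        # unrelated host
]


def accepted(pattern, urls):
    """Return the URLs whose beginning matches the given pattern."""
    rx = re.compile(pattern)
    return [u for u in urls if rx.match(u)]


if __name__ == "__main__":
    print("unescaped:", accepted(UNESCAPED, URLS))
    print("escaped:  ", accepted(ESCAPED, URLS))
```

In this sketch the unescaped pattern also accepts the made-up host "abcxnet", while neither pattern accepts the unrelated host. So the unescaped dot makes the filter a bit looser, but it may not be the whole reason an unrelated site was fetched.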

Good luck!

> and ran another test. It doesn't seem to work, because I
> saw a site other than abc and def being fetched.
>
> Any hints?
>
> Thanks,
>
> Michael
>
> --- sudhendra seshachala <[EMAIL PROTECTED]> wrote:
>
> >
> > Hi,
> >   Try the following pattern:
> >   +^http://([a-z0-9]*\.)*(abc|def).com/
> >
> >   I was able to search a couple of sites using a similar
> > pattern.
> >   Is this what you are asking?
> >
> > Michael Ji <[EMAIL PROTECTED]> wrote:
> >   Hi,
> >
> > I searched the mailing-list posts, but I still have a problem
> > getting my test
> > to run.
> >
> > I want my crawl to be limited to two sites
> > only,
> >
> > such as *.abc.com/*
> > and *.def.com/*
> >
> > so I put two lines in crawl-urlfilter.txt:
> > +^http://([a-z0-9]*\.)*.abc.com/
> > +^http://([a-z0-9]*\.)*.def.com/
> >
> > But after running the test, the crawl is not
> > limited
> > to the above two sites.
> >
> > In the log, I found "not found ...urlfilter-prefix"
> >
> > I wonder whether the failure is due to not including
> > crawl-urlfilter.txt in my configuration XML, or whether there is
> > a syntax error in the patterns above.
> >
> > thanks,
> >
> > Michael
> >
> >
> >   Sudhi Seshachala
> >   http://sudhilogs.blogspot.com/


--
Keep Discovering ... ...
http://www.jroller.com/page/jmars


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
