First of all, thank you Richard and Mark.
I am able to move forward now.
Next, I have to make sure I don't parse unnecessary URLs on a given page.
Typically sites are organized so that there is a common look and feel, with
links looping back to the home page and things like that.
I want to ignore URLs that are not relevant to my crawl and only crawl
those matching a specific pattern.
Can I use the whitelist urlfilter for this purpose? Can someone help me
understand how it works? I know how a plugin works in general, but I need
to know how this one actually works.
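For context, here is a minimal sketch of what a whitelist-style crawl-urlfilter.txt could look like. The host and path below are hypothetical placeholders, not from the original thread; the general behavior is that the filter reads these rules top to bottom and the first matching rule's `+`/`-` prefix decides whether the URL is accepted or rejected.

```
# Sketch of a whitelist-style crawl-urlfilter.txt (hypothetical host/path).
# Rules are regexes checked in order; first match wins.

# skip non-http schemes
-^(file|ftp|mailto):

# accept only URLs under the section we care about (example pattern)
+^http://www\.example\.com/products/

# reject everything else
-.
```

The trailing `-.` acts as a catch-all reject, so only URLs explicitly whitelisted above it survive.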

Thanks



On 3/9/06, Vertical Search <[EMAIL PROTECTED]> wrote:
>
> Okay, I have noticed that I cannot crawl URLs containing "?", "&" and
> "=".
> I have tried all combinations of modifying crawl-urlfilter.txt and
> # skip URLs containing certain characters as probable queries, etc.
> -[?*!@=]
>
> But in vain. I have hit a road block; that is terrible. :(
>
>
>
