Hello,

I have one specific domain. I tested further, and it looks like Nutch fetches 
this domain's other links but not the ones with '?'. Nutch does fetch other 
domains' URLs with the '?' symbol.

 
How can I check whether robots.txt on this domain blocks these specific links 
from being fetched?
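The quickest check is to open http://<domain>/robots.txt in a browser and read its Disallow rules. For a programmatic check, here is a minimal sketch using Python's stdlib robot parser; the rules and URLs below are made-up placeholders, not taken from this thread. Note that urllib.robotparser implements the classic prefix-matching rules only, so wildcard patterns such as "Disallow: /*?" (a common extension some sites use to block all query URLs) are not understood by it.

```python
# Sketch: check whether robots.txt rules block a given URL, using
# Python's stdlib parser. All rules and URLs here are hypothetical.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Parse rules directly instead of fetching them over the network;
# in practice you would call rp.set_url(...) followed by rp.read().
rp.parse([
    "User-agent: *",
    "Disallow: /search",   # blocks /search and anything under it
])

print(rp.can_fetch("*", "http://example.com/search?q=nutch"))  # False
print(rp.can_fetch("*", "http://example.com/page?id=42"))      # True
```

If can_fetch returns False for your query URLs, robots.txt (not the Nutch URL filters) is the reason they are skipped.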

Thanks.
A.


 

-----Original Message-----
From: Bartosz Gadzimski <[email protected]>
To: [email protected]
Sent: Sun, 1 Mar 2009 11:13 am
Subject: Re: urls with ? and & symbols


[email protected] writes:

> Hello,
>
> I use nutch-0.9 and am trying to index URLs with ? and & symbols. I have
> commented out the line -[...@=] in conf/crawl-urlfilter.txt,
> conf/automaton-urlfilter and conf/regex-urlfilter.txt.
>
> However, Nutch still ignores these URLs.
>
> Does anyone know how this can be fixed?
>
> Thanks in advance.
> A.
>
Hi,

If you commented out those lines, it should be fine. That part is correct, 
so the problem is somewhere else.
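For reference, the filter entry in question in Nutch 0.9's conf/crawl-urlfilter.txt and conf/regex-urlfilter.txt should look like this once disabled (quoted from memory of the 0.9 defaults, so double-check against your files):

```
# skip URLs containing certain characters as probable queries, etc.
# -[?*!@=]
```

The leading '#' on the second line is what disables the rule. conf/automaton-urlfilter uses a different pattern syntax, so check its equivalent rule separately. Also note that the filters are applied when fetch lists are generated, so you will likely need to re-run the crawl before the change takes effect.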

You need to give us more information:

- does your Nutch crawl and index "normal" URLs (without ? and &)?
- are you crawling domains that are NOT blocked in crawl-urlfilter?
- does robots.txt on this domain block your URLs?
- are you talking about one specific domain or many different ones?

Thanks,
Bartosz



 
