Hello all,

I launched a crawl of 100 websites with external links set to true.
After several hours, I ran the crawlcomplete command with the mode set to host.

Apart from the proper host names, the crawlcomplete output file contains the 
following lines:

1    #Are there any places to eat onsite during the show#Are there any places to eat onsite during the show UNFETCHED
1    #Are there any points where I can access the internet at the show#Are there any points where I can access the internet at the show UNFETCHED
1    #Can I register onsite#Can I register onsite UNFETCHED
1    #Can children attend the show#Can children attend the show UNFETCHED
1    #Can you recommend any site-seeing attractions in Amsterdam#Can you recommend any site-seeing attractions in Amsterdam UNFETCHED
1    #Do I need a visa#Do I need a visa UNFETCHED
1    #How do I get to IBC2018 at the Amsterdam RAI#How do I get to IBC2018 at the Amsterdam RAI UNFETCHED
1    #Is there anywhere for me to practice my religion#Is there anywhere for me to practice my religion UNFETCHED
1    #Is there parking#Is there parking UNFETCHED
1    #Want to exhibit at IBC2018#Want to exhibit at IBC2018 UNFETCHED
1    #What do I have access to at IBC#What do I have access to at IBC UNFETCHED
1    #What do I need to bring to IBC#What do I need to bring to IBC UNFETCHED
1    #What is the IBC Big Screen Experience#What is the IBC Big Screen Experience UNFETCHED
1    #When and where is IBC#When and where is IBC UNFETCHED
1    #Who attends IBC#Who attends IBC UNFETCHED

After some searching, I found the web page these entries come from:
https://show.ibc.org/about-ibc/faqs

It looks like Nutch treats the anchor's name attribute as a URL to crawl and 
stores it in the database with the name as the key.

For example:
<a class="anchor" name="Are there any places to eat onsite during the show?"></a>
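
As a rough illustration of the behavior I would expect (a sketch only, not 
Nutch's actual parser code): a link extractor should produce outlinks from 
href attributes only, so a named anchor without an href yields nothing. The 
class and function names are hypothetical, and the two href links in the 
sample are made up for the demonstration; only the first anchor is taken 
from the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefOnlyLinkExtractor(HTMLParser):
    """Hypothetical extractor: collects outlinks from <a href=...> only."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # A named anchor such as <a name="..."> has no href and is skipped.
        href = dict(attrs).get("href")
        # Skip missing hrefs and same-page fragment links like "#top".
        if not href or href.startswith("#"):
            return
        absolute = urljoin(self.base_url, href)
        # Keep only http(s) URLs; anything else is not crawlable here.
        if urlparse(absolute).scheme in ("http", "https"):
            self.outlinks.append(absolute)


def extract_outlinks(html, base_url):
    """Return the absolute http(s) outlinks found in the given HTML."""
    parser = HrefOnlyLinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.outlinks


# First anchor is from the FAQ page; the two href links are invented.
sample = (
    '<a class="anchor" name="Are there any places to eat onsite during the show?"></a>'
    '<a href="/about-ibc/faqs#parking">Parking</a>'
    '<a href="#top">Back to top</a>'
)
print(extract_outlinks(sample, "https://show.ibc.org/about-ibc/faqs"))
```

With this approach the name-only anchor never becomes a URL candidate, which 
is what I would have expected from the crawler as well.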

Any suggestions as to what is going on and how to fix it?
Thanks.

Semyon.
