Hello everybody,

I am using Nutch 1.2 to crawl my website.

The fetch list contains some URLs that do not exist on the website.

 

Examples:

http://www.mydomain.com/LAJME_GOSSIP_CATEGORY/1105130159/Article-Gisele-B├Æ╞├ 
Γ¼├ÆΓ¼á├óΓ¼Γ‑ó├Æ╞├óΓ¼┬á├Æ┬ó├óΓ¼a┬¼├óΓ¼~┬ó├Æ╞├ 
Γ¼"├Æ┬ó├óΓ¼a┬¼├&┬í├Æ╞├óΓ¼┼í├ÆΓ¼a├┬╝ndchen-eshte-ende-modelja-me-e-paguar-.aspx"

http://www.mydomain.com/LIGJE/601270007/11.1.1.e

 

I think Nutch is not parsing these pages correctly.
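One possible explanation for the first URL, offered as an assumption and not a confirmed diagnosis: it looks like a non-ASCII name (presumably "Bündchen") whose UTF-8 bytes were decoded with a single-byte charset, re-encoded as UTF-8, and decoded again several times, so the garbage grows each round. The sketch below reproduces that pattern with Python's standard codecs. The functions `garble` and `ungarble` are hypothetical helpers for illustration and are not part of Nutch.

```python
# Hypothetical sketch (assumption): repeated UTF-8 -> ISO-8859-1 mis-decoding,
# which produces mojibake that gets longer with every round.

def garble(text: str, rounds: int) -> str:
    """Encode as UTF-8 and wrongly decode as ISO-8859-1, `rounds` times."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("latin-1")
    return text


def ungarble(text: str, rounds: int) -> str:
    """Undo garble(): encode as ISO-8859-1 and decode as UTF-8, `rounds` times."""
    for _ in range(rounds):
        text = text.encode("latin-1").decode("utf-8")
    return text


if __name__ == "__main__":
    name = "Bündchen"  # assumed original spelling of the name in the URL
    for n in range(4):
        bad = garble(name, n)
        print(n, len(bad), repr(bad))
    # The damage is reversible if the number of rounds is known.
    print(ungarble(garble(name, 3), 3))
```

Each round turns every non-ASCII character into two characters, which would explain why the broken URL is so much longer than the real one.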

 

Thanks in advance.

Best Regards,

Marseld

 

 

