Hi,

I get a few errors while crawling sites. It usually happens when the crawler tries to retrieve PDFs or other documents:
fetching http://www.un.org/esa/sustdev/publications/sdea/2_urbaine/pdf/03_partie_1.pdf
fetching http://www.bilfingerberger.com/C125710E004ABFC5/Print/W26MNKY3155MARSEN
fetching http://www.un.org/esa/sustdev/publications/sdea/1_villageoise/pdf/03_partie_1.pdf
fetching http://www.sourcesecurity.com/companies/micro-site/verint-systems/case-studies.html
fetching http://www.britishland.com/images/Biodiversity Programme.pdf
*fetch of http://www.britishland.com/images/Biodiversity Programme.pdf failed with: java.lang.IllegalArgumentException: Invalid uri 'http://www.britishland.com/images/Biodiversity Programme.pdf': escaped absolute path not valid*
fetching http://www.sourcesecurity.com/technical-details/cctv/image-capture/lenses/fujinon-fe185co57ha-1.html
fetching http://www.sourcesecurity.com/product-filter/cctv/enclosures-and-fittings/consoles-racks-and-desks.html

As far as I can see, Nutch doesn't convert links properly: for some reason it doesn't URL-escape them, so the space in "Biodiversity Programme.pdf" is passed through as-is and the fetch fails. Could someone tell me whether there is a patch for this, or help me find the place in the code where it happens?

--
Best Regards
Alexander Aristov
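For reference, the failing link only needs its space percent-encoded (`Biodiversity%20Programme.pdf`) to be a valid URI. Below is a minimal standalone sketch of that escaping using only the JDK; it is not Nutch code, and the class name `UrlEscaper` is hypothetical. Where such a step would be hooked into Nutch (for example a URL normalizer) is not shown. It assumes the raw link contains no existing `%XX` escapes, because the multi-argument `java.net.URI` constructors always re-quote `%`.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, not part of Nutch: escapes illegal URI characters in a raw link.
public final class UrlEscaper {
    private UrlEscaper() {}

    // Assumption: rawUrl has no pre-existing %XX escapes (they would become %25XX).
    public static String escape(String rawUrl)
            throws MalformedURLException, URISyntaxException {
        // java.net.URL tolerates unescaped spaces, so it can split the string into parts.
        URL url = new URL(rawUrl);
        // The multi-argument URI constructor quotes illegal characters per component.
        URI uri = new URI(url.getProtocol(), url.getAuthority(),
                url.getPath(), url.getQuery(), url.getRef());
        // toASCIIString() additionally percent-encodes any non-ASCII characters.
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws Exception {
        String raw = "http://www.britishland.com/images/Biodiversity Programme.pdf";
        // Prints http://www.britishland.com/images/Biodiversity%20Programme.pdf
        System.out.println(escape(raw));
    }
}
```

Links that are already valid, such as the un.org PDFs above, come back unchanged, so the escaping step only alters URLs that would otherwise be rejected.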
