I ran into that error as well. I believe it happens because the
JavaScript parser (the parse-js plugin) tries to pull valid URLs out of
JavaScript source, which is highly optimistic, since such URLs are often
pieced together at runtime by string concatenation. I would override the
'plugin.includes' property from nutch-default.xml by redefining it in
nutch-site.xml with JavaScript parsing turned off.
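
For reference, a minimal sketch of what that override could look like in
conf/nutch-site.xml. The plugin list below is an assumption based on a
typical default of that era; copy the actual 'plugin.includes' value from
your own nutch-default.xml and only drop "js" from the parse-(...) group.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Assumed default value with "js" removed from parse-(text|html|js).
         Start from the value in your own nutch-default.xml. -->
    <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

URLs that were already added to the crawldb from earlier JavaScript
parsing will most likely stay there as db_gone, so a fresh crawl may be
the simplest way to get clean stats.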

On Thu, Oct 9, 2008 at 8:13 AM, Höchstötter Nadine <
[EMAIL PROTECTED]> wrote:

> Hi all,
> I have a problem with JavaScript. I tried to crawl bild.de, and many
> links were not fetched. I checked the stats, and they mostly say
> "Status: 3 (db_gone)". If you look at the URLs marked "db_gone", you
> will see some weird things, as listed below. I have listed just a few.
> I do not think this is only a JavaScript problem, but probably also a
> URL normalization problem. Does anybody know how to deal with it?
> Thanks,
> Nadine.
>
> http://software.bild.de/js/6M/x-6N-6Q-6T
>
> Status: 3 (db_gone)
>
> http://software.bild.de/js/;l(6.1f(7,
>
> Status: 3 (db_gone)
>
> http://software.bild.de/js/</22>
>
> Status: 3 (db_gone)
>
> http://software.bild.de/js/</4t></29></22>
>
> Status: 3 (db_gone)
>
> http://software.bild.de/js/a.1i
>
> Status: 3 (db_gone)
>
> http://software.bild.de/js/},4o:q(){6(7)[6(7).4E(
>
> Status: 3 (db_gone)
>
> http://software.bild.de/ratgeber-karriere/jobs/allgemein
>
> Status: 3 (db_gone)
>
> http://software.bild.de/text/javascript
>
> Status: 3 (db_gone)
>
> http://software.bild.de/top.document.all.
>
> Status: 3 (db_gone)
>
> http://tv.bild.de/+escape(document.referrer)+
>
> Status: 3 (db_gone)
>
> http://tv.bild.de/_js/+escape(document.referrer)+
>
> Status: 3 (db_gone)
>
> http://tv.bild.de/_js/...
>
> Status: 3 (db_gone)
>
> http://tv.bild.de/_js/1.5.1.1
>
> Status: 3 (db_gone)
>
> http://tv.bild.de/_js/</tbody></+escape(document.referrer)+
>
> Status: 3 (db_gone)
>
> http://tv.bild.de/_js/</tbody></bild//CP//+escape(document.referrer)+
>
> Status: 3 (db_gone)
>
> http://tv.bild.de/_js/</tbody></bild//CP//bild//CP//+escape(document.referrer)+
>
> Status: 3 (db_gone)
>
> http://tv.bild.de/_js/</tbody></bild//CP//bild//CP//entertainment/body/tv/tvprogramm/tvprogramm/home/+escape(document.referrer)+
>
> Status: 3 (db_gone)
>
>
>
