I don't see how that's possible unless you improve the JavaScript parser. I think it's pretty much impossible to extract links reliably from JavaScript unless the script is actually interpreted and executed, which is a very different task from what the parser plugin does.
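To illustrate the point, here is a rough Python sketch (not the actual parse-js code). It assumes a heuristic extractor that only looks at quoted string literals; the sample script, the page URL, and the name `naive_js_links` are made up for the example. Such an extractor never sees the URL that is assembled at run time, only the pieces:

```python
import re
from typing import List
from urllib.parse import urljoin

# Hypothetical stand-in for a heuristic extractor: grab every quoted string
# literal and keep the ones that look vaguely like a path or URL.
STRING_LITERAL = re.compile(r"""(["'])(.*?)\1""")


def naive_js_links(script: str) -> List[str]:
    """Return quoted literals that contain a slash or a dot."""
    found = []
    for match in STRING_LITERAL.finditer(script):
        literal = match.group(2)
        if "/" in literal or "." in literal:
            found.append(literal)
    return found


# Made-up script: the real URL only exists after the concatenation runs.
SAMPLE = '''
var base = "http://tv.bild.de/_js/";
var url = base + "player/" + escape(document.referrer) + "/v" + "1.5.1.1";
'''

# Assumed URL of the page that contains the script.
PAGE_URL = "http://tv.bild.de/_js/index.html"

# Resolve each fragment against the page URL, as a crawler would.
candidates = [urljoin(PAGE_URL, literal) for literal in naive_js_links(SAMPLE)]

for candidate in candidates:
    print(candidate)
```

Among the candidates is http://tv.bild.de/_js/1.5.1.1, which is only a version string resolved against the page URL. Without executing the script there is no way to know that these fragments belong to one URL.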

On Fri, Oct 10, 2008 at 1:17 AM, Höchstötter Nadine <[EMAIL PROTECTED]> wrote:

> Thank you for your answer. But when I override the JavaScript plugin,
> there will be links missing. Is there any possibility to get those
> JavaScript URLs?
> Thanks.
>
> -----Original Message-----
> From: Kevin MacDonald [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 9, 2008 19:26
> To: [email protected]
> Subject: Re: db_gone/javascript/invalid URLs
>
> I encountered that error as well. I believe it's happening because the
> JavaScript parser is trying to pull valid URLs out of JavaScript, which is
> highly optimistic considering that such URLs may be pieced together
> using string appends. I would override the 'plugin.includes' config value
> from nutch-default.xml (by placing it in nutch-site.xml) and turn off
> JavaScript parsing.
>
> On Thu, Oct 9, 2008 at 8:13 AM, Höchstötter Nadine <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> > I have a problem with JavaScript. I tried to crawl bild.de and got many
> > links that were not fetched. I got the stats and they mostly say
> > "Status 3: (db_gone)". If you look at the URLs marked "db_gone", you
> > will see some weird things, as listed below this email. I have listed
> > just a few. I do not think this is only a JavaScript problem but
> > probably also a URL normalization problem. Does anybody know how to
> > deal with it? Thanks,
> > Nadine.
> >
> > http://software.bild.de/js/6M/x-6N-6Q-6T
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/js/;l(6.1f(7,
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/js/</22>
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/js/</4t></29></22>
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/js/a.1i
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/js/},4o:q(){6(7)[6(7).4E(
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/ratgeber-karriere/jobs/allgemein
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/text/javascript
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/top.document.all.
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/+escape(document.referrer)+
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/+escape(document.referrer)+
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/...
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/1.5.1.1
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/</tbody></+escape(document.referrer)+
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/</tbody></bild//CP//+escape(document.referrer)+
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/</tbody></bild//CP//bild//CP//+escape(document.referrer)+
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/</tbody></bild//CP//bild//CP//entertainment/body/tv/tvprogramm/tvprogramm/home/+escape(document.referrer)+
> > Status: 3 (db_gone)
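
For the 'plugin.includes' override suggested in the quoted message, a small Python sketch of how the new value could be derived. The `DEFAULT` string here is only an assumed, shortened example and `drop_js_parser` is a hypothetical helper; copy the real value from your own nutch-default.xml, since it differs between Nutch versions.

```python
import re


def drop_js_parser(plugin_includes: str) -> str:
    """Remove the 'js' alternative from the parse-(...) group of a
    plugin.includes regular expression, leaving everything else as-is."""

    def strip_js(match: "re.Match") -> str:
        kept = [name for name in match.group(1).split("|") if name != "js"]
        return "parse-(" + "|".join(kept) + ")"

    return re.sub(r"parse-\(([^)]*)\)", strip_js, plugin_includes)


# Assumed, shortened example value; NOT guaranteed to match your Nutch version.
DEFAULT = "protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic"

print(drop_js_parser(DEFAULT))
# protocol-http|urlfilter-regex|parse-(text|html)|index-basic
```

The resulting string would go into the <value> of a 'plugin.includes' property in nutch-site.xml, so that nutch-default.xml itself stays untouched. As Nadine noted, links that exist only in JavaScript will then be missed.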
