I don't see how that's possible unless you improve the JavaScript parser. I
actually think it's pretty much impossible to extract links reliably from
JavaScript unless the script is actually interpreted and executed, which is
a very different task from what the parser plugin does.
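
To illustrate the point, here is a minimal sketch in Python. It assumes a
hypothetical extractor that treats every quoted string containing a slash or
a dot as a link; the function name, the sample script and the heuristic are
all made up for illustration, and this is not the actual Nutch parse-js code.

```python
import re
from typing import List
from urllib.parse import urljoin

# Assumption: a naive heuristic that matches any single- or double-quoted
# string literal on one line. This is NOT how Nutch's parse-js plugin works.
STRING_LITERAL = re.compile(r"""(["'])((?:(?!\1).)*)\1""")

# Hypothetical script that builds its URLs at runtime by string concatenation.
SAMPLE_SCRIPT = '''
var ref = "/_js/track?ref=" + escape(document.referrer) + "&v=1.5.1.1";
window.location = "/tv/" + section + "/home.html";
'''


def naive_js_links(script: str, base: str) -> List[str]:
    """Return link candidates scraped from string literals in `script`."""
    candidates = []
    for match in STRING_LITERAL.finditer(script):
        literal = match.group(2)
        # Anything that vaguely looks like a path or a file name is kept.
        if "/" in literal or "." in literal:
            candidates.append(urljoin(base, literal))
    return candidates


if __name__ == "__main__":
    for link in naive_js_links(SAMPLE_SCRIPT, "http://tv.bild.de/"):
        print(link)
```

Every candidate it finds is only a fragment of a URL that the script would
assemble at runtime. The value of `section` is not known without executing
the code, so the real target never shows up in the output.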

On Fri, Oct 10, 2008 at 1:17 AM, Höchstötter Nadine <
[EMAIL PROTECTED]> wrote:

> Thank you for your answer. But when I override the JavaScript plugin,
> links will be missing. Is there any way to get those
> JavaScript URLs?
> Thanks.
>
> -----Original Message-----
> From: Kevin MacDonald [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 9, 2008 19:26
> To: [email protected]
> Subject: Re: db_gone/javascript/invalid URLs
>
> I encountered that error as well. I believe it happens because the
> JavaScript parser tries to pull valid URLs out of JavaScript, which is
> highly optimistic considering that such URLs may be pieced together
> using string appends. I would override the 'plugin.includes' config value
> from nutch-default.xml (by placing it in nutch-site.xml) and turn off
> JavaScript parsing.
>
> On Thu, Oct 9, 2008 at 8:13 AM, Höchstötter Nadine <
> [EMAIL PROTECTED]> wrote:
>
> > Hi all,
> > I have a problem with JavaScript. I tried to crawl bild.de and many
> > links were not fetched. I checked the stats and they mostly say
> > "Status 3: (db_gone)". If you look at the URLs marked "db_gone" you will
> > see some weird things, as listed below this email. I have listed just a
> > few. I do not think that this is only a JavaScript problem but probably
> > also a URL normalization problem. Does anybody know how to deal with it?
> > Thanks,
> > Nadine.
> >
> > http://software.bild.de/js/6M/x-6N-6Q-6T
> >
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/js/;l(6.1f(7,
> >
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/js/</22>
> >
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/js/</4t></29></22>
> >
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/js/a.1i
> >
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/js/},4o:q(){6(7)[6(7).4E(
> >
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/ratgeber-karriere/jobs/allgemein
> >
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/text/javascript
> >
> > Status: 3 (db_gone)
> >
> > http://software.bild.de/top.document.all.
> >
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/+escape(document.referrer)+
> >
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/+escape(document.referrer)+
> >
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/...
> >
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/1.5.1.1
> >
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/</tbody></+escape(document.referrer)+
> >
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/</tbody></bild//CP//+escape(document.referrer)+
> >
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/</tbody></bild//CP//bild//CP//+escape(document.referrer)+
> >
> > Status: 3 (db_gone)
> >
> > http://tv.bild.de/_js/</tbody></bild//CP//bild//CP//entertainment/body/tv/tvprogramm/tvprogramm/home/+escape(document.referrer)+
> >
> > Status: 3 (db_gone)
> >
> >
> >
>
