Google has sitemaps instead... They were originally designed to help find such dynamic URLs (not necessarily built by JavaScript; they could also come from form submissions).
Evaluating JavaScript is extremely CPU-costly for crawlers (this isn't a personal computer, where you have a single JavaScript thread on a dual-core machine!), especially if you need to execute thousands of "use cases" (combinations of method parameters) in order to find all possible return values. Google may use some JavaScript emulation (sometimes!) in order to catch black-hat SEOs etc., and to evaluate the quality of some landing pages for AdWords (do they use AdSense?), but that is not Googlebot's job.

Just generate a "sitemap" (a seed.txt file) for Nutch.

> -----Original Message-----
> From: Mohamed Parvez [mailto:par...@gmail.com]
> Sent: September-14-09 12:36 PM
> To: nutch-user@lucene.apache.org
> Subject: Re: URL built by JavaScript Function - Can this be Crawled
>
> Thanks Ken.
> If Google itself has not fully implemented JavaScript analysis/execution
> for crawling, I am going to stay away from it and look for an alternate
> solution.
>
> Thanks/Regards,
> Parvez
>
> On Mon, Sep 14, 2009 at 11:15 AM, Ken Krugler
> <kkrugler_li...@transpac.com> wrote:
>
> > JavaScript code that creates dynamic URLs is always a problem for web
> > crawlers.
> >
> > Most web sites try to make their content crawlable by creating alternative
> > static links to the content.
> >
> > I think Google now does some analysis/execution of JS code, but it's a
> > tricky problem.
> >
> > I would suggest modifying the HTML parser to explicitly look for calls
> > being made to your function, and generate appropriate outlinks.
> >
> > -- Ken
> >
> > On Sep 14, 2009, at 8:04am, Mohamed Parvez wrote:
> >
> >> Can anyone please throw some light on this?
> >>
> >> Thanks/Regards,
> >> Parvez
> >>
> >> On Fri, Sep 11, 2009 at 3:23 PM, Mohamed Parvez <par...@gmail.com> wrote:
> >>
> >>> We have a JavaScript function, which takes some params and builds a URL
> >>> and then uses window.location to send the user to that URL.
> >>>
> >>> Our website uses this feature a lot and most of the URLs are built using
> >>> this function.
> >>>
> >>> I am trying to crawl using Nutch and I am also using the parse-js plugin.
> >>>
> >>> But it does not look like Nutch is able to crawl these URLs.
> >>>
> >>> Am I doing something wrong, or is Nutch not able to crawl URLs built by
> >>> a JavaScript function?
> >>>
> >>> ----
> >>> Thanks/Regards,
> >>> Parvez
> >
> > --------------------------
> > Ken Krugler
> > TransPac Software, Inc.
> > <http://www.transpac.com>
> > +1 530-210-6378
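
For what it's worth, Ken's idea (look for calls to the function and generate the links yourself) can also be done offline to produce the seed list, without touching the Nutch parser. Below is a rough sketch only. The function name buildUrl, its two string arguments, and the resulting URL scheme are all made up for illustration, since the thread never shows the real function; substitute your own. It assumes the usual Nutch seed format of one URL per line in a plain text file.

```python
import re
from urllib.parse import urlencode

# ASSUMPTIONS (not from the thread): the site's JavaScript helper is called
# buildUrl(section, id) with two quoted string arguments, and it redirects to
# BASE_URL/<section>?id=<id>. Replace all of this with the real rule.
BASE_URL = "http://www.example.com"
CALL_RE = re.compile(
    r"""buildUrl\(\s*['"]([^'"]*)['"]\s*,\s*['"]([^'"]*)['"]\s*\)"""
)


def extract_urls(html):
    """Return the URLs that the buildUrl(...) calls found in html would build.

    Duplicates are dropped and first-seen order is kept.
    """
    urls = []
    for section, item_id in CALL_RE.findall(html):
        url = f"{BASE_URL}/{section}?{urlencode({'id': item_id})}"
        if url not in urls:
            urls.append(url)
    return urls


def write_seed_file(urls, path="seed.txt"):
    """Write one URL per line, suitable as a seed list for the injector."""
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
```

Run extract_urls over the pages you already have (or over the templates that emit the calls), pass the combined result to write_seed_file, and inject that file as usual. This only catches calls with literal string arguments; anything computed at runtime would still need real JavaScript execution.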