This error comes up when the JavaScript is minified. Minification is a simple process in which whitespace is removed to make the JavaScript smaller.
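To make "minified" concrete, here is a rough sketch of the kind of whitespace stripping meant above. It is only an illustration: the class NaiveMinify and its minify method are hypothetical names, not part of Nutch or of any real minifier, and the sketch ignores string literals and comments, which a real tool has to handle.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveMinify {

    // Hypothetical helper, for illustration only (not a Nutch API).
    // Trims each line, collapses runs of spaces/tabs to one space,
    // drops blank lines and joins what is left with single spaces.
    static String minify(String source) {
        List<String> parts = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String trimmed = line.trim().replaceAll("[ \\t]+", " ");
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String js = "function add(a, b) {\n    return a   +   b;\n}\n";
        // prints: function add(a, b) { return a + b; }
        System.out.println(minify(js));
    }
}
```

Feeding the original and the stripped version of the same script through the parser is one way to check whether the whitespace is really what makes the difference.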
The parse-js plugin has no issue parsing ordinary JavaScript, but if the same JavaScript has its whitespace removed, Nutch fails with the error quoted below. It looks like it should be a simple fix.

Thanks/Regards,
Parvez

On Fri, Sep 11, 2009 at 1:14 PM, Mohamed Parvez <par...@gmail.com> wrote:
> I am getting this error:
> --------------------------------
> fetching http://business.verizon.net/SMBPortalWeb/resources/js/helpSupport.js
> Error parsing: http://business.verizon.net/SMBPortalWeb/resources/js/helpSupport.js: *UNKNOWN!(-53,0):* Content not JavaScript: 'application/javascript'
>
>
> I have this in the file parse-plugins.xml:
> ---------------------------------------------------------
> <mimeType name="application/x-javascript">
>   <plugin id="parse-js" />
> </mimeType>
>
> <mimeType name="application/javascript">
>   <plugin id="parse-js" />
> </mimeType>
>
>
> I have this in nutch-site.xml:
> ------------------------------------------------
> <name>plugin.includes</name>
> <value>field-add|protocol-http|urlfilter-regex|parse-(text|html|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|parse-js|suffix-urlfilter</value>
> </property>
>
> I am using the command:
> -------------------------------------
> bin/nutch crawl urls -depth 10 >crawl.log
>
>
> I am using this in urls/seed.txt:
> ---------------------------------------------------
> http://business.verizon.net/SMBPortalWeb/appmanager/SMBPortal/smb?_nfpb=true&_pageLabel=SMBPortal_page_main_support
>
> Thanks/Regards,
> Parvez