A quick pointer: do you have trace logging enabled? If so, try disabling it and see if that works. See https://issues.apache.org/jira/browse/NUTCH-1253

On Fri, Jun 29, 2012 at 11:17 AM, Jiang Fung Wong <[email protected]> wrote:

> Dear All,
>
> I have this scenario, where I need to initialize an HtmlUnit (a
> browser for scraping) web client inside a nutch plugin code. The code
> is (in clojure)
>
> (defn parser-filter
>   "Called by nutch to perform the parsing. Implementation of
>    org.apache.nutch.parse.HtmlParseFilter.filter"
>   [this content parse-result meta-tags doc]
>
>   (println "testing 123")
>
>   (try
>
>     (doto (new WebClient)
>       (.setJavaScriptEnabled true)
>       (.setThrowExceptionOnFailingStatusCode false)
>       (.setThrowExceptionOnScriptError false))
>
>
>     (catch Exception e
>
>       (println "caught")
>       (throw e)))
>
>   (println "ending testing 123")
>
>   ...................
>
>
> WebClient class comes from [com.gargoylesoftware.htmlunit WebClient].
> I believe it is an Apache's http client. I found that the program
> encountered exception inside the try block, yet the exception was not
> caught.
>
>
> The output from nutch:
>
> testing 123
> Parsing: http://sg.news.yahoo.com/
> Error parsing: http://sg.news.yahoo.com/: failed(2,200):
> org.apache.nutch.parse.ParseException: Unable to successfully parse
> content
> ParseSegment: finished at 2012-06-29 09:16:31, elapsed: 00:00:07
>
> Neither "caught" nor "ending testing 123" was not printed out.
>
> Any idea?
>
>
> -Jiang
>

