Hi,

No, I don't have trace logging enabled. The logging level is INFO for the others.

-Jiang

On Fri, Jun 29, 2012 at 5:32 PM, Ferdy Galema <[email protected]> wrote:
> A quick pointer:
>
> Do you have trace logging enabled? If so, try disabling it and see if that
> works.
> See https://issues.apache.org/jira/browse/NUTCH-1253
>
>
> On Fri, Jun 29, 2012 at 11:17 AM, Jiang Fung Wong
> <[email protected]> wrote:
>
>> Dear All,
>>
>> I have a scenario where I need to initialize an HtmlUnit (a browser
>> for scraping) web client inside Nutch plugin code. The code (in
>> Clojure) is:
>>
>> (defn parser-filter
>>   "Called by nutch to perform the parsing. Implementation of
>>   org.apache.nutch.parse.HtmlParseFilter.filter"
>>   [this content parse-result meta-tags doc]
>>
>>   (println "testing 123")
>>
>>   (try
>>     (doto (new WebClient)
>>       (.setJavaScriptEnabled true)
>>       (.setThrowExceptionOnFailingStatusCode false)
>>       (.setThrowExceptionOnScriptError false))
>>     (catch Exception e
>>       (println "caught")
>>       (throw e)))
>>
>>   (println "ending testing 123")
>>
>> ...................
>>
>>
>> The WebClient class comes from [com.gargoylesoftware.htmlunit WebClient].
>> I believe it is built on Apache's HTTP client. I found that the program
>> encountered an exception inside the try block, yet the exception was
>> not caught.
>>
>>
>> The output from nutch:
>>
>> testing 123
>> Parsing: http://sg.news.yahoo.com/
>> Error parsing: http://sg.news.yahoo.com/: failed(2,200):
>> org.apache.nutch.parse.ParseException: Unable to successfully parse
>> content
>> ParseSegment: finished at 2012-06-29 09:16:31, elapsed: 00:00:07
>>
>> Neither "caught" nor "ending testing 123" was printed out.
>>
>> Any idea?
>>
>>
>> -Jiang
>>
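
One possible explanation for the symptom described above, offered only as an assumption since the thread does not confirm the cause: the thing thrown inside the try block may be a java.lang.Error (for example a NoClassDefFoundError when a dependency is not visible to the plugin) rather than an Exception. A (catch Exception e ...) clause does not match Error subclasses, so neither "caught" nor the later println would run. The minimal sketch below shows the difference; try-with is a hypothetical helper made up for this illustration and is not part of Nutch or HtmlUnit.

```clojure
;; Sketch only: shows which catch clause handles which kind of Throwable.
;; `try-with` is a hypothetical helper, not a Nutch or HtmlUnit function.
(defn try-with
  "Calls thunk and reports which catch clause, if any, handled what it threw."
  [thunk]
  (try
    (thunk)
    (catch Exception e
      ;; Matches Exception and its subclasses only.
      :caught-exception)
    (catch Throwable t
      ;; Matches everything else, including java.lang.Error subclasses.
      :caught-throwable)))

;; A RuntimeException is an Exception, so the first clause handles it.
(try-with #(throw (RuntimeException. "boom")))
;; => :caught-exception

;; A NoClassDefFoundError is an Error, so only the Throwable clause sees it.
(try-with #(throw (NoClassDefFoundError. "missing")))
;; => :caught-throwable
```

Temporarily changing the catch in parser-filter from Exception to Throwable and printing the stack trace would show whether this is what is happening in the plugin.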
