Re: Buggy fetchlist' urls

2006-03-15 Thread Jack Tang
On 3/15/06, Jérôme Charron [EMAIL PROTECTED] wrote: I am not familiar with the Rhino engine. But it is said JDK 6 adopted it as its embedded JavaScript engine. Can we build one RhinoInterpreter first, and then evaluate the JavaScript function to get the result, rather than extracting pure text as is done now?

Re: Buggy fetchlist' urls

2006-03-14 Thread Jack Tang
Hi Andrzej. In my previous projects, I bound JavaScript functions with center url. And I knew the idea does not fit for Nutch. I am not familiar with the Rhino engine. But it is said JDK 6 adopted it as its embedded JavaScript engine. Can we build one RhinoInterpreter first, and then evaluate the

Re: Buggy fetchlist' urls

2006-03-14 Thread Florent Gluck
Hi Andrzej, Well, I think for now I'll just disable the parse-js plugin since I don't really need it anyway. I'll let you know if I ever work on it (I may need it in the future). Thanks, --Flo Andrzej Bialecki wrote: Florent Gluck wrote: Some urls are totally bogus. I didn't investigate

Re: Buggy fetchlist' urls

2006-03-14 Thread Jérôme Charron
I am not familiar with the Rhino engine. But it is said JDK 6 adopted it as its embedded JavaScript engine. Can we build one RhinoInterpreter first, and then evaluate the JavaScript function to get the result, rather than extracting pure text as is done now? Hi Jack, I recently wrote a small article about search

Buggy fetchlist' urls

2006-03-13 Thread Florent Gluck
Hi, I'm using Nutch revision 385671 from the trunk. I'm running it on a single machine using the local filesystem. I just started with a seed of one single url: http://www.osnews.com Then I ran a crawl cycle of depth 2 (generate/fetch/updatedb) and dumped the crawl db. Here is where I got quite