Re: Nutch debugging log in Tomcat run time

2005-09-06 Thread Jérôme Charron
The change doesn't show up on the screen after I recompile the Nutch code and relaunch Tomcat. Did you redeploy the web app? -- http://motrech.free.fr/ http://www.frutch.org/

Naming of lib-plugins, was: AW: MS related plugins refactoring

2005-09-06 Thread Strittmatter, Stephan
Hello, Here is a copy of my previous mail, if someone wants to comment on it: I have just committed some modifications that make it possible to have dependencies between plugins. I would like to apply this mechanism to the parse-ms* related plugins, which both use Jakarta POI code. The idea is: instead

Plugins dependencies enhancement proposal

2005-09-06 Thread Jérôme Charron
Since the plugins can specify dependencies on each other, this raises a problem for administrators. For a Nutch administrator, it is not user-friendly to specify which plugins to activate/deactivate. With plugin inter-dependencies, the administrator needs to know that a plugin depends on another one

How to skip hidden URLs inside a div tag?

2005-09-06 Thread Massimo Miccoli
Hi Nutch dev, After fetching about 100 million pages I see many search engine spammers that use a hidden div tag (negative position) to include many URLs that users don't see when they access the site's page. These links alter the boost (by inlink count), so I want to skip these URLs. How can I do that?

Re: Plugins dependencies enhancement proposal

2005-09-06 Thread Stefan Groschupf
+1! On 06.09.2005 at 11:41, Jérôme Charron wrote: Since the plugins can specify dependencies on each other, this raises a problem for administrators. For a Nutch administrator, it is not user-friendly to specify which plugins to activate/deactivate. With plugin inter-dependencies, the

Re: Plugins dependencies enhancement proposal

2005-09-06 Thread Dawid Weiss
This idea calls for follow-ups -- with plugins that depend on each other it's just a step towards _order dependence_ (some plugins must be activated before other plugins, some depend on the status of the plugin activation, etc). This in fact resembles Ant's target dependency system; one
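
The order dependence mentioned above can be sketched as a depth-first topological sort, much like resolving Ant targets. This is only an illustration, not Nutch's plugin loader: the class `PluginOrder`, the method `activationOrder`, and the plugin ids in `main` are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PluginOrder {

    /**
     * Returns the plugin ids ordered so that every plugin appears after
     * all the plugins it depends on. The map goes from a plugin id to the
     * ids it requires (hypothetical model, not the Nutch descriptor format).
     */
    public static List<String> activationOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String id : deps.keySet()) {
            visit(id, deps, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String id, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> order) {
        if (done.contains(id)) {
            return; // already placed in the order
        }
        if (!inProgress.add(id)) {
            // We came back to a plugin that is still being resolved.
            throw new IllegalStateException("Circular plugin dependency at " + id);
        }
        for (String dep : deps.getOrDefault(id, List.of())) {
            visit(dep, deps, done, inProgress, order);
        }
        inProgress.remove(id);
        done.add(id);
        order.add(id); // all dependencies are already in the list
    }

    public static void main(String[] args) {
        // Hypothetical plugin ids, for illustration only.
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("parse-msword", List.of("lib-jakarta-poi"));
        deps.put("parse-mspowerpoint", List.of("lib-jakarta-poi"));
        deps.put("lib-jakarta-poi", List.of());
        System.out.println(activationOrder(deps));
    }
}
```

A circular dependency is reported as an error rather than silently ignored, which is the same choice Ant makes for its targets.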

Re: Delete an entry in ArrayFile/MapFile

2005-09-06 Thread Piotr Kosiorowski
Hello, You cannot do it. These structures were not designed for it. But you can copy all the data to another ArrayFile, skipping the entries you want to delete. Regards Piotr On 9/6/05, Ben wrote: Hi How can I delete an entry in the ArrayFile/MapFile if I know the id/key?
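
The copy-and-skip approach Piotr describes can be sketched as follows. This does not use the Nutch ArrayFile/MapFile classes: a `SortedMap` stands in for the sorted, append-only file, and `CopyWithoutDeleted` and `copySkipping` are hypothetical names. With the real classes the same loop would read each entry from the old file and append the kept ones to a new file.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

public class CopyWithoutDeleted {

    /**
     * Copies every entry of source into a new sorted structure, except
     * the entries whose key is in keysToDelete. The source is left
     * untouched; entries are visited in key order, as they would be when
     * reading a sorted file sequentially.
     */
    public static <K extends Comparable<K>, V> SortedMap<K, V> copySkipping(
            SortedMap<K, V> source, Set<K> keysToDelete) {
        SortedMap<K, V> target = new TreeMap<>();
        for (Map.Entry<K, V> entry : source.entrySet()) {
            if (!keysToDelete.contains(entry.getKey())) {
                target.put(entry.getKey(), entry.getValue());
            }
        }
        return target;
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> old = new TreeMap<>();
        old.put(1, "first");
        old.put(2, "second");
        old.put(3, "third");
        // "Delete" the entry with key 2 by leaving it out of the copy.
        System.out.println(copySkipping(old, Set.of(2)));
    }
}
```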

Re: MS related plugins refactoring

2005-09-06 Thread support
Jérôme, Maybe you should discuss such things before you 'commit' a new feature that already exists. I normally read most of the Nutch mails. What was the date and subject? I may have overlooked this one. Stefan

linksByMD5

2005-09-06 Thread Handl, Jorge
Hi! I'm writing a webdb purger, and I have an issue with writing to the new db the links of the pages that haven't been purged. The docs seem to imply that adding a link having a source page that is not present in the webdb should fail, but apparently it doesn't. So I try to filter out the

Re: MS related plugins refactoring

2005-09-06 Thread Jérôme Charron
Maybe you should discuss such things before you 'commit' a new feature that already exists. I normally read most of the Nutch mails. What was the date and subject? I may have overlooked this one. I don't know, it's Stefan's sentence, not mine, so please ask Stefan. Regards Jérôme --

Re: Help for regex

2005-09-06 Thread Fredrik Andersson
Hello Massimo. .*-.*-.*-.* would match anything with three dashes or more in it, for instance. A better-looking way would be to use something like .*(-.*){a,b}, which is meant to match anything with a number of dashes between a and b. Fredrik On 9/6/05, Massimo Miccoli wrote: Hi,
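
The two patterns can be tried out with `java.util.regex`. This is a minimal sketch, assuming Java regex syntax and whole-string matching; the class `DashPatterns` and the method `dashCountBetween` are hypothetical names. Because `.` also matches a dash, `.*(-.*){a,b}` only enforces the lower bound, so the sketch uses `[^-]*` for a stricter variant where the upper bound matters too.

```java
import java.util.regex.Pattern;

public class DashPatterns {

    /** Matches any string that contains at least three dashes. */
    public static final Pattern THREE_OR_MORE = Pattern.compile(".*-.*-.*-.*");

    /**
     * Builds a pattern for strings whose dash count lies between min and
     * max (inclusive). Using [^-]* instead of .* keeps the filler text
     * from swallowing extra dashes.
     */
    public static Pattern dashCountBetween(int min, int max) {
        return Pattern.compile("[^-]*(-[^-]*){" + min + "," + max + "}");
    }

    public static void main(String[] args) {
        String[] samples = {"abc", "a-b", "a-b-c", "a-b-c-d"};
        Pattern oneOrTwo = dashCountBetween(1, 2);
        for (String s : samples) {
            System.out.println(s
                    + " threeOrMore=" + THREE_OR_MORE.matcher(s).matches()
                    + " oneOrTwo=" + oneOrTwo.matcher(s).matches());
        }
    }
}
```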

RE: linksByMD5

2005-09-06 Thread Handl, Jorge
A bit more info: The addLink documentation says: "Links are only permitted in the webdb if they have a valid source MD5 for a Page that is also in the webdb." Yet I can insert a link with the MD5 of a page that is not in the webdb. Also, I can now filter out the offending links by reading both the