I thought I had just missed something. Okay, I have now added a few
patterns as well as a command-line checker. See

http://issues.apache.org/jira/browse/NUTCH-279

for the patch.
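
For illustration only, here is a rough stand-alone sketch of what such a
command-line check can look like. It is not the code from the patch and does
not use any Nutch classes: the class name RegexRuleCheck, the Rule holder and
the two example rules are all made up for this sketch. It reads URLs from
stdin, applies each regex substitution in order and prints the result.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone checker; NOT part of Nutch or of the NUTCH-279 patch.
class RegexRuleCheck {

    // One substitution rule: a compiled pattern plus its replacement string.
    public static final class Rule {
        final Pattern pattern;
        final String substitution;

        public Rule(String regex, String substitution) {
            this.pattern = Pattern.compile(regex);
            this.substitution = substitution;
        }
    }

    // Apply every rule in order; each rule sees the output of the previous one.
    public static String normalize(String url, List<Rule> rules) {
        String result = url;
        for (Rule rule : rules) {
            result = rule.pattern.matcher(result).replaceAll(rule.substitution);
        }
        return result;
    }

    // Example rules invented for this sketch (assumptions, not Nutch defaults).
    public static List<Rule> exampleRules() {
        List<Rule> rules = new ArrayList<>();
        // Drop a ";jsessionid=..." path parameter, case-insensitively.
        rules.add(new Rule("(?i);jsessionid=[0-9a-z]+", ""));
        // Drop a dangling "?" or "&" at the end of the URL.
        rules.add(new Rule("[?&]+$", ""));
        return rules;
    }

    // Read one URL per line from stdin and print "input -> normalized".
    public static void main(String[] args) throws IOException {
        List<Rule> rules = exampleRules();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            String url = line.trim();
            if (url.isEmpty()) {
                continue; // skip blank lines
            }
            System.out.println(url + " -> " + normalize(url, rules));
        }
    }
}
```

After compiling with javac, a list of URLs that should or should not be
rewritten can be piped in, e.g.

    echo "http://www.example.com/?" | java RegexRuleCheck

and the output compared by eye or with diff against the expected results.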


Regards,
 Stefan

TDLN wrote:
> Sorry, I was a bit too fast there, the answer applies to the
> RegexURLFilter not the RegexUrlNormalizer. I don't think there is a
> similar facility for the RegexUrlNormalizer, but let me know if you
> find it :)
> 
> Rgrds, Thomas
> 
> On 5/22/06, TDLN <[EMAIL PROTECTED]> wrote:
>> Hi Stefan
>>
>> try running bin/nutch org.apache.nutch.net.URLFilterChecker
>>
>> Rgrds, Thomas
>>
>> On 5/22/06, Stefan Neufeind <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> >
>> > is there a way to debug rules for RegexUrlNormalizer, e.g. test the
>> > substitution from commandline?
>> >
>> >
>> >         bin/nutch org.apache.nutch.net.RegexUrlNormalizer
>> >
>> > does print out the rules it uses. But afaik there is no such thing
>> > possible as
>> >
>> > echo "http://www.example.com" | bin/nutch
>> > org.apache.nutch.net.RegexUrlNormalizer
>> >
>> > is there? So how do you debug rules when writing new ones and testing
>> > them against a set of URLs that should match / should not match?

