Thank you, sroy,

As I wrote to Ken, I don't fully understand how the regex works in this case.
With your suggested regex I now get this error log:

Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)
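One likely cause, assuming the regex URL filter in nutch-0.9 compiles each rule with java.util.regex.Pattern: the suggested rule contains "**", and in Java regex a second "*" directly after a quantifier is a syntax error ("dangling meta character"). If the rules file cannot be compiled, the job fails before any URL is injected. The sketch below compiles the suggested rule body and the working rule body (with the leading "+" include flag stripped) and prints the compiler's error, if any. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper: checks whether a regex-urlfilter rule body compiles
// with java.util.regex (assumed here to be the engine the URL filter uses).
public class UrlFilterRuleCheck {

    /** Returns null if the regex compiles, otherwise the compiler's error description. */
    public static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // The leading '+' of a rule is the include flag, not part of the regex,
        // so it is left out here.
        String suggested = "^http://([a-z0-9]*\\.)*website.com/*[a-z0-9]**/known-folder/";
        String working   = "^http://([a-z0-9]*\\.)*website.com/known-folder/known-folder/";

        System.out.println("suggested: " + compileError(suggested));
        System.out.println("working:   " + compileError(working));
    }
}
```

A null result means the rule compiled; any other value is the reason the pattern was rejected.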


I am using nutch-0.9 on Red Hat.

There is no problem with a URL pattern like:
+^http://([a-z0-9]*\.)*website.com/known-folder/known-folder/
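A minimal sketch of a rule for the original goal (unknown first folder, known second folder), assuming Java regex semantics: the unknown folder becomes [^/]+, which matches one path segment of any characters except '/', and the dot in website.com is escaped so it matches only a literal dot. In regex-urlfilter.txt the line would read +^http://([a-z0-9]*\.)*website\.com/[^/]+/known-folder/ . "website.com" and "known-folder" are the placeholders from the question; the class name and sample URLs are hypothetical, and whether the filter uses a find()-style or a full match is an assumption (the ^ anchor makes the rule behave the same as a prefix match either way).

```java
import java.util.regex.Pattern;

// Sketch: accept URLs whose first folder is unknown but whose host and
// second folder are known. Placeholders are taken from the question.
public class UnknownFolderRule {

    // [^/]+ matches exactly one path segment (any characters except '/').
    // "\\." makes the dot in website.com match only a literal dot.
    public static final Pattern RULE = Pattern.compile(
            "^http://([a-z0-9]*\\.)*website\\.com/[^/]+/known-folder/");

    public static boolean accepts(String url) {
        // find() lets the rule match a prefix of the URL; the ^ anchor
        // pins that match to the start of the string.
        return RULE.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.website.com/any-folder/known-folder/page.html",
            "http://website.com/2009/known-folder/",
            "http://www.website.com/known-folder/",      // no first folder
            "http://www.website.com/a/b/known-folder/", // two unknown folders
        };
        for (String url : urls) {
            System.out.println(accepts(url) + "  " + url);
        }
    }
}
```

This accepts exactly one unknown folder: a URL with no first folder, or with two extra folders, is rejected. If the unknown part may span several folders, ([^/]+/)+ in place of [^/]+/ would allow that.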
 
Any other suggestions?

Regards,
mailusenet



________________________________
From: Subhojit Roy <mails...@gmail.com>
To: nutch-user@lucene.apache.org
Sent: Thursday, 19 November 2009, 10:13:12
Subject: Re: substitute unknown parts of the url

Hi,

Try the regular expression below.

+^http://([a-z0-9]*\.)*website.com/*[a-z0-9]**/known-folder/

-sroy


On Thu, Nov 19, 2009 at 6:23 AM, Myname To <mailuse...@yahoo.de> wrote:

> Hello,
>
> Can somebody help me with the URL filter? I need to fetch sites matching
> this pattern:
>
> http://([a-z0-9]*\.)*website.com/unknown-folder/known-folder/
>
> The first folder can vary, whereas the host name and the second folder are known.
>
> How can I match the unknown parts (folders) of the URL?
>
> Any help is appreciated!
>
> Regards,
> mailusenet
>
>
>




-- 
Subhojit Roy
Profound Technologies
(Search Solutions based on Open Source)
email: s...@profound.in
http://www.profound.in

