Hi Bai,
This was a workaround I thought about. The problem with it, though, is that
I have nearly a TB of docs on disk and moving them over is non-trivial in
terms of time... Also, the workaround is annoying given that we have a
protocol-file plugin.
Thanks for the help
Lewis

On Wednesday, August 7, 2013, Bai Shen <baishen.li...@gmail.com> wrote:
> Is it possible to run a web server and connect to them that way?  That was
> what I ended up doing.
>
>
> On Tue, Aug 6, 2013 at 4:58 PM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com> wrote:
>
>> Hi,
>> Struggling with this one. And yes I acknowledge that it is not really a
>> Nutch based question but hopefully someone can help...
>> I have a directory path as follows
>>
>> /media/FreeAgent\ GoFlex\ Drive/trec_fedweb/bookstore.ewi.utwente.nl/fedweb13/FW13-sample-docs/e001/
>>
>> The directory e001 contains a pile of HTML, as do its next-door neighbours
>> within the FW13-sample-docs/ directory. I need to crawl these independently
>> of each other and send them to separate Solr cores.
>> Does someone know how to map the above path to regex-urlfilter and even a
>> seed.txt file?
>> Thanks v much in advance for any help.
>> Lewis
>> --
>> *Lewis*
>>
>

-- 
*Lewis*
