Hi Bai,
This was a workaround I thought about. The problem with it, though, is that I have nearly a TB of docs on disk, and moving them over is far from trivial in terms of time... also, the workaround is annoying knowing that we have a protocol-file plugin.
Thanks for the help
Lewis

On Wednesday, August 7, 2013, Bai Shen <baishen.li...@gmail.com> wrote:
> Is it possible to run a web server and connect to them that way? That was
> what I ended up doing.
>
>
> On Tue, Aug 6, 2013 at 4:58 PM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com> wrote:
>
>> Hi,
>> Struggling with this one. And yes I acknowledge that it is not really a
>> Nutch based question but hopefully someone can help...
>> I have a directory path as follows
>>
>> /media/FreeAgent\ GoFlex\ Drive/trec_fedweb/
>> bookstore.ewi.utwente.nl/fedweb13/FW13-sample-docs/e001/
>>
>> the directory e001 contains a pile of HTML as do its next door neighbours
>> within the FW13-sample-docs/ directory. I need to crawl these independent
>> on each other and send them to separate Solr cores.
>> Does someone know how to map the above path to regex-urlfilter and even a
>> seed.txt file?
>> Thanks v much in advance for any help.
>> Lewis
>> --
>> *Lewis*
>>
>

--
*Lewis*
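
For reference, one possible way to turn the directory above into a seed.txt entry and a matching regex-urlfilter rule is sketched below in Python. It is only a sketch under assumptions: that the two wrapped lines in the original mail form a single path, that the backslashes are shell escapes for spaces, and that the protocol-file plugin accepts percent-encoded `file:` URLs (not verified here). The helper names `file_url` and `urlfilter_rule` are made up for illustration. It also assumes protocol-file is enabled in `plugin.includes` and that any rule rejecting `file:` URLs in regex-urlfilter.txt has been removed or placed after the accept rule.

```python
import re
from urllib.parse import quote

# Assumption: the path from the mail, with shell escapes removed and the
# wrapped lines joined into one path.
E001_DIR = ("/media/FreeAgent GoFlex Drive/trec_fedweb/"
            "bookstore.ewi.utwente.nl/fedweb13/FW13-sample-docs/e001/")


def file_url(path: str) -> str:
    """Build a file: URL, percent-encoding spaces but keeping '/' as-is."""
    return "file://" + quote(path)


def urlfilter_rule(dir_url: str) -> str:
    """Build an accept rule ('+' prefix) anchored to one directory URL."""
    return "+^" + re.escape(dir_url)


seed_line = file_url(E001_DIR)          # candidate line for seed.txt
filter_line = urlfilter_rule(seed_line)  # candidate line for regex-urlfilter.txt

print(seed_line)
print(filter_line)
```

Running the same two helpers for each sibling directory (e002 and so on) would give one seed file and one filter rule per crawl, which keeps the crawls independent so that each can be indexed into its own Solr core.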