Re: file:/// URLS with spaces in path

2013-08-08 Thread Bai Shen
John Mcgibbney lewis.mcgibb...@gmail.com Sent: Wednesday 7th August 2013 16:52 To: user@nutch.apache.org Subject: Re: file:/// URLS with spaces in path Hi Bai, This was a workaround I thought about. The problem with this is though that I have nearly a TB of docs on disk and moving

Re: file:/// URLS with spaces in path

2013-08-07 Thread Bai Shen
Is it possible to run a web server and connect to them that way? That was what I ended up doing. On Tue, Aug 6, 2013 at 4:58 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi, Struggling with this one. And yes I acknowledge that it is not really a Nutch based question but

Re: file:/// URLS with spaces in path

2013-08-07 Thread Lewis John Mcgibbney
Hi Bai, This was a workaround I thought about. The problem with this, though, is that I have nearly a TB of docs on disk and moving them over is non-trivial in terms of time... also the workaround is annoying knowing that we have a protocol-file plugin. Thanks for the help, Lewis On Wednesday, August 7, 2013, Bai Shen

RE: file:/// URLS with spaces in path

2013-08-07 Thread Markus Jelsma
is separated by tabulator, not space. Cheers -Original message- From:Lewis John Mcgibbney lewis.mcgibb...@gmail.com Sent: Wednesday 7th August 2013 16:52 To: user@nutch.apache.org Subject: Re: file:/// URLS with spaces in path Hi Bai, This was a workaround I thought about

Re: file:/// URLS with spaces in path

2013-08-07 Thread Lewis John Mcgibbney
;) Cheers -Original message- From:Lewis John Mcgibbney lewis.mcgibb...@gmail.com Sent: Wednesday 7th August 2013 16:52 To: user@nutch.apache.org Subject: Re: file:/// URLS with spaces in path Hi Bai, This was a workaround I thought about. The problem with this is though that I

Re: file:/// URLS with spaces in path

2013-08-06 Thread Lewis John Mcgibbney
I'm using Nutch 2.3-SNAPSHOT HEAD. I changed the location of the target directory to somewhere other than /media and tested... but I am actually getting the same results, so it is something to do with my config as opposed to the spaces in paths which I need to do. seed.txt --- A single entry
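
For readers hitting the original question in this thread (spaces in file:/// paths), here is a minimal sketch of percent-encoding a local path before writing it to a seed file. The helper name `to_file_url` and the example path are hypothetical, and the thread does not confirm whether the protocol-file plugin accepts the encoded form, so treat this as an illustration of the encoding step only.

```python
from urllib.parse import quote


def to_file_url(path: str) -> str:
    """Build a file:// URL from an absolute POSIX path (hypothetical helper).

    quote() leaves '/' untouched by default and turns each space into %20.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    return "file://" + quote(path)


if __name__ == "__main__":
    # Example path is an assumption, not taken from the thread.
    print(to_file_url("/media/my docs/report one.pdf"))
```

The result could then be written as a single line in seed.txt; any per-URL metadata on that line would be a separate, tab-separated field.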