Re: Boilerpipe and Nutch 2.x ?

2012-09-10 Thread Ferdy Galema
Hi, To be absolutely sure that only Tika is used, you should also remove the parse-html plugin from plugin.includes. Make sure all references to the parse-html plugin are removed from parse-plugins.xml. (Looking at your snippet, it seems that this is the case.) With Tika itself or Boilerpipe I'm
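
The two checks described above (parse-html no longer matched by plugin.includes, and no longer referenced in parse-plugins.xml) can be sketched as a small Python script. This is only an illustration under assumptions: the helper names and the sample XML strings are hypothetical, the sample plugin.includes value is not taken from the original poster's configuration, and the check assumes the plugin.includes value is a regular expression that must match the whole plugin id.

```python
import re
import xml.etree.ElementTree as ET

PLUGIN_ID = "parse-html"

# Hypothetical sample configuration, for illustration only.
SITE_XML = """
<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-tika|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
  </property>
</configuration>
"""

PARSE_PLUGINS_XML = """
<parse-plugins>
  <mimeType name="*">
    <plugin id="parse-tika" />
  </mimeType>
  <mimeType name="text/html">
    <plugin id="parse-html" />
  </mimeType>
</parse-plugins>
"""


def plugin_enabled(site_xml, plugin_id=PLUGIN_ID):
    """Return True if the plugin.includes regex in site_xml matches plugin_id.

    Returns False when the property is absent from this file; in that case
    the value from the default configuration applies and must be checked there.
    """
    root = ET.fromstring(site_xml.strip())
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "plugin.includes":
            pattern = (prop.findtext("value") or "").strip()
            return re.fullmatch(pattern, plugin_id) is not None
    return False


def mime_types_referencing(parse_plugins_xml, plugin_id=PLUGIN_ID):
    """List the mimeType names whose <plugin id="..."> entries name plugin_id."""
    root = ET.fromstring(parse_plugins_xml.strip())
    return [
        mime.get("name")
        for mime in root.iter("mimeType")
        if any(p.get("id") == plugin_id for p in mime.iter("plugin"))
    ]


if __name__ == "__main__":
    print("parse-html enabled:", plugin_enabled(SITE_XML))
    print("still referenced for:", mime_types_referencing(PARSE_PLUGINS_XML))
```

In the sample data the plugin is no longer enabled, but text/html is still mapped to it, which is the kind of leftover reference the advice above asks to remove. The sketch only looks at mimeType entries; any alias entries naming the plugin would need a separate check.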

Re: Escaping URL during redirection

2012-09-10 Thread remi tassing
Sorry, I think it works. I was testing with 'parsechecker', and it doesn't apply the 'regexnormalizer' rules by default. So, case solved, thanks a lot! On Sunday, September 9, 2012, Sebastian Nagel wrote: Redirects are filtered and normalized. It works for 1.4/1.5 and should for trunk. One subtlety:

un-subscribe me

2012-09-10 Thread IGM Networks - Vasilis Pasparas
please un-subscribe me

Re: nutch crawling file system SOLVED

2012-09-10 Thread dpverma
Can you please let me know how you solved your problem? I am getting the same error you had: the index contains the PDF file names but not the content of those files.