Hi, I'm trying to re-crawl (with a shell script; I already have a webDB) a
website (with some links to other web pages: .html, .doc, .pdf, ...), but this
error occurred:
...
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20090703140431
You can use bin/hadoop fs -rmr crawl to delete the whole crawl directory and
then recrawl.
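For reference, a minimal sketch of that clean-and-recrawl sequence. The crawl directory name, the urls directory, and the depth/topN values are assumptions about a typical Nutch setup, not details taken from this thread; adjust them to your installation.

```shell
#!/bin/sh
# Hypothetical clean recrawl: wipe the old crawl data, then crawl from scratch.
# Paths and parameters are illustrative only.

bin/hadoop fs -rmr crawl                # delete the whole crawl directory
bin/nutch crawl urls -dir crawl -depth 3 -topN 1000   # start a fresh crawl
```

Note that deleting the crawl directory throws away the existing webDB, so this is a full recrawl rather than an incremental one.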
On Tue, Jul 7, 2009 at 1:47 AM, Maurizio Croci croci.mauri...@gmail.com wrote:
> Hi, I try to REcrawl (with a shell-script. I have already a webDB...) a
> website (with some links to other webpage, .html, .doc, .pdf,
Hello,
I think you need to check /etc/default/tomcat6 for the TOMCAT6_SECURITY
setting.
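If the variable name follows the Ubuntu tomcat6 packaging, the relevant line in /etc/default/tomcat6 looks roughly like the following. The exact value to use (yes/no) depends on whether you want Tomcat to run under the Java security manager; the line below is an illustration, not a recommendation from the thread.

```shell
# /etc/default/tomcat6 (Ubuntu tomcat6 package, sourced as shell)
# Run Tomcat under the Java security manager? (yes/no)
TOMCAT6_SECURITY=no
```

With TOMCAT6_SECURITY=yes, the policy files under /etc/tomcat6/policy.d/ control what each webapp is allowed to do.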
Alex McLintock wrote:
OK, here is how I fixed this in my Ubuntu 9.04 setup using the standard
tomcat6 Ubuntu package.
I added this permission line to /etc/tomcat6/policy.d/04webapps.policy:
grant {
// Attempt to
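The quoted policy fragment above is truncated, so for illustration here is what a complete grant entry in /etc/tomcat6/policy.d/04webapps.policy typically looks like. The codeBase URL and the specific FilePermission are hypothetical examples, not taken from the original message; substitute the paths that match your Nutch webapp and crawl data.

```
// Hypothetical example: let the Nutch webapp read its crawl directory.
// The codeBase URL and file path below are illustrative only.
grant codeBase "file:/var/lib/tomcat6/webapps/nutch/-" {
    permission java.io.FilePermission "/home/user/crawl/-", "read";
};
```

After editing a policy file you need to restart Tomcat for the change to take effect.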
I've looked at a lot of tutorials for linking Nutch and Solr, but it
seems this integration has improved a lot in version 1.0.
Can anyone point me at documentation which takes this into account?
Cheers
Alex