Look here, it is blocking robots: http://ulysses.wyona.org/robots.txt

User-agent: *
Disallow: /foo/bar.html

User-agent: lenya
Disallow: /foo/bar.html
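
One way to check how these rules apply to a given URL is to feed them to Python's standard `urllib.robotparser`. This is only a minimal sketch and not how Nutch itself evaluates robots.txt. The agent name "nutch" and the helper `is_allowed` are assumptions for illustration; the agent name a crawler actually sends depends on its configuration.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, copied verbatim from the robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /foo/bar.html

User-agent: lenya
Disallow: /foo/bar.html
"""


def is_allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper: it parses the rules from a string instead of
    fetching them over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "nutch" is an assumed agent name, used here only as an example.
    for agent, url in [
        ("nutch", "http://ulysses.wyona.org/"),
        ("nutch", "http://ulysses.wyona.org/foo/bar.html"),
        ("lenya", "http://ulysses.wyona.org/foo/bar.html"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

Running the same check against the crawler's real agent name and the URLs linked from the homepage would show which of them fall under a Disallow rule.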





Michael Wechner wrote:

Hi

I am trying to index http://ulysses.wyona.org/, but it only indexes the homepage and doesn't seem to follow any links. I have set "depth 3", and other sites are being crawled deeper without a problem, but not the Ulysses page.

Has anyone had a similar experience?

Is it possible that Nutch has a problem with well-formed XHTML (application/xhtml+xml)?

Thanks

Michi
