Hi Leo,
From the times both the fetching and parsing took, I suspect that maybe
Nutch didn't actually fetch the URL; however, this may not be the case, as I
have nothing to benchmark it against. Unfortunately, on this occasion the URL
http://wiki.apache.org actually redirects to
Hi Leo, hi Lewis,
> From the times both the fetching and parsing took, I suspect that maybe
> Nutch didn't actually fetch the URL,

This may be the reason. Empty segments may break some of the crawler steps.
But if I'm not wrong, it looks like the updatedb command is not quite correct:
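[The command Sebastian quoted was lost in the archive. For general reference, the step-by-step crawl cycle in Nutch 1.x usually looks like the sketch below; `urls`, `crawldb`, and `segments` are placeholder paths, not the exact paths from this thread.]

```shell
# One iteration of the Nutch 1.x crawl cycle, run step by step
bin/nutch inject crawldb urls            # seed the crawldb from the urls dir
bin/nutch generate crawldb segments      # create a new fetch segment
s1=`ls -d segments/2* | tail -1`         # pick up the newest segment
bin/nutch fetch $s1                      # fetch the generated URLs
bin/nutch parse $s1                      # parse the fetched content
bin/nutch updatedb crawldb $s1           # fold the results back into the crawldb
```

Note that updatedb takes the crawldb first and then the segment(s) to merge in; passing an empty or missing segment here is one way the later steps can break.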
Hi Lewis,
Will try your suggestion shortly, but I'm still puzzled why the crawl
command works. Isn't it using the same filters, etc.?
Cheers,
Leo
On Thu, 2011-07-21 at 20:55 +0100, lewis john mcgibbney wrote:
Hi Leo,
From the times both the fetching and parsing took, I suspecting that
maybe
Hi Lewis,
Following are the things I tried and the relevant source/logs:
1. Ran 'crawl' without a trailing / in the URL http://www.seek.com.au ;
result OK.
2. Ran 'crawl' with a trailing / in the URL http://www.seek.com.au/ ;
result OK.
3. Had a look at regex-urlfilter.txt and the relevant entries
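[For comparison, the stock regex-urlfilter.txt ends with a catch-all accept rule, which is why a trailing slash makes no difference to filtering. A minimal sketch of the relevant default entries (the shipped file contains more rules):]

# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

# accept anything else
+.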
Hi Sebastian,
I think the problem is with the fetch not returning any results. I
checked your suggestion, but it did not work.
Cheers,
Leo
On Thu, 2011-07-21 at 22:16 +0200, Sebastian Nagel wrote:
Hi Leo, hi Lewis,
From the times both the fetching and parsing took, I suspecting that
Found the issue! plugin.xml defined an extension id which didn't match the
id inside the mimeType="application/xhtml+xml" tag in parse-plugins.xml,
i.e. the two ids below should match.
plugin.xml:
<?xml version="1.0" encoding="UTF-8"?>
<plugin
   id="food"
   name="Food Parser"
   version="1.0.0"
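[To illustrate the kind of mismatch described above, here is a small, hypothetical Python check. The real plugin.xml and parse-plugins.xml carry more elements and attributes; the "food" id is taken from the fragment above, and the simplified XML strings are assumptions, not the actual files.]

```python
import xml.etree.ElementTree as ET

# Simplified plugin.xml: the plugin declares id="food".
plugin_xml = """<?xml version="1.0" encoding="UTF-8"?>
<plugin id="food" name="Food Parser" version="1.0.0"/>"""

# Simplified parse-plugins.xml: the mimeType mapping must reference
# the same id, or the parser is never invoked for that content type.
parse_plugins_xml = """<parse-plugins>
  <mimeType name="application/xhtml+xml">
    <plugin id="food"/>
  </mimeType>
</parse-plugins>"""

plugin_id = ET.fromstring(plugin_xml).get("id")
mapped_ids = [p.get("id")
              for p in ET.fromstring(parse_plugins_xml).iter("plugin")]

# The fix in this thread amounted to making this condition hold.
print(plugin_id in mapped_ids)  # prints True when the ids line up
```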