Hi Zabini,

I'm not sure whether your problem is with Nutch following the links or
with indexing the pages.  Have you tried both of these tools to verify the
outlinks and the index data?

https://wiki.apache.org/nutch/bin/nutch%20parsechecker
https://wiki.apache.org/nutch/bin/nutch%20indexchecker

The second link above seems wrong to me: it shows *IndexingFiltersChecker*, but
I think it should be *indexchecker*, which works for me.
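
In case it helps, here is a rough sketch of how the two checks could be run
together. It assumes you are in the Nutch runtime directory (where bin/nutch
lives); the helper name check_page and the example URL are just placeholders
of mine, not anything that ships with Nutch.

```shell
# Assumption: run from the Nutch runtime directory, where bin/nutch lives.
# Set NUTCH beforehand to point at the script if it is somewhere else.
NUTCH="${NUTCH:-bin/nutch}"

# check_page is a hypothetical helper name, not part of Nutch.
check_page() {
  # Fetch and parse the page, printing its parse data, including outlinks.
  "$NUTCH" parsechecker "$1"
  # Print the fields the indexing filters would produce for the same page.
  "$NUTCH" indexchecker "$1"
}
```

You would call it as `check_page http://www.example.com/page.html`, using
the page whose links are not being fetched. If the missing URLs are not among
the outlinks that parsechecker prints, the problem is probably in parsing
rather than in indexing.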


On Wed, Apr 16, 2014 at 11:48 AM, Zabini <[email protected]> wrote:

> Hi,
>
> I am facing a problem with the URLs Nutch fetches.
>
> I have a page containing several URLs, but Nutch does not fetch them.
> They are allowed in regex-urlfilter, and those URLs work fine if I put
> them in my URL seed list.
>
> Does anyone have any hint on what to do?
>
> Best Regards,
> Zabini
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Don-t-fetch-all-urls-in-a-page-tp4131531.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
