mhmmm got it...

Tejas, can you please explain why, when I put some URLs inside urls/seed.txt,
many pages under those URLs aren't parsed?

Example:
Skipping http://wiki.creativecommons.org/Integrate; different batch id (null)
Skipping http://wiki.creativecommons.org/LRMI; different batch id (null)
Skipping http://wiki.creativecommons.org/Marking; different batch id (null)

These pages are examples of many other pages that aren't parsed.
I wanted all of them to be read and recorded in the database.


Thanks again.



On Thu, Jun 13, 2013 at 6:04 PM, Tejas Patil <[email protected]> wrote:

> Those are all images, which won't get parsed by Nutch.
>
>
> On Thu, Jun 13, 2013 at 1:33 PM, Weder Carlos Vieira <[email protected]> wrote:
>
> >
> > I extracted 1 row of the URLs returned...
> >
> > It is attached in Excel format.
> >
> >
> >
>
