Mhmm, got it... Tejas, can you please explain why, when I put some URLs in urls/seed.txt, many pages under those URLs aren't parsed?
Example:

Skipping http://wiki.creativecommons.org/Integrate; different batch id (null)
Skipping http://wiki.creativecommons.org/LRMI; different batch id (null)
Skipping http://wiki.creativecommons.org/Marking; different batch id (null)

These pages are examples of many others that aren't parsed. There are many other pages like them that I wanted to be read and recorded in the database.

Thanks again.

On Thu, Jun 13, 2013 at 6:04 PM, Tejas Patil <[email protected]> wrote:
> Those are all images which won't get parsed by Nutch.
>
> On Thu, Jun 13, 2013 at 1:33 PM, Weder Carlos Vieira <[email protected]> wrote:
> > I extracted 1 row of these URLs returned...
> >
> > It is attached in Excel format.

