And you want to get to the bottom of the batchId = null? You haven't actually asked a question here.

On Thursday, February 14, 2013, Dragan Menoski <[email protected]> wrote:
> Hi,
> I try to set Nutch 2.1 and Solr 4.0 with MySQL database, according to the instruction in this link: http://nlp.solutions.asia/?p=180.
> I made same changes in conf/nutch-site.xml (set threads to 50).
> When I start crawl (path: ~/Desktop/apache-nutch-2.1/runtime/local, command: bin/nutch crawl urls -depth 5 -topN 1) I saw the message: "Skipping http://www.domainname.com/category/viewvideo/111; different batch id (null)" for a lot of pages.
> My nutch-site.xml file is in attach.
> I use Debian 6.0.5 (x64) on Virtual Machine on Windows 7 (x64).
> I have many records in database with: headers = null, status = 1, text = null and the others fields are also null.
> In conf/regex-urlfilter.txt I have:
> # accept anything else
> +^http://([a-z0-9]*\.)*www.domain01.com
> +^http://([a-z0-9]*\.)*domain02.com
> +^http://([a-z0-9]*\.)*www.domain03.com.mk
> In /root/Desktop/apache-nutch-2.1/runtime/local/urls/seed.txt I have:
> http://www.domain01.com
> http://domain02.com
> http://www.domain03.com.mk
>
>
> Best Regards,
> Dragan Menoski

--
*Lewis*

