Re: Nutch 2.1 different batch id (null)

Lewis John Mcgibbney Fri, 15 Feb 2013 07:51:02 -0800

And you want to get to the bottom of the batchId = null?
You haven't actually asked a question here.


On Thursday, February 14, 2013, Dragan Menoski <[email protected]>
wrote:
> Hi,
> I am trying to set up Nutch 2.1 and Solr 4.0 with a MySQL database, following
> the instructions at this link: http://nlp.solutions.asia/?p=180.
> I made some changes in conf/nutch-site.xml (set threads to 50).
> When I start the crawl (path: ~/Desktop/apache-nutch-2.1/runtime/local,
> command: bin/nutch crawl urls -depth 5 -topN 1), I see the message
> "Skipping http://www.domainname.com/category/viewvideo/111; different batch
> id (null)" for a lot of pages.
> My nutch-site.xml file is attached.
> I use Debian 6.0.5 (x64) in a virtual machine on Windows 7 (x64).
> I have many records in the database with headers = null, status = 1, text =
> null, and the other fields also null.
> In conf/regex-urlfilter.txt I have:
> # accept anything else
> +^http://([a-z0-9]*\.)*www.domain01.com
> +^http://([a-z0-9]*\.)*domain02.com
> +^http://([a-z0-9]*\.)*www.domain03.com.mk
> In /root/Desktop/apache-nutch-2.1/runtime/local/urls/seed.txt I have:
> http://www.domain01.com
> http://domain02.com
> http://www.domain03.com.mk
>
>
> Best Regards,
> Dragan Menoski

-- 
*Lewis*
