Thank you. But the pages I want to crawl are just pages from the Internet, so I certainly can't control them.

2006/2/10, Vanderdray, Jacob <[EMAIL PROTECTED]>:
>
> If you control the "temporary links" pages, then just add a
> robots meta tag. Take a look at
> http://www.robotstxt.org/wc/meta-user.html to see what your options are.
>
> Jake.
>
> -----Original Message-----
> From: Elwin [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 10, 2006 4:38 AM
> To: [email protected]
> Subject: How to control contents to be indexed?
>
> In the process of crawling and indexing, some pages are just used as
> "temporary links" to the pages I want to index, so how can I control
> those kinds of pages not being indexed? Or which part of nutch should I
> extend?
>

--
《盖世豪侠》 drew rave reviews and kept TVB's ratings high, yet TVB,
pleased as it was, still did not give him major roles. Stephen Chow was
never one to stay in a small pond; once his comedic talent had shown
itself, he was naturally unwilling to be sidelined, so he moved into
film and made his mark on the big screen. TVB had gained a thoroughbred
and then lost it, and of course regretted it too late.
