Nutch always crawls from a parsed file to the URLs contained in that file. This makes it difficult to crawl only a specific type of file (e.g. RSS files). Links to the actual RSS files are typically contained in html/htm entry pages, and RSS files generally do not link directly to one another. So if we want to index RSS files, we have to crawl and index many html/htm files first.
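The idea can be illustrated with a small sketch. It does not use Nutch's API; it is a plain Python toy, and the link graph, the URLs, and the helper names (`is_rss`, `is_html`, `crawl_for_rss`) are all assumptions made up for illustration. It follows links through html/htm pages but keeps only the RSS URLs for indexing.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for fetched and parsed pages.
LINKS = {
    "http://example.com/index.html": [
        "http://example.com/news.html",
        "http://example.com/feeds/all.rss",
    ],
    "http://example.com/news.html": [
        "http://example.com/feeds/news.rss",
        "http://example.com/index.html",
    ],
    "http://example.com/feeds/all.rss": [],
    "http://example.com/feeds/news.rss": [],
}


def is_rss(url):
    # Assumption: RSS files are recognized by extension only.
    return url.lower().endswith(".rss")


def is_html(url):
    return url.lower().endswith((".html", ".htm"))


def crawl_for_rss(seeds, links, max_depth=3):
    """Breadth-first crawl: expand html/htm pages, collect RSS URLs."""
    to_index = []
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if is_rss(url):
            # RSS files are the only documents we want to index.
            to_index.append(url)
            continue
        if not is_html(url) or depth >= max_depth:
            # Skip other file types and pages that are too deep to expand.
            continue
        # html/htm pages are parsed only to discover further links.
        for out in links.get(url, []):
            if out not in seen:
                seen.add(out)
                queue.append((out, depth + 1))
    return to_index


rss_urls = crawl_for_rss(["http://example.com/index.html"], LINKS)
print(rss_urls)
```

In this toy graph neither RSS file links to the other, so both are reachable only through the html entry pages. That is the reason the html/htm files must be fetched even though they are not the documents we want in the index.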
