It can be hard to define exactly what a blog is, but you could try detecting
whether the page advertises an RSS/Atom feed:
<link rel="alternate" type="application/rss+xml" title="RSS" 
or just detect common signatures of blogging software. I would imagine it would
require some type of custom parser.
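
To make the idea concrete, here is a rough sketch of both heuristics in plain Python. It is not Nutch code and not a Nutch plugin; the names (FeedLinkFinder, find_feed_links, looks_like_blog) and the list of generator hints are my own assumptions for illustration. Keep in mind that a feed link does not prove a page is a blog (news sites publish feeds too), so treat the result as a hint only.

```python
from html.parser import HTMLParser

# MIME types that commonly mark a <link rel="alternate"> as a feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

# Hypothetical, incomplete list of substrings to look for in a
# <meta name="generator"> tag; adjust to the platforms you care about.
GENERATOR_HINTS = ("wordpress", "blogger", "movable type", "typepad")


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs and the generator string from an HTML page."""

    def __init__(self):
        super().__init__()
        self.feeds = []
        self.generator = ""

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <link rel>), so normalise them.
        a = {k: (v or "") for k, v in attrs}
        if tag == "link":
            rels = a.get("rel", "").lower().split()
            mime = a.get("type", "").strip().lower()
            if "alternate" in rels and mime in FEED_TYPES:
                self.feeds.append(a.get("href", ""))
        elif tag == "meta" and a.get("name", "").lower() == "generator":
            self.generator = a.get("content", "").lower()


def _parse(html):
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder


def find_feed_links(html):
    """Return the href of every RSS/Atom <link rel="alternate"> in the page."""
    return _parse(html).feeds


def looks_like_blog(html):
    """Heuristic only: True if the page has a feed link or a known generator."""
    finder = _parse(html)
    has_hint = any(hint in finder.generator for hint in GENERATOR_HINTS)
    return bool(finder.feeds) or has_hint
```

Something like this could be used as a filter on top of your existing white list, so that only pages passing looks_like_blog() are kept.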
----- Original Message -----
From: "Armando Gonçalves" <[email protected]>
To: [email protected]
Sent: Wednesday, February 4, 2009 9:02:24 PM GMT -08:00 US/Canada Pacific
Subject: Fetch only Blogs.

Can anyone tell me if there is a way to make Nutch fetch only blogs during the
crawl process?
My current application uses a white list of domains. Is there a better idea?

-- 
Armando Gonçalves
C.C 2005-2
