[Nutch-general] RE: crawl by contentType and don't store data only build index

Gal Nitzan Sun, 19 Mar 2006 05:39:05 -0800

Hi,

There are a few stages:


1. In nutch-site.xml, set the property fetcher.store.content to false.
2. Write a parse filter that sets some metadata variables, such as the
description, during the parse stage.
3. Write an index filter that adds your description variable to the index
(or replaces the content field in the doc with your variable).
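
For step 1, the override could look roughly like the fragment below. This is
only a sketch: it assumes the property goes inside the root element that your
nutch-site.xml already has (the root element name differs between Nutch
versions, so it is not shown here), and the comment text is just illustrative.

```xml
<!-- In nutch-site.xml, inside the file's existing root element. -->
<!-- Tells the fetcher not to keep the raw fetched content (e.g. mp3 data). -->
<property>
  <name>fetcher.store.content</name>
  <value>false</value>
</property>
```

Values set in nutch-site.xml override the defaults shipped in
nutch-default.xml, so nothing else needs to change for this step.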

If you have many fields, you will also have to add a query filter.

Gal.

-----Original Message-----
From: Ensheng Wang [mailto:[EMAIL PROTECTED]]
Sent: Sunday, March 19, 2006 2:49 PM
To: [email protected]
Subject: crawl by contentType and don't store data only build index

For example, I only want to crawl .mp3 files on the internet, store each
file's description and URL, and index them; I don't want to store the mp3
file data.
  How can I do that?
  Thanks!





_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
