I'm new to Nutch and I'm trying to determine which fields are indexed after crawled pages are parsed. Specifically, I want to index image names/URLs and alt attribute text, if present. I realize Nutch doesn't process images, but from what I've seen the entire text of a web page (including info inside image tags) is indexed. Can I change the indexing to include "image" and "alt" fields?
