As far as I can see, Nutch does not index any HTML meta tags such as description or keywords. Does anybody know the reason for this?

I'm not sure why Nutch doesn't do it, but a lot of search engines
stopped using those for scoring because they were abused by
spam sites that would stuff them with keywords.

If you really want them indexed, it's not too difficult. Just copy
the index-basic plugin and add some code to index the tags:

   String description = metadata.getProperty("description");
   String keywords = metadata.getProperty("keywords");

   // Append the meta-tag text to the existing "content" field:
   doc.add(Field.Text("content", description));
   doc.add(Field.Text("content", keywords));

   // Or you could add your own fields, but you'll have to
   // change your query filters to pick them up:

   doc.add(Field.Text("description", description));
   doc.add(Field.Text("keywords", keywords));
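
One thing to watch for: plenty of pages have no description or keywords tag at all, so you probably want to skip missing values before adding fields. Here is a rough sketch of that check, assuming the metadata object is a plain java.util.Properties (I haven't verified that against your Nutch version). The MetaTagFields class and its collect method are made-up names for illustration, not part of Nutch or Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Nutch: gathers the meta tags
// that are actually present so the caller can index only those.
class MetaTagFields {

    // Returns fieldName -> trimmed value for every requested tag that is
    // present and non-blank. Assumes metadata is a java.util.Properties.
    static Map<String, String> collect(Properties metadata, String... names) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : names) {
            String value = metadata.getProperty(name); // null if the tag is absent
            if (value != null && !value.trim().isEmpty()) {
                fields.put(name, value.trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Properties metadata = new Properties();
        metadata.setProperty("description", "An open-source web crawler");
        // No "keywords" entry, so it is skipped.
        System.out.println(collect(metadata, "description", "keywords"));
        // prints {description=An open-source web crawler}
    }
}
```

In the copied plugin you would then loop over the returned map and call doc.add(...) once per entry, using whichever field names your query filters know about.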

