Hi,

I am using Nutch for a focused-crawl vertical search engine. So far I
am only extracting the information stored in the index during the crawl
process. However, I would like to allow users to edit and extend the
content shown on my site, for example by adding a better description,
adding tags, and sorting items into categories.

What would be the best approach to do that? If I simply store the
additional information in the index, what happens the next time a page
is re-indexed? Would the user-generated content be overwritten? If so,
what would be the best way to prevent that? Should I create a Solr plugin
(one that would not re-index documents that have been modified externally),
or should I store the user-generated content in a database
instead and update the index with the information from the database
after each crawl, if it has changed? Or something completely different?
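
To make the database option concrete, here is a rough Python sketch of
what I have in mind. It is only an illustration: merge_document,
rebuild_after_crawl, the field names, and the in-memory dict standing in
for the database are all made up, and nothing here is a Nutch or Solr API.

```python
# Hypothetical sketch: every name and field here is illustrative only.

def merge_document(crawled, user_edits):
    """Return the document to index: crawled fields plus user overrides."""
    merged = dict(crawled)  # copy, so the crawled data stays untouched
    for field, value in user_edits.items():
        if isinstance(value, list):
            # Multi-valued fields (e.g. tags): keep crawled values and
            # append user values that are not already present.
            # Assumes the crawled value, if any, is also a list.
            existing = list(merged.get(field, []))
            merged[field] = existing + [v for v in value if v not in existing]
        else:
            # Single-valued fields (e.g. description): the user edit wins.
            merged[field] = value
    return merged


def rebuild_after_crawl(crawled_docs, user_store):
    """Merge each freshly crawled document with the edits stored for its URL."""
    return [
        merge_document(doc, user_store.get(doc["url"], {}))
        for doc in crawled_docs
    ]
```

The idea would be that the crawl output is never edited directly: user
edits live only in the separate store, keyed by URL, and are re-applied
each time the index is rebuilt, so a re-crawl cannot overwrite them.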

Are there already any plugins for Nutch or Solr that do something like this?

Any thoughts and / or best practices on this would be greatly appreciated :)

best regards,
Magnus
