In my experience, the hardest (but most flexible) part is exactly what was mentioned: processing the data. Nutch has a really easy plugin interface, and the example plugin is a great place to start. Once you have the raw parsed text, you can do whatever you want with it. For example, I wrote a plugin to add geospatial information to my NutchDocument. You then map the fields you added to the NutchDocument to whatever you want Solr to index. In my case I created a geography field where I put the lat/lon info. Then you add that same geography field to the Nutch-to-Solr mapping file as well as to your Solr schema.xml file. Then, when you run the crawl and tell it to use "solrindex", it sends the document to Solr to be indexed. Since the new field is in the schema, Solr knows what to do with it at index time. Now you can build a user interface around whatever you want to do with that field.
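To make the enrichment step a bit more concrete, here is a rough, self-contained sketch of what the field-adding logic might look like. It is not my actual plugin: a plain Map stands in for the NutchDocument so the example runs without Nutch on the classpath, and the class name, method names, and the "lat,lon" string format are all assumptions made for illustration. In a real plugin this logic would live inside a class implementing Nutch's IndexingFilter extension point, where you would add the value to the NutchDocument instead of to a Map.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the body of an indexing plugin.
// A Map<String, String> plays the role of the NutchDocument here.
class GeoFieldSketch {

    // Field name from the post; it has to match the name used in the
    // Nutch-to-Solr mapping file and in schema.xml.
    static final String GEO_FIELD = "geography";

    // Format a coordinate pair as "lat,lon" (an assumed format; use
    // whatever your Solr field type expects).
    static String formatLatLon(double lat, double lon) {
        // The negated range checks also reject NaN.
        if (!(lat >= -90.0 && lat <= 90.0) || !(lon >= -180.0 && lon <= 180.0)) {
            throw new IllegalArgumentException("coordinates out of range: " + lat + ", " + lon);
        }
        // Locale.ROOT keeps the decimal separator a dot on every machine.
        return String.format(Locale.ROOT, "%.4f,%.4f", lat, lon);
    }

    // Return a copy of the document fields with the geography field added.
    static Map<String, String> addGeography(Map<String, String> doc, double lat, double lon) {
        Map<String, String> enriched = new LinkedHashMap<>(doc);
        enriched.put(GEO_FIELD, formatLatLon(lat, lon));
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "http://example.com/");
        // Example coordinates only; a real plugin would derive them from the parsed text.
        System.out.println(addGeography(doc, 48.8566, 2.3522));
    }
}
```

Whatever name you pick for the field, the same name then has to appear in the Nutch-to-Solr mapping file and in schema.xml, otherwise Solr will not know what to do with it at index time.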
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Newbie-need-a-point-in-the-right-direction-tp2031381p2033687.html
Sent from the Solr - User mailing list archive at Nabble.com.