Since I could not solve my problem, I am asking for help again. I searched the web but could not find anything related to it. A mailing list post [1] explains the problem I am trying to solve, but I cannot find a solution for it.
To repeat briefly: I need to write a Nutch plugin that can create two (or more) documents from one crawled document, all having the same URL and each containing part of the original document's content. I can then parse/index the generated documents as usual. Can anyone point me in the right direction? Where should I start looking: classes, wikis, homepages, books? Nutch does a great job for what I need right now, but I think it lacks documentation, especially when it comes to plugin development. What would a bare-bones plugin look like? Is it even possible to modify this behaviour with a Nutch plugin? It is essential for my app; otherwise I need to switch technology...

[1] http://osdir.com/ml/nutch-user.lucene.apache.org/2009-09/msg00117.html
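
P.S. To make the requirement concrete, here is a rough sketch in plain Java of the splitting I have in mind. It deliberately uses no Nutch API, since finding the right hook is exactly my question. The names SplitSketch, SubDocument and split, and the idea of splitting on a literal marker string, are only assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitSketch {

    // Hypothetical holder for one generated document.
    // Every generated document keeps the URL of the original.
    static final class SubDocument {
        final String url;
        final String content;

        SubDocument(String url, String content) {
            this.url = url;
            this.content = content;
        }
    }

    // Splits the original content on a literal marker (an assumed rule;
    // the real rule depends on the data) and returns one SubDocument per
    // non-empty piece, all sharing the same URL.
    static List<SubDocument> split(String url, String content, String marker) {
        List<SubDocument> parts = new ArrayList<>();
        for (String piece : content.split(Pattern.quote(marker))) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(new SubDocument(url, trimmed));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        List<SubDocument> parts =
                split("http://example.com/page", "first part ---- second part", "----");
        for (SubDocument part : parts) {
            System.out.println(part.url + " -> " + part.content);
        }
    }
}
```

What I am missing is the place in Nutch where a list like this could be handed on, so that each piece is parsed and indexed as its own document.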

