Hi Jim,

The thing about this problem is that I assume religion information would not be included in the document metadata, so it's not a simple case of using one of the existing implementations, e.g. parse-metatags, to grab this data.
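
To illustrate what classifying on page text rather than on metadata might look like, here is a minimal sketch. It is plain Python, not a Nutch plugin and not GATE; the keyword lists, the `classify_religion` name, and the `min_hits` threshold are all made up for illustration, and a real classifier would need far better features than keyword counts.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration.
KEYWORDS = {
    "christianity": {"church", "gospel", "jesus", "bible"},
    "islam": {"mosque", "quran", "ramadan", "imam"},
    "buddhism": {"buddha", "dharma", "sangha", "sutra"},
}


def classify_religion(text, min_hits=2):
    """Return the label with the most keyword hits in the text,
    or None if the best label has fewer than min_hits hits."""
    # Lowercase the extracted page text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())

    # Count how many tokens fall into each label's keyword list.
    counts = Counter()
    for label, words in KEYWORDS.items():
        counts[label] = sum(1 for token in tokens if token in words)

    label, hits = counts.most_common(1)[0]
    return label if hits >= min_hits else None
```

Something like this would run on the extracted text of each fetched page, and the returned label (or the absence of one) would be what you try to carry through to indexing.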
I think it would be something more along the lines of text processing after fetching (or at runtime). Documents could then be classified accordingly. I recently spoke with someone who undertook such an exercise, though not using Nutch, I must admit. If you are familiar with GATE [0], you could create some kind of plugin to identify this kind of information, but I am not familiar with the process of retaining it for indexing, as I have not thoroughly tried the concept.

hth

Lewis

[0] http://gate.ac.uk/

On Fri, Jun 29, 2012 at 12:36 PM, Jim Chandler <[email protected]> wrote:
> Lewis,
>
> I work with George. What we are trying to do is identify whether or not a
> document is religious in nature or not. And if so what that religion is.
> We are aware this could be a difficult undertaking, and we would like not
> to reinvent the wheel.
>
> HTH
> Jim
>
> On Thu, Jun 28, 2012 at 5:16 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
>> Hi George,
>>
>> Where are each of these fields present within the document?
>>
>> Lewis
>>
>> > On Wed, Jun 27, 2012 at 7:59 PM, JAB <[email protected]> wrote:
>> >> I've written some simple Nutch plug-ins to detect a document's Author,
>> >> Publication Date, and if it's an article about Religion (including what
>> >> religion it's talking about). I was wondering if anyone knows of any open
>> >> source plug-ins any group has written to cover these plug-in issues,
>> >> rather than me relying on my own custom solutions. I'm new to Nutch/Gate
>> >> development.
>> >>
>> >> --
>> >> View this message in context:
>> >> http://lucene.472066.n3.nabble.com/Nutch-Author-Publication-and-Religion-Detection-tp3991662.html
>> >> Sent from the Nutch - Dev mailing list archive at Nabble.com.
>> >
>> > --
>> > Lewis
>>
>> --
>> Lewis

--
Lewis

