Hi Jim,

The difficulty with this problem is that, I assume, religion information
will not be included in the document metadata, so it's not a
simple case of using one of the existing implementations, e.g.
parse-metatags, to grab this data.

I think it would be something more along the lines of text processing
post-fetch (or at runtime). Documents could then be classified
accordingly. I recently spoke with someone who undertook such an
exercise, though not using Nutch, I must admit. If you are familiar with
GATE [0] you could create some kind of plugin to identify this kind of
information, but I am not familiar with the process of retaining it for
indexing, as I have not thoroughly tried the concept.
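
To make the post-fetch classification idea a bit more concrete, here is
a rough, standalone Python sketch. It does not use any Nutch or GATE
API; the keyword lists, the classify_religion function and the min_hits
threshold are all hypothetical and only for illustration. A real setup
would more likely use curated gazetteers or a trained classifier.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only. These are an
# assumption, not something taken from Nutch, GATE, or any real plugin.
RELIGION_TERMS = {
    "christianity": {"church", "gospel", "bible", "christ"},
    "islam": {"quran", "mosque", "ramadan", "imam"},
    "buddhism": {"buddha", "dharma", "sangha", "sutra"},
}


def classify_religion(text, min_hits=2):
    """Return the best-matching label, or None if too few keywords match."""
    # Lowercase the text and count each alphabetic token.
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each label by the total occurrences of its keywords.
    scores = {
        label: sum(tokens[term] for term in terms)
        for label, terms in RELIGION_TERMS.items()
    }
    label, hits = max(scores.items(), key=lambda kv: kv[1])
    # Treat the document as non-religious below the (assumed) threshold.
    return label if hits >= min_hits else None
```

Something like this would run on the extracted text of each fetched
document, and the returned label would be what you want to carry
through to indexing.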

hth

Lewis

[0] http://gate.ac.uk/

On Fri, Jun 29, 2012 at 12:36 PM, Jim Chandler <[email protected]> wrote:
> Lewis,
>
> I work with George.  What we are trying to do is identify whether or not a
> document is religious in nature or not.  And if so what that religion is.
>  We are aware this could be a difficult undertaking, and we would like not
> to reinvent the wheel.
>
> HTH
> Jim
>
> On Thu, Jun 28, 2012 at 5:16 PM, Lewis John Mcgibbney <[email protected]> wrote:
>
>> Hi George,
>>
>> Where are each of these fields present within the document?
>>
>> Lewis
>>
>> > On Wed, Jun 27, 2012 at 7:59 PM, JAB <[email protected]> wrote:
>> >> I've written some simple Nutch plug-ins to detect a document's Author,
>> >> Publication Date, and if its an article about Religion (including what
>> >> religion its talking about). I was wondering if anyone knows of any open
>> >> source plug-ins any group has written to cover these plug-in issues,
>> >> rather than me relying on my own custom solutions. I'm new to Nutch/Gate
>> >> development.
>> >>
>> >> --
>> >> View this message in context:
>> http://lucene.472066.n3.nabble.com/Nutch-Author-Publication-and-Religion-Detection-tp3991662.html
>> >> Sent from the Nutch - Dev mailing list archive at Nabble.com.
>> >
>> >
>> >
>> > --
>> > Lewis
>>
>>
>>
>> --
>> Lewis
>>



-- 
Lewis
