Re: [lucy-user] Dictionary based NER with Lucy

Nick Wellnhofer Fri, 12 Oct 2012 07:11:36 -0700

On 12/10/2012 15:27, Aleksandar Radovanovic wrote:

Thank you Nick. Could you possibly give me some more specific guidelines?


At the moment, all indexed words are "flat" with no semantics, which is
great for general purposes. However, if one focuses on, let's say,
biomedical literature, one would like to distinguish which words
represent gene names, drug names, etc. A user would be able to compose
a search like "[drug_dictionary_ID] AND headache" to get documents
containing all drug names related to headache.

First, create a schema with two full-text fields. One named "text" for
the document content, and another one named "dict" for dictionary IDs.
Then, before indexing a document, create a list of dictionary IDs
related to that document. Store the IDs in the "dict" field separated by
whitespace and index the document.
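
The "create a list of dictionary IDs" step could be sketched roughly as
follows. This is only an illustration, not part of Lucy: the %dictionary
contents, the build_doc helper and the simple single-word, case-insensitive
matching are all assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical dictionary: lowercased term => dictionary ID.
my %dictionary = (
    aspirin   => 'drug_dictionary_ID',
    ibuprofen => 'drug_dictionary_ID',
    brca1     => 'gene_dictionary_ID',
);

# Hypothetical helper: returns a hash ref with the two fields described
# above. "text" holds the document content, "dict" holds the matching
# dictionary IDs separated by whitespace (each ID listed once).
sub build_doc {
    my ( $text, $dict ) = @_;
    my %seen;
    my @ids;
    for my $word ( map { lc $_ } $text =~ /\w+/g ) {
        my $id = $dict->{$word};
        next unless defined $id;
        push @ids, $id unless $seen{$id}++;
    }
    return { text => $text, dict => join( ' ', @ids ) };
}

my $doc = build_doc( 'Aspirin is often taken for headache.', \%dictionary );
print "$doc->{dict}\n";
```

The resulting hash ref has the shape that Lucy::Index::Indexer's add_doc
method accepts, provided the schema defines both "text" and "dict".
Multi-word dictionary terms would need a smarter matcher than this
word-by-word lookup.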

For the search part, you can write your own query parser, or use the
excellent Search::Query module which supports the "field:value" syntax.
Something like this should work:


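use Search::Query;

# Parse "field:value" queries; unqualified terms search the "text" field.
my $parser = Search::Query->parser(
    dialect       => 'Lucy',
    default_field => 'text',
);
my $query      = $parser->parse('dict:drug_dictionary_ID AND headache');
my $lucy_query = $query->as_lucy_query();

# $lucy_searcher is assumed to be an existing Lucy::Search::IndexSearcher.
my $hits = $lucy_searcher->hits( query => $lucy_query );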

Hope this helps,

Nick
