On 12/10/2012 15:27, Aleksandar Radovanovic wrote:
Thank you Nick. Could you possibly give me some more specific guidelines?

At the moment, all indexed words are "flat" with no semantics, which is
great for general purposes. However, if one focuses on, let's say,
biomedical literature, one would like to distinguish which words
represent gene names, drug names, etc. A user would then be able to
compose a search like "[drug_dictionary_ID] AND headache" to get
documents containing any drug name together with headache.

First, create a schema with two full-text fields. One named "text" for the document content, and another one named "dict" for dictionary IDs. Then, before indexing a document, create a list of dictionary IDs related to that document. Store the IDs in the "dict" field separated by whitespace and index the document.
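
As a rough sketch of the "create a list of dictionary IDs" step: the lookup table and the dict_ids_for helper below are made up for illustration, and a real dictionary would come from your own resources. The sketch only shows how the whitespace-separated value for the "dict" field could be built.

```perl
use strict;
use warnings;

# Hypothetical lookup table: lowercased term => dictionary ID.
my %term_to_dict = (
    aspirin   => 'drug_dictionary_ID',
    ibuprofen => 'drug_dictionary_ID',
    brca1     => 'gene_dictionary_ID',
);

# Return the dictionary IDs found in one document as a
# whitespace-separated string, each ID listed once.
sub dict_ids_for {
    my ($text) = @_;
    my %seen;
    for my $word ( map { lc($_) } $text =~ /\w+/g ) {
        my $id = $term_to_dict{$word} or next;
        $seen{$id} = 1;
    }
    return join ' ', sort keys %seen;
}

# Build the two field values for one document.
my $text = 'Aspirin is often taken for headache.';
my %doc  = ( text => $text, dict => dict_ids_for($text) );
print "$doc{dict}\n";
```

The resulting hash has the two fields described above, so it could be handed to the indexer's add_doc as it is. Matching here is single-word and case-insensitive; multi-word names would need a different lookup.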

For the search part, you can write your own query parser or use the excellent Search::Query module, which supports the "field:value" syntax. Something like this should work:

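use Search::Query;

# Unqualified terms such as "headache" are searched in the "text" field.
my $parser = Search::Query->parser(
    dialect       => 'Lucy',
    default_field => 'text',
);
my $query = $parser->parse('dict:drug_dictionary_ID AND headache');
my $lucy_query = $query->as_lucy_query();
# $lucy_searcher is assumed to be an existing searcher opened on the index.
my $hits = $lucy_searcher->hits( query => $lucy_query );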

Hope this helps,

Nick
