Finally, I have been able to run Lucy. :) Thanks a lot, Peter, for your help.

Is there a way in Lucy to generate a tag map/cloud of specific types of terms or phrases that appear in the documents Lucy returns for a particular query? For example, I want to generate a tag map showing all gene names and cell-tissue names (taken from their respective name lists), together with their document frequencies, that are co-mentioned in the documents returned by Lucy for a query gene (e.g. nuclear factor 1).

One other question: how can I change the default size of the text excerpt reported in the search results?

Thank you very much.

--- On Thu, 2/21/13, Peter Karman <[email protected]> wrote:
From: Peter Karman <[email protected]>
Subject: Re: [lucy-user] Input format to Lucy
To: [email protected]
Date: Thursday, February 21, 2013, 2:55 PM

Anil Pachuri wrote on 2/21/13 3:22 PM:
> Hi,
>
> Does Lucy have a utility to accept raw XML files as input? I have 50 XML
> files and I need to index selected fields in them using Lucy.
>

If you install SWISH::Prog::Lucy from CPAN, you get the swish3 tool installed
which will index XML (and HTML et al) files for Lucy. You can specify which XML
elements you want treated as Lucy fields with a configuration file. For example:

 # a document like
 <doc>
  <foo>bar</foo>
 </doc>

 # a config file like
 MetaNames foo
 PropertyNames foo

 # and then index the file like:
 % swish3 -F lucy -c configfile -i doc.xml

 # and search like:
 % swish3 -q foo:bar

The configuration docs are at:

http://swish-e.org/docs/swish-config.html

You might also want to look at Dezi, which does the same thing with a
server/client setup.

http://dezi.org/

> Also, is there any general perl utility to merge multiple XML files or
> convert these into tabular format?

CPAN has many XML handling tools. I'm sure there's something there that will do
most or all of what you want.

-- 
Peter Karman . http://peknet.com/ . [email protected]
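
On the question above about converting XML files into a tabular format, here is a minimal sketch of one possible approach. It is written in Python with only the standard library rather than Perl, and it is not a Lucy or swish3 feature. The function names (xml_docs_to_rows, rows_to_csv) and the sample field names are hypothetical, chosen only for illustration; it assumes each document is small enough to parse in memory and that only the first occurrence of each selected element matters.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_docs_to_rows(xml_texts, fields):
    """Return one dict per XML document, keeping only the selected elements.

    Hypothetical helper: `fields` lists the element names to extract.
    """
    rows = []
    for text in xml_texts:
        root = ET.fromstring(text)
        row = {}
        for name in fields:
            # Text of the first matching descendant; "" if the element is absent.
            row[name] = (root.findtext(f".//{name}", default="") or "").strip()
        rows.append(row)
    return rows


def rows_to_csv(rows, fields):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Two in-memory sample documents stand in for files read from disk.
docs = [
    "<doc><foo>bar</foo></doc>",
    "<doc><foo>baz</foo><title>t</title></doc>",
]
fields = ["foo", "title"]
rows = xml_docs_to_rows(docs, fields)
table = rows_to_csv(rows, fields)
print(table)
```

To use it on real files, the sample strings would be replaced by the contents of each XML file; merging then amounts to collecting all rows into one table.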
