On Fri, Dec 5, 2008 at 4:30 PM, Christof Mueller <[EMAIL PROTECTED]> wrote:
> Jörn Kottmann wrote:
>> I am also interested in a Lucene CAS consumer.
>> Maybe we can work together and set up a sandbox project ?
>>
>> Jörn
> Hi Jörn,
>
> we would be happy to contribute the code of the example Lucene CAS
> consumer as base for the sandbox project.
>
> Christof
>
I've got an index! Yes: by mixing some code from the JENA lucas (I kept it in a dusty corner of my hard disk :) ), some from DK, and some of my own, I can now produce an index.

If we want to start a Lucene indexer that is not only a proof of concept but something really useful, it should be configurable and extensible.

The "problem", which is also UIMA's strength, is that everyone has their own type system. To produce a Lucene document, one extracts information from some features and applies the right analyzer to each. In my case I use maybe only 10% of the annotations produced by the analysis pipeline to build a single Lucene document.

So we need a highly configurable component, able to map only certain declared features, apply the right analyzer to each, and so on.

Many ways are possible:
- completely programmatic: the indexer is abstract and must be extended to implement the right mapping for a specific type system and pipeline
- configurable: mapping rules are defined in a descriptor file; the JENA component followed this approach
- a mix of the two: some mappings are configured, others are implemented

My 2 €cents.

Regards,
Roberto

--
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
jabber:[EMAIL PROTECTED]
skype:ro.franchini
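
As a rough illustration of the "mix of the two" option described above, here is a minimal sketch in plain Java. Everything in it is hypothetical: the names `MappingSketch`, `FieldRule`, `Annotation`, `AbstractIndexer`, `applyRules` and `customMapping` are made up for this example, the "analyzers" are simple string functions, and no UIMA or Lucene classes are used. It only shows the shape of the idea: declared rules select the few features that end up in the document, and a subclass adds whatever the rules cannot express.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: plain Java stand-ins, no UIMA or Lucene classes.
public class MappingSketch {

    /** One declared rule: which feature of which annotation type feeds which index field. */
    static final class FieldRule {
        final String typeName, featureName, fieldName, analyzerName;
        FieldRule(String typeName, String featureName, String fieldName, String analyzerName) {
            this.typeName = typeName;
            this.featureName = featureName;
            this.fieldName = fieldName;
            this.analyzerName = analyzerName;
        }
    }

    /** Stand-in for an annotation: a type name plus its feature values. */
    static final class Annotation {
        final String typeName;
        final Map<String, String> features = new HashMap<>();
        Annotation(String typeName) { this.typeName = typeName; }
        Annotation with(String feature, String value) { features.put(feature, value); return this; }
    }

    /** Stand-ins for analyzers, looked up by the name used in a rule. */
    static final Map<String, Function<String, String>> ANALYZERS = new HashMap<>();
    static {
        ANALYZERS.put("keyword", s -> s);
        ANALYZERS.put("lowercase", s -> s.toLowerCase(Locale.ROOT));
    }

    /** Configured part: only features named in a rule reach the document. */
    static Map<String, List<String>> applyRules(List<FieldRule> rules, List<Annotation> annotations) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        for (Annotation a : annotations) {
            for (FieldRule r : rules) {
                if (!r.typeName.equals(a.typeName)) continue;
                String value = a.features.get(r.featureName);
                if (value == null) continue;
                // Unknown analyzer names fall back to the identity function.
                Function<String, String> analyzer = ANALYZERS.getOrDefault(r.analyzerName, s -> s);
                doc.computeIfAbsent(r.fieldName, k -> new ArrayList<>()).add(analyzer.apply(value));
            }
        }
        return doc;
    }

    /** Programmatic part: subclasses implement the mapping that rules cannot express. */
    abstract static class AbstractIndexer {
        private final List<FieldRule> rules;
        AbstractIndexer(List<FieldRule> rules) { this.rules = rules; }

        protected abstract void customMapping(List<Annotation> annotations, Map<String, List<String>> doc);

        final Map<String, List<String>> buildDocument(List<Annotation> annotations) {
            Map<String, List<String>> doc = applyRules(rules, annotations);
            customMapping(annotations, doc);
            return doc;
        }
    }

    public static void main(String[] args) {
        List<FieldRule> rules = Arrays.asList(new FieldRule("Person", "name", "person", "lowercase"));
        AbstractIndexer indexer = new AbstractIndexer(rules) {
            @Override
            protected void customMapping(List<Annotation> annotations, Map<String, List<String>> doc) {
                // Hand-written mapping: record how many annotations the pipeline produced.
                doc.computeIfAbsent("annotationCount", k -> new ArrayList<>())
                   .add(String.valueOf(annotations.size()));
            }
        };
        List<Annotation> annotations = Arrays.asList(
            new Annotation("Person").with("name", "Roberto"),
            new Annotation("Token").with("pos", "NN"));
        System.out.println(indexer.buildDocument(annotations));
    }
}
```

In this sketch the `Token` annotation is silently ignored because no rule mentions it, which mirrors the point about using only a small fraction of the annotations. In a real component the rule list would presumably be read from a descriptor file rather than built in code.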
