2009/8/19 Bertrand Delacretaz <bdelacre...@apache.org>:
> On Mon, Aug 17, 2009 at 3:26 PM, Dave Pawson<dave.paw...@gmail.com> wrote:
>> 2009/8/17 Bertrand Delacretaz <bdelacre...@apache.org>:
>>
>>> You might want to have a look at Apache Solr
>>> (http://lucene.apache.org/solr/)...
>> Cons
>> Server based (I don't like 'playing' over http)..
>
> Ok, didn't get that from your earlier mail. Not sure how/if any of
> Solr's stuff is available from a pure command line.

Just one less source of problems when debugging!

>
>> Is it single schema? Seems to be from the FAQ...
>
> AFAIK Solr only cares about its own schema internally, you might have
> to define conventions for field names to "attach" them to a specific
> schema. You'd need a custom transformation from your XML instances to
> the Solr "fields" schema for each of your schemas.

Which seems illogical? I'm suggesting developing something such
that any user can process their own XML using existing element names,
without a double transform and possible loss of semantics.

>
>> ...They've extended Lucene. Mmm...
>
> The overlap between the Lucene and Solr communities is important, I
> think it's safe to say that they work in close collaboration.

Good to know!

>
>> ...Uses POST (why not PUT?)...
>
> Probably to be browser-friendly, I see your point.

RESTful.

>
>> ...Not much in http://wiki.apache.org/solr/TaskList
>> about better XML support....
>
> AFAIK their current XML format serves their needs.

<grin/> But I'm thinking of a user's need, mine in this case.

>
>> ...Still a better bet than tika Bertrand?...
>
> I would use Solr if I had to index XML. Well, today I would import the
> data into Jackrabbit maybe ;-)

Jackrabbit == red herring? :-) Focus on content?
I guess MarkLogic would be a better bet if I'm going that direction.
It's the search capabilities I'm most interested in, hence the Lucene kick.

>
> I'd say Solr is more out-of-the-box when it comes to indexing, as all
> the Lucene bits (analyzers, boosts etc.) are taken care of. But it's
> hard to compare with Tika as the scope is totally different.

I've used Tika with my XML; it's just that I lose the semantics
that my markup has, which would be good for weighting.

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
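
To illustrate the idea above of reusing existing element names as field
names, with markup-driven weighting and no intermediate schema, here is a
minimal sketch in Python. It is not Solr's, Lucene's or Tika's API: the
function name xml_to_fields, the WEIGHTS table and the sample element names
are all hypothetical, and the weights are made-up values that an indexer
could later use as boosts.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-element weights (illustrative values, not from any tool).
WEIGHTS = {"title": 3.0, "para": 1.0}


def xml_to_fields(xml_text, weights=WEIGHTS, default_weight=1.0):
    """Return (field_name, text, weight) for each element that has direct text.

    The element name is used as the field name unchanged, so no second
    schema is needed. Only text before an element's first child is read;
    tail text and attributes are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    fields = []
    for elem in root.iter():  # document order, root included
        text = (elem.text or "").strip()
        if text:
            weight = weights.get(elem.tag, default_weight)
            fields.append((elem.tag, text, weight))
    return fields


sample = (
    "<article><title>Indexing XML</title>"
    "<para>Keep the markup.</para></article>"
)

for name, text, weight in xml_to_fields(sample):
    print(name, weight, text)
```

Each tuple could then be handed to whatever indexer is in use, with the
weight applied as a field boost. Namespaced documents would need the
namespace stripped or mapped, since ElementTree reports such tags as
{uri}localname.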