It would be cool to have a NIF integration of Apache Lucy. It may solve the problem.
- benoit

On Mon, Mar 28, 2011 at 5:17 PM, Olafur Arason <[email protected]> wrote:
> I love the power of Lucene but it's not needed for many use cases
> and can even be gutted, like Cloudant is doing with their search
> using the lexer from Lucene.
>
> But most of the time people need quick and dirty search and
> even search integration with views. Then you would maybe have
> a really simple lexer, and have it built in. If people need more
> power they would use Lucene.
>
> It's like using a Ferrari to go to the store: it's cool but overkill.
>
> Hope you keep up the good work, couchdb-lucene is really easy
> to use.
>
> Regards,
> Olafur Arason
>
> P.S. I was talking to an NLP expert and I realize that there is so
> much to searching, especially doing it right, that I think nobody
> will be able to re-implement Lucene anytime soon.
>
> On Mon, Mar 28, 2011 at 14:30, Robert Newson <[email protected]> wrote:
>> I am a CouchDB committer and author of couchdb-lucene. :)
>>
>> B.
>>
>> On 28 March 2011 10:44, Andrew Stuart (SuperCoders)
>> <[email protected]> wrote:
>>> Hi Robert
>>>
>>> "there are no publicly known plans to build a native full-text indexing
>>> feature for CouchDB."
>>>
>>> I don't know who is who around here as yet - are you commenting from inside
>>> knowledge or as an end user/developer?
>>>
>>> Thanks
>>>
>>>
>>> On 28/03/2011, at 8:24 PM, Robert Newson wrote:
>>>
>>> I have to dispute "There does not seem to be much understanding that
>>> this could be a killer feature."
>>>
>>> Obviously full-text search is a killer feature, but it's trivially
>>> available now via couchdb-lucene or elasticsearch.
>>>
>>> What people are asking for is native full-text search which, to me, is
>>> essentially asking for an Erlang port of Lucene. We'd love this, but
>>> it's a huge amount of work. Continually asking others to do
>>> significant amounts of work is also wearying.
>>>
>>> To replace a Lucene-based solution and match its quality and breadth
>>> is a huge chunk of work and is only necessary to satisfy people who,
>>> for various reasons, don't want to use Java.
>>>
>>> To answer the original post, there are no publicly known plans to
>>> build a native full-text indexing feature for CouchDB.
>>>
>>> B.
>>>
>>> On 28 March 2011 10:15, Olafur Arason <[email protected]> wrote:
>>>>
>>>> There does not seem to be much understanding that this could be a killer
>>>> feature. People are now relying on Lucene, which monitors the _changes
>>>> feed.
>>>>
>>>> Cloudant has done its own implementation which, as far as I gather from
>>>> the information they have published, makes a view out of all your words.
>>>> They recommend a Java view because you can then reuse the lexer from
>>>> Lucene. Then I think they are reusing the reader of the view to make
>>>> their query. They have a query syntax similar to Lucene's.
>>>> They are still working on this and I think they don't have that much
>>>> incentive to open-source it right away. But they have in the past
>>>> open-sourced their technology, like BigCouch, so I think it's more a
>>>> matter of when rather than if.
>>>>
>>>> I think this is a good solution for full-text search. But I think
>>>> the Java view does not have direct access to the data, so it could be
>>>> slow. Cloudant does clustering on view generation, though, so that helps.
>>>>
>>>> But there is also a general problem with the current view system where
>>>> search technology could be used.
>>>>
>>>> Views are really good at sorting, but people are using them to
>>>> do key matches, which they are not designed for. The startkey and
>>>> endkey parameters are for selecting sorted ranges and are not good for
>>>> matching, which is what most resources online point to.
>>>>
>>>> For example when you do:
>>>> startkey = ["key11", "key21"]
>>>> endkey = ["key19", "key21"]
>>>>
>>>> You get ["key11","key22"], ["key11","key23"] ... ["key12","key21"],
>>>> ["key12","key22"] ...
>>>> which makes sense when looking up sorted ranges but not when using it to
>>>> match keys. You can do a range match, but only on the last element
>>>> of the key and never on two elements. So this would work:
>>>>
>>>> startkey = ["key21", "key11"]
>>>> endkey = ["key21", "key19"]
>>>>
>>>> The current view interface could be augmented to accept queries,
>>>> which would make views much more powerful than they currently are,
>>>> with the keys used just for sorting and for selecting which values you
>>>> want shown, which is what they are designed to do and do really well.
>>>>
>>>> This would be a killer feature and could use the new infrastructure
>>>> from Cloudant search.
>>>>
>>>> And don't tell me the Elastic or Lucene interface could do anything
>>>> close to this :)
>>>>
>>>> Regards,
>>>> Olafur Arason
>>>>
>>>> On Mon, Mar 28, 2011 at 04:31, Andrew Stuart (SuperCoders)
>>>> <[email protected]> wrote:
>>>>>
>>>>> It would be good to know if full text search is coming as a core feature
>>>>> and if yes, approximately when - does anyone know?
>>>>>
>>>>> Even an approximate timeframe would be good.
>>>>>
>>>>> thanks
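
For anyone following the key-range point in the quoted message, here is a rough sketch of why a range over two-element keys does not behave like a match. It is plain Python, not CouchDB code: `KEYS` and `range_query` are hypothetical names made up for illustration, the sample keys are invented, and Python's element-by-element list comparison is assumed to be a fair stand-in for view collation only because the keys are simple ASCII strings. The range parameters are the ones CouchDB calls `startkey` and `endkey`.

```python
from bisect import bisect_left, bisect_right

# Hypothetical view index: two-element keys kept in sorted order.
# Python compares lists element by element, which is assumed here to
# approximate view collation for plain ASCII strings like these.
KEYS = sorted(
    [first, second]
    for first in ("key11", "key12", "key19", "key21")
    for second in ("key11", "key15", "key19", "key21", "key22", "key23")
)


def range_query(keys, startkey, endkey):
    """Return every key k with startkey <= k <= endkey (both ends inclusive)."""
    lo = bisect_left(keys, startkey)
    hi = bisect_right(keys, endkey)
    return keys[lo:hi]


# Range on the first element: the second element is not filtered at all,
# so keys such as ["key11", "key22"] and ["key12", "key22"] come back too.
loose = range_query(KEYS, ["key11", "key21"], ["key19", "key21"])

# First element fixed, range on the last element: this acts like a match
# on the first element plus a range on the second.
tight = range_query(KEYS, ["key21", "key11"], ["key21", "key19"])

if __name__ == "__main__":
    print("range on first element:", loose)
    print("range on last element: ", tight)
```

The first query returns rows whose second element is anything at all, because the whole key is compared as one sorted value. The second query only works as a match because the leading element is identical in both bounds.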
