On Wed, Jan 6, 2010 at 10:48 AM, Chris Anderson <[email protected]> wrote:
> The only catch is that you'll end up with a large index file in the
> long run. Lucene's indexes should be more compact on disk. Lucene also
> has more stemming options and will generally be smarter than your
> tokenizer.
>
> That said, if it works, it works.
Thanks Chris. I do have a decent amount of experience with Lucene as well, so I realize it's a great product; I just didn't want to add another dependency, especially considering that CouchDB is still changing quite a bit under the hood.

Is there any way to get insight into how big the index is? I can see how big my database is (78M with ~11k docs), but I'd be curious to know how much space that view takes up in memory.

One question I have is that it seems rather inefficient to store each word/id pair individually. Would there be any value in adding a reduce step that groups them, so that the view would be word -> [id array] instead? I will admit the reduce() step is one I am still grappling with a bit.
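To make that concrete, here's a rough sketch of the kind of map/reduce pair I'm picturing (the doc.body field and the naive word splitting are just placeholders, not my actual schema or tokenizer):

// map: emit one row per (word, doc id) pair
function (doc) {
  var words = (doc.body || "").toLowerCase().split(/\W+/);
  for (var i = 0; i < words.length; i++) {
    if (words[i]) {
      emit(words[i], doc._id);
    }
  }
}

// reduce: collapse the rows for each word into a single id array,
// so the grouped view reads word -> [id array]
function (keys, values, rereduce) {
  if (rereduce) {
    // on rereduce the values are already arrays of ids, so flatten them
    return [].concat.apply([], values);
  }
  // on the first pass the values are the individual doc ids
  return values;
}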
-Nic