Re: Field Collapsing Performance

2010-09-29 Thread Kaktu Chakarabati
Hey Li, thanks - great answer, it touched on exactly the points I was interested in. One last question: once you tweaked it to work in a 'top K' way, what was the performance impact like? I've written similar components in the past that iterate over top result set docs (on the order of 400-600 top results)

Re: Field Collapsing Performance

2010-09-29 Thread Yonik Seeley
On Tue, Sep 28, 2010 at 8:14 PM, Li Li fancye...@gmail.com wrote: I think the current implementation is slow because it collapses across all the hit docs. From what I see in our environment, a query takes more than 1s with collapsing and only 200-300ms without it. So we modified it as follows: when the user needs

Field Collapsing Performance

2010-09-28 Thread Kaktu Chakarabati
Hey guys, any word on this? Has anyone done any benchmarking or used this in a production-like environment? We are considering using this feature on a large scale for deduplication and were wondering if anyone has some numbers before I go ahead and start my own series of tests... Thanks, Chak

Re: Field Collapsing Performance

2010-09-28 Thread Li Li
I think the current implementation is slow because it collapses across all the hit docs. From what I see in our environment, a query takes more than 1s with collapsing and only 200-300ms without it. So we modified it as follows: when the user needs the top 100 docs, we collect the top 200 docs and collapse within these 200
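
The top-K idea described in this message (collect roughly twice as many top hits as requested, then collapse only within that window rather than over every hit doc) can be sketched as below. This is only an illustration, not Solr's code and not necessarily what Li Li's patch does: the function name collapse_top_k, the (doc_id, score, field_value) tuple shape, and the oversample parameter are all assumptions made for the example.

```python
import heapq
from typing import Hashable, Iterable, List, Tuple

# Hypothetical hit shape: (doc_id, score, value of the field being collapsed on)
Hit = Tuple[str, float, Hashable]


def collapse_top_k(hits: Iterable[Hit], k: int, oversample: int = 2) -> List[Hit]:
    """Return up to k hits, keeping only the best-scoring hit per field value.

    Only the top k * oversample hits by score are examined, so fewer than
    k hits may come back if that window is dominated by duplicates.
    """
    # Step 1: take a bounded window of top-scoring hits instead of all hits.
    window = heapq.nlargest(k * oversample, hits, key=lambda h: h[1])

    # Step 2: collapse inside the window. It is sorted by descending score,
    # so the first hit seen for a field value is the best one in its group.
    seen = set()
    collapsed: List[Hit] = []
    for hit in window:
        if hit[2] in seen:
            continue
        seen.add(hit[2])
        collapsed.append(hit)
        if len(collapsed) == k:
            break
    return collapsed
```

The trade-off is that collapsing cost is bounded by the window size rather than the total hit count, at the price of possibly returning fewer than k groups when many of the top hits share the same field value.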