Re: Dedupe of document results at query-time

2010-01-23 Thread Martijn v Groningen

This manner of detecting duplicates at query time really does match
what field collapsing does, so I suggest you look into that. As far
as I know, there isn't any function query that does what you have
described in your example.
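
To illustrate what collapsing on a field means for the example in the question, here is a minimal sketch in plain Python. It is not Solr or Lucene code and says nothing about how field collapsing is implemented there; the function name `collapse_by_field` and the `id` key are hypothetical, introduced only for this illustration, and the documents are the three from the example.

```python
def collapse_by_field(docs, field):
    """Keep only the first document seen for each distinct value of `field`.

    Illustration only: documents that lack the field all share the key None,
    so they collapse into a single result as well.
    """
    seen = set()
    collapsed = []
    for doc in docs:
        key = doc.get(field)
        if key in seen:
            continue  # a document with this field value was already kept
        seen.add(key)
        collapsed.append(doc)
    return collapsed


# The three documents from the example ("id" is a hypothetical label).
docs = [
    {"id": "Doc1", "host": "Host1", "time": "1 Sept 09", "appname": "activePDF"},
    {"id": "Doc2", "host": "Host1", "time": "2 Sept 09", "appname": "activePDF"},
    {"id": "Doc3", "host": "Host1", "time": "3 Sept 09", "appname": "activePDF"},
]

# Match on host, ignore time, dedupe on appname.
matches = [d for d in docs if d["host"] == "Host1"]
result = collapse_by_field(matches, "appname")
print([d["id"] for d in result])
```

Which document survives depends only on the order of the matches, which fits the "doesn't really matter which one" requirement.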

Cheers,

Martijn

On 23 January 2010 12:31, Peter S wrote:
> Hi,
>
> I wonder if someone might be able to shed some insight into this problem:
>
> Is it possible and/or what is the best/accepted way to achieve deduplication
> of documents by field at query-time?
>
> For example:
> Let's say an index contains:
>
> Doc1
> host:Host1
> time:1 Sept 09
> appname:activePDF
>
> Doc2
> host:Host1
> time:2 Sept 09
> appname:activePDF
>
> Doc3
> host:Host1
> time:3 Sept 09
> appname:activePDF
>
> Can a query be constructed that would return only 1 of these Documents based
> on appname (doesn't really matter which one)?
>
> i.e.:
>   match on host:Host1
>   ignore time
>   dedupe on appname:activePDF
>
> Is this possible? Would FunctionQuery be helpful here, maybe? Am I actually
> talking about field collapsing?
>
> Many thanks,
> Peter



-- 
Met vriendelijke groet,

Martijn van Groningen

Dedupe of document results at query-time

2010-01-23 Thread Peter S


Hi,

I wonder if someone might be able to shed some insight into this problem:

Is it possible and/or what is the best/accepted way to achieve deduplication of
documents by field at query-time?

For example:
Let's say an index contains:

Doc1
host:Host1
time:1 Sept 09
appname:activePDF

Doc2
host:Host1
time:2 Sept 09
appname:activePDF

Doc3
host:Host1
time:3 Sept 09
appname:activePDF

Can a query be constructed that would return only 1 of these Documents based on
appname (doesn't really matter which one)?

i.e.:
   match on host:Host1
   ignore time
   dedupe on appname:activePDF

Is this possible? Would FunctionQuery be helpful here, maybe? Am I actually
talking about field collapsing?

Many thanks,
Peter

 
  
