Re: solr equivalent of "select distinct"

Erick Erickson Sun, 11 Sep 2011 13:19:09 -0700

Hmmm, there's no good way I can think of off the top of my
head to do this. Whenever people find themselves thinking
in terms of RDBMSs, I have to ask whether the problem is
really appropriate for a search engine. And/or what the problem
you're trying to solve with this approach is from a higher level.
Perhaps there's another approach completely that would
serve....


Best
Erick

On Sun, Sep 11, 2011 at 12:39 PM, Mark juszczec <mark.juszc...@gmail.com> wrote:
> Erick
>
> Thanks very much for the reply.
>
> I typed this late Friday after work and tried to simplify the problem
> description.  I got something wrong.  Hopefully this restatement is better:
>
> My PK is FLD1, FLD2 and FLD3 concatenated together.
>
> In some cases FLD1 and FLD2 can be the same.  The ONLY differing field being
> FLD3.
>
> Here's an example:
>
> PK   FLD1      FLD2    FLD3 FLD4 FLD5
> AB0  A            B          0     x       y
> AB1  A            B          1     x       y
> CD0  C            D          0     a       b
> CD1  C            D          1     e       f
>
> I want to write a query using only the terms FLD1 and FLD2 and ONLY get
> back:
>
> A B x y
> C D a b
> C D e f
>
> Since FLD4 and FLD5 are the same for PK=AB0 and AB1, I only want one
> occurrence of those records.
>
> Since FLD4 and FLD5 are different for PK=CD0 and CD1, I want BOTH
> occurrences of those records.
>
> I'm hoping I can use wildcards to get FLD4 and FLD5.  If not, I can use fl=
>
> I'm using edismax.
>
> We are also creating the query string on the fly.  I suspect using SolrJ and
> plugging the values into a bean would be easier - or do I have that wrong?
>
> I hope the tables of example data display properly.
>
> Mark
>
> On Sun, Sep 11, 2011 at 12:06 PM, Erick Erickson 
> <erickerick...@gmail.com>wrote:
>
>> This smells like an XY problem, can you back up and give a higher-level
>> reason *why* you want this behavior?
>>
>> Because given your problem description, this seems like you are getting
>> correct behavior no matter how you define the problem. You're essentially
>> saying that you have two records with identical beginnings of your PK,
>> why is it incorrect to give you both records?
>>
>> But, anyway, if you're searching on FLD1 and FLD2, then by definition
>> you're going to get both records back or the search would be failing!
>>
>> Best
>> Erick
>>
>> On Fri, Sep 9, 2011 at 8:08 PM, Mark juszczec <mark.juszc...@gmail.com>
>> wrote:
>> > Hello everyone
>> >
>> > Let's say each record in my index contains fields named PK, FLD1, FLD2,
>> FLD3
>> > .... FLD100
>> >
>> > PK is my solr primary key and I'm creating it by concatenating
>> > FLD1+FLD2+FLD3 and I'm guaranteed that combination will be unique
>> >
>> > Let's say 2 of these records have FLD1 = A and FLD2 = B.  I am unsure
>> about
>> > the remaining fields
>> >
>> > Right now, if I do a query specifying FLD1 = A and FLD2 = B then I get
>> both
>> > records.  I only want 1.
>> >
>> > Research says I should use faceting.  But this:
>> >
>> > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2 &
>> > facet=true & facet_field=FLD1 & facet_field=FLD2
>> >
>> > gives me 2 records.
>> >
>> > In fact, it gives me the same results as:
>> >
>> > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2
>> >
>> > I'm wrong somewhere, but I'm unsure where.
>> >
>> > Is faceting the right way to go or should I be using grouping?
>> >
>> > Curiously, when I use grouping like this:
>> >
>> > q=FLD1:A and FLD2:B &rows=500 &defType=edismax &indent=true &fl=FLD1,
>> FLD2
>> > &group=true &group.field=FLD1 &group.field=FLD2
>> >
>> > I get 2 records as well.
>> >
>> > Has anyone dealt with mimicing "select distinct" in Solr?
>> >
>> > Any advice would be very appreciated.
>> >
>> > Mark
>> >
>>
>

Re: solr equivalent of "select distinct"

Reply via email to