Hi Yonik

So I tested the join using the sample data below and the latest trunk. I still 
got the same behaviour.

However, in this case the problem had nothing to do with the patch or the Solr
version. It was the tokeniser splitting G1 into G and 1.
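For anyone hitting the same thing, here is a rough Python model of what appears to have happened. It rests on two assumptions. First, both pid_rcs and code use an analyzer that splits at letter/digit boundaries. This is hypothetical, since the actual field types aren't shown in this thread. Second, the join works as "collect the indexed terms of the from field in the docs matched by the inner query, then return every doc whose to field contains any of those terms". The function names are made up for illustration, and this is not Solr's implementation. Under these assumptions, G1 becomes G and 1 and G41 becomes G and 41. The shared term G is what pulls in the third document.

```python
import re


def split_alnum(value):
    """HYPOTHETICAL tokeniser: split at letter/digit boundaries, e.g. 'G41' -> ['G', '41']."""
    return re.findall(r"[A-Za-z]+|[0-9]+", value)


def keyword(value):
    """Untokenised (string-style) analysis: the whole value is a single term."""
    return [value]


def index_terms(values, analyzer):
    """Set of indexed terms for a (possibly multi-valued) field."""
    terms = set()
    for v in values:
        terms.update(analyzer(v))
    return terms


def join(docs, inner_ids, from_field, to_field, analyzer):
    """Rough model of {!join from=... to=...}: gather the from_field terms of the
    docs matched by the inner query, then return the ids of all docs whose
    to_field shares at least one of those terms."""
    from_terms = set()
    for doc_id in inner_ids:
        from_terms |= index_terms(docs[doc_id].get(from_field, []), analyzer)
    return sorted(
        doc_id
        for doc_id, doc in docs.items()
        if index_terms(doc.get(to_field, []), analyzer) & from_terms
    )


# Field values copied from the responses quoted in this thread.
docs = {
    "rcs100": {"pid_rcs": ["G1"]},
    "nCIF3wGpXk+1029782": {"code": ["G1", "G7U", "GK", "ME7", "ME8", "MN", "MR"]},
    "nCIF7YcLP+1029782": {"code": ["G1", "G7U", "GK", "ME7", "ME8", "MN", "MR"]},
    "nCN1763203+1029782": {"code": ["A2", "A5", "A9", "AN", "B125", "B126",
                                    "B130", "BL63", "G41", "GK", "MZ"]},
}

# Inner query pi:rcs100 matches only the "rcs100" doc.
print(join(docs, ["rcs100"], "pid_rcs", "code", split_alnum))
# ['nCIF3wGpXk+1029782', 'nCIF7YcLP+1029782', 'nCN1763203+1029782']
print(join(docs, ["rcs100"], "pid_rcs", "code", keyword))
# ['nCIF3wGpXk+1029782', 'nCIF7YcLP+1029782']
```

In this model, indexing both fields untokenised (as a plain string field would) leaves only the two G1 documents.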

So thank you for a nice patch and your suggestions.

I do have a couple of questions for you: at what level does the join happen,
and what performance penalty would you expect? We might use this extensively
if the penalty isn't great.

Thanks again,

Matt

-----Original Message-----
From: Fowler, Matthew (Markets Eikon) 
Sent: 03 August 2011 15:04
To: yo...@lucidimagination.com
Cc: solr-user@lucene.apache.org
Subject: RE: Joining on multi valued fields

No, I haven't. I'll get the latest from trunk and report back.

Cheers again,

Matt

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 03 August 2011 14:51
To: Fowler, Matthew (Markets Eikon)
Cc: solr-user@lucene.apache.org
Subject: Re: Joining on multi valued fields

Hmmm, if these are real responses from a solr server "at rest" (i.e.
documents not being changed between queries) then what you show
definitely looks like a bug.
That's interesting, since TestJoin implements a random test that
should cover cases like this pretty well.

I assume you are using a version of trunk (4.0-dev) and not just the
actual patch attached to the JIRA issue (which IIRC had at least one bug...
SOLR-2521).
Have you tried a more recent version of trunk?

-Yonik
http://www.lucidimagination.com



On Wed, Aug 3, 2011 at 7:00 AM,  <matthew.fow...@thomsonreuters.com> wrote:
> Hi Yonik
>
> Sorry for my late reply. I have been trying to get to the bottom of this
> but I'm getting inconsistent behaviour. Here's an example:
>
> Query = "pi:rcs100" - here I'm going to use "pid_rcs" as the join value
>
> <result name="response" numFound="1" start="0">
>  <doc>
>  <str name="pi">rcs100</str>
>  <str name="ct">rcs</str>
>  <str name="pid_rcs">G1</str>
>  <str name="name_rcs">Emerging Market Countries</str>
>  <str name="definition_rcs">All business events relating to companies and other issuers of securities.</str>
>  </doc>
>  </result>
>  </response>
>
> Query = "code:G1" - see how many docs have "G1" in their code field.
> Notice that "code" is multi-valued.
>
> <result name="response" numFound="2" start="0">
>  <doc>
>  <str name="ct">cat</str>
>  <date name="maindocdate">2011-04-22T05:48:57Z</date>
>  <str name="pi">nCIF3wGpXk+1029782</str>
>  <arr name="code">
>  <str>G1</str>
>  <str>G7U</str>
>  <str>GK</str>
>  <str>ME7</str>
>  <str>ME8</str>
>  <str>MN</str>
>  <str>MR</str>
>  </arr>
>  </doc>
>  <doc>
>  <str name="ct">cat</str>
>  <date name="maindocdate">2011-04-22T05:48:57Z</date>
>  <str name="pi">nCIF7YcLP+1029782</str>
>  <arr name="code">
>  <str>G1</str>
>  <str>G7U</str>
>  <str>GK</str>
>  <str>ME7</str>
>  <str>ME8</str>
>  <str>MN</str>
>  <str>MR</str>
>  </arr>
>  </doc>
>  </result>
>  </response>
>
> Now for the join:
> http://10.15.39.137:8983/solr/file/select?q={!join from=pid_rcs to=code}pi:rcs100
>
> <result name="response" numFound="3" start="0">
>  <doc>
>  <str name="ct">cat</str>
>  <date name="maindocdate">2011-04-22T05:48:57Z</date>
>  <str name="pi">nCIF3wGpXk+1029782</str>
>  <arr name="code">
>  <str>G1</str>
>  <str>G7U</str>
>  <str>GK</str>
>  <str>ME7</str>
>  <str>ME8</str>
>  <str>MN</str>
>  <str>MR</str>
>  </arr>
>  </doc>
>  <doc>
>  <str name="ct">cat</str>
>  <date name="maindocdate">2011-04-22T05:48:57Z</date>
>  <str name="pi">nCIF7YcLP+1029782</str>
>  <arr name="code">
>  <str>G1</str>
>  <str>G7U</str>
>  <str>GK</str>
>  <str>ME7</str>
>  <str>ME8</str>
>  <str>MN</str>
>  <str>MR</str>
>  </arr>
>  </doc>
>  <doc>
>  <str name="ct">cat</str>
>  <date name="maindocdate">2011-04-22T05:48:58Z</date>
>  <str name="pi">nCN1763203+1029782</str>
>  <arr name="code">
>  <str>A2</str>
>  <str>A5</str>
>  <str>A9</str>
>  <str>AN</str>
>  <str>B125</str>
>  <str>B126</str>
>  <str>B130</str>
>  <str>BL63</str>
>  <str>G41</str>
>  <str>GK</str>
>  <str>MZ</str>
>  </arr>
>  </doc>
>  </result>
>  </response>
>
> So, as you can see, I get back 3 results when only 2 match the criteria,
> i.e. docs where G1 is present in the multi-valued "code" field. Why should
> the last document be included in the result of the join?
>
> Thank you,
>
> Matt
>
>
> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: 01 August 2011 18:28
> To: solr-user@lucene.apache.org
> Subject: Re: Joining on multi valued fields
>
> On Mon, Aug 1, 2011 at 12:58 PM,  <matthew.fow...@thomsonreuters.com> wrote:
>> I have been using the JOIN patch
>> https://issues.apache.org/jira/browse/SOLR-2272 with great success.
>>
>> However, I have hit a case where it doesn't seem to work: joining to a
>> multi-valued field.
>
> That should work (and the unit tests do test with multi-valued fields).
> Can you come up with a simple example where you are not getting the
> expected results?
>
> -Yonik
> http://www.lucidimagination.com
>
> This email was sent to you by Thomson Reuters, the global news and 
> information company. Any views expressed in this message are those of the 
> individual sender, except where the sender specifically states them to be the 
> views of Thomson Reuters.
>
