Thanks Doug, removing "query" definitely helped. I just switched to Ivan's new patch (which definitely helped a lot - no SEVERE errors now - thanks Ivan!) but I'm still struggling with faceting myself.

Basically, I can tell that faceting is happening after the collapse, because the facet counts are definitely lower than they would be otherwise. For example, with one search, I'd have 196 results with no collapsing, I get 120 results with collapsing - but the facet count is 119??? In other searches the difference is more drastic: in one, I get 61 results without collapsing, 61 with collapsing, but the facet count is 39.

Looking at it for a while now, I think I can guess what the problem might be...

The incorrect counts seem to only happen when the term in question does not occur evenly across all duplicates of a document. That is, multiple document records may exist for the same image (it's an image search engine), but each document will have different terms in different fields depending on the audience it's targeting. So, when you collapse, the facet counts are lower than they should be: they only see the terms on whichever document was kept for each group, but when you actually execute a search with that facet's term included in the query, *all* the documents after collapsing will be ones that have that term.

Here's an illustration:

Collapse field is "link_id", facet field is "keyword":


Doc 1:
id: 123456,
link_id: 2,
keyword: Black, Printed, Dress

Doc 2:
id: 123457,
link_id: 2,
keyword: Black, Shoes, Patent

Doc 3:
id: 123458,
link_id: 2,
keyword: Red, Hat, Felt

Doc 4:
id: 123459,
link_id: 1,
keyword: Felt, Hat, Black

So, when you collapse, only two of these documents are in the result set (123456, 123459), and only the keywords Black, Printed, Dress, Felt, and Hat are counted. The facet count for Black is 2, the facet count for Felt is 1. If you choose Black and add it to your query, you get 2 results (great). However, if you add *Felt* to your query, you also get 2 results even though the facet count said 1 (because a different document for link_id 2 is chosen in that query than is in the more general query from which the facets are produced).
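To make the mismatch concrete, here is a small Python simulation of the four documents above. It is only a sketch of the behaviour described, not the patch's code: the helper names (`collapse`, `facet_counts`, `search`) are made up for illustration, and "keep the first document seen per link_id" is an assumption standing in for however the patch actually picks the surviving document.

```python
from collections import Counter

# The four example documents from the illustration.
DOCS = [
    {"id": 123456, "link_id": 2, "keyword": ["Black", "Printed", "Dress"]},
    {"id": 123457, "link_id": 2, "keyword": ["Black", "Shoes", "Patent"]},
    {"id": 123458, "link_id": 2, "keyword": ["Red", "Hat", "Felt"]},
    {"id": 123459, "link_id": 1, "keyword": ["Felt", "Hat", "Black"]},
]

def collapse(docs, field="link_id"):
    """Keep the first document seen for each value of the collapse field
    (assumed selection rule, for illustration only)."""
    kept = {}
    for doc in docs:
        kept.setdefault(doc[field], doc)
    return list(kept.values())

def facet_counts(docs, field="keyword"):
    """Count how many of the given documents carry each term."""
    return Counter(term for doc in docs for term in set(doc[field]))

def search(docs, term):
    """Filter on a keyword first, then collapse, like a drill-down query."""
    return collapse([d for d in docs if term in d["keyword"]])

# Facets computed after collapsing the unfiltered result set.
facets = facet_counts(collapse(DOCS))

# Drill-down queries for two of the facet terms.
black_hits = search(DOCS, "Black")
felt_hits = search(DOCS, "Felt")

print("facet Black:", facets["Black"], "results:", len(black_hits))
print("facet Felt:", facets["Felt"], "results:", len(felt_hits))
```

With these assumptions the facet for Felt reports 1 while the Felt query returns 2 collapsed results, because document 123458 only becomes the representative for link_id 2 once the query is narrowed.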

I think what needs to happen here is that all the terms for all the documents that are collapsed together need to be included (just once) with the document that gets counted for faceting. In this example, when the document for link_id 2 is counted, it would need to appear to the facet counter to have keywords Black, Printed, Dress, Shoes, Patent, Red, Hat, and Felt, as opposed to just Black, Printed, and Dress.
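Roughly what I have in mind, sketched in Python rather than Java since that's what I can actually write. This is not how the patch or Solr's facet code works internally; `merged_facet_counts` is a hypothetical helper that just shows the counting rule: pool the terms of every document sharing a link_id, then count each term once per group.

```python
from collections import Counter, defaultdict

# Same four example documents as in the illustration.
DOCS = [
    {"id": 123456, "link_id": 2, "keyword": ["Black", "Printed", "Dress"]},
    {"id": 123457, "link_id": 2, "keyword": ["Black", "Shoes", "Patent"]},
    {"id": 123458, "link_id": 2, "keyword": ["Red", "Hat", "Felt"]},
    {"id": 123459, "link_id": 1, "keyword": ["Felt", "Hat", "Black"]},
]

def merged_facet_counts(docs, collapse_field="link_id", facet_field="keyword"):
    """Count each facet term once per collapse group, using the union of
    terms from every document in that group."""
    terms_by_group = defaultdict(set)
    for doc in docs:
        terms_by_group[doc[collapse_field]].update(doc[facet_field])
    return Counter(term for terms in terms_by_group.values() for term in terms)

counts = merged_facet_counts(DOCS)
for term, count in sorted(counts.items()):
    print(term, count)
```

Counted this way, Felt, Hat and Black each come out as 2, which matches the number of collapsed results you get when you add any of those terms to the query.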

Unfortunately, not knowing Java at all really, I have absolutely no idea how this change would be implemented... I mean I can tweak here or there, but I think this is above my pay grade. I've looked at the code affected by the patch and the code for faceting but I can't make heads or tails of it.

I think I'll go post this over on the JIRA...

Any ideas?

--
Steve



On Dec 10, 2008, at 8:52 AM, Doug Steigerwald wrote:

The first output is from the query component. You might just need to make the collapse component first and remove the query component completely.

We perform geographic searching with localsolr first (if we need to), and then try to collapse those results (if collapse=true). If we don't have any results yet, that's the only time we use the standard query component. I'm making sure we set the builder.setNeedDocSet=false and then I modified the query component to only execute when builder.isNeedDocSet=true.

In the field collapsing patch that I'm using, I've got code to remove a previous 'response' from the builder.rsp so we don't have duplicates.

Now, if I could get field collapsing to work properly with a docSet/docList from localsolr and also have faceting work, I'd be golden.

Doug

On Dec 9, 2008, at 9:37 PM, Stephen Weiss wrote:

Hi Tracy,

Well, I managed to get it working (I think) but the weird thing is, in the XML output it gives both recordsets (the filtered and unfiltered - filtered second). In the JSON (the one I actually use anyway, at least) I only get the filtered results (as expected).

In my core's solrconfig.xml, I added:

<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent" />

(I'm not sure if it's supposed to go anywhere in particular but for me it's right before StandardRequestHandler)

and then within StandardRequestHandler:

<requestHandler name="standard" class="solr.StandardRequestHandler">
  <!-- default values for query parameters -->
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <!--
     <int name="rows">10</int>
     <str name="fl">*</str>
     <str name="version">2.1</str>
      -->
   </lst>
   <arr name="components">
      <str>query</str>
      <str>facet</str>
      <str>mlt</str>
      <str>highlight</str>
      <str>debug</str>
      <str>collapse</str>
   </arr>
</requestHandler>


Which is basically all the default values plus collapse. Not sure if this was needed for prior versions, I don't see it in any patch files (I just got a vague idea from looking at a comment from someone else who said it wasn't working for them). It would kinda be nice if someone working on the code might throw us a bone and say explicitly what the right options to put in the config file are (if there are even supposed to be any - for all I know, this is just a bandaid over a larger problem). I know it's not done yet though... just a pointer for this patch might be handy, it's really a useful feature if it works (I was kinda shocked this wasn't part of the standard distribution since it's something I had to do so often with mysql, kinda lucky I guess that it only came up now).

Another issue I'm having now is the faceting doesn't seem to change - even if I set the collapse.facet option to "after"... I should really try "before" and see what happens.

Of course, I just realized the integrity of my collapse field is not so great so I have to go back and redo the data :-)

Best of luck.

--
Steve

On Dec 9, 2008, at 7:49 PM, Tracy Flynn (SOLR) wrote:

Steve,

I need this too. As my previous posting said, I adapted the 1.2 field collapsing back at the beginning of the year, so I'm somewhat familiar.

I'll try and get a look this weekend. It's the earliest I'm likely to get spare cycles. I'll post any results.

Tracy

On Dec 9, 2008, at 4:18 PM, Stephen Weiss wrote:

Hi,

I'm trying to use field collapsing with our SOLR but I just can't seem to get it to do anything.

I've downloaded a dist copy of solr 1.3 and applied Ivan de Prado's patch - reading through the source code, the patch definitely was applied successfully (all the changes are in the right places, I've checked every single one).

I've run ant clean, ant compile, and ant dist to produce the war file in the dist/ folder, and then put the war file in place and restarted jetty. According to the logs, jetty is definitely loading the right war file. If I expand the war file and grep through the files, it would appear the collapsing code is there.

However, when I add any sort of collapse parameters (I've tried any combination of collapse=true collapse.field=link_id collapse.threshold=1 collapse.type=normal collapse.info.doc=true), the result set is no different from normal query, and there is no collapse data returned in the XML.
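For anyone trying to reproduce this, the request can be assembled as below. The collapse parameters are the ones listed above; the host, port, path, the `q` value and `wt=json` are placeholders I'm assuming for the example, so adjust them to your own setup.

```python
from urllib.parse import urlencode

# Assumed location of the select handler; change to match your install.
BASE_URL = "http://localhost:8983/solr/select"

params = {
    "q": "*:*",                    # placeholder query, not from the thread
    "collapse": "true",
    "collapse.field": "link_id",
    "collapse.threshold": "1",
    "collapse.type": "normal",
    "collapse.info.doc": "true",
    "wt": "json",                  # assumed; drop it to get the XML output
}

# urlencode escapes the values so the URL is safe to paste or fetch.
url = f"{BASE_URL}?{urlencode(params)}"
print(url)
```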

I'm not a java developer, this is my first time using ant period, and I'm just following basic directions I found on google.


Here is the output of the compilation process:



I really need this patch to work for a project... Can someone please tell me what I'm missing to get this to work? I can't really find any documentation beyond adding the collapse options to the query string, so it's hard to tell - is there an option in solrconfig.xml or in the core configuration that needs to be set? Am I going about this entirely the wrong way?

Thanks for any advice, I appreciate it.

[ sorry if you get this twice, I accidentally sent first from the wrong e-mail address and I don't think it went through ]

--
Steve


