[jira] [Commented] (SOLR-3085) Fix the dismax/edismax stopwords mm issue

2012-04-11 Thread Jonathan Rochkind (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251828#comment-13251828
 ] 

Jonathan Rochkind commented on SOLR-3085:
-

Hoss says: "i have a nagging feeling that there are non-stopword cases that 
would be indistinguishable (to the parser) from this type of stopword case, and 
thus would also trigger this logic undesirably, but i can't articulate what 
they might be off the top of my head."

Indeed there are: pretty much anything where analysis differs between two 
fields in a way that can affect the number of tokens produced. Punctuation 
stripping can sometimes do this, and I ran into such a case in my real-world 
use. More info: 
http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/
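
To make the punctuation case concrete, here is a minimal Python sketch of the 
effect. It is a toy model, not Solr's actual parser code: the field names 
("text", "exact"), the whitespace-splitting analyzers, and the helper functions 
are all hypothetical, and mm is simplified to a rounded-down percentage of the 
clauses. The point is only that a token surviving analysis in one field but not 
another leaves a clause that only the stricter field can satisfy.

```python
def make_analyzer(dropped):
    """Toy analyzer: lowercase, split on whitespace, drop the listed tokens."""
    def analyze(text):
        return [tok for tok in text.lower().split() if tok not in dropped]
    return analyze


def dismax_clauses(query, analyzers):
    """Build one clause per query chunk.

    Each clause holds the (field, token) alternatives that survive that
    field's analysis. A chunk removed by every field yields no clause.
    """
    clauses = []
    for chunk in query.split():
        alternatives = [(field, tok)
                        for field, analyze in analyzers.items()
                        for tok in analyze(chunk)]
        if alternatives:
            clauses.append(alternatives)
    return clauses


def matches(doc_text, clauses, analyzers, mm_percent):
    """True if at least mm_percent of the clauses (rounded down) match."""
    indexed = {field: set(analyze(doc_text))
               for field, analyze in analyzers.items()}
    hits = sum(any(tok in indexed[field] for field, tok in clause)
               for clause in clauses)
    required = len(clauses) * mm_percent // 100
    return hits >= required


# Hypothetical setup: "text" strips a bare colon, "exact" keeps everything.
MIXED = {"text": make_analyzer({":"}), "exact": make_analyzer(set())}
# Same fields, but with the analysis harmonized across both.
UNIFORM = {"text": make_analyzer({":"}), "exact": make_analyzer({":"})}

if __name__ == "__main__":
    doc = "History and Theory"
    query = "history : theory"
    for label, analyzers in (("mixed", MIXED), ("uniform", UNIFORM)):
        clauses = dismax_clauses(query, analyzers)
        print(label, len(clauses), matches(doc, clauses, analyzers, 100))
```

In this model the mixed configuration produces three clauses, and the colon 
clause can only be satisfied by the "exact" field, so the document fails at 
mm=100% even though it contains both real query words. Harmonizing the 
analysis or lowering mm makes it match again, which mirrors the workarounds 
listed in the issue description.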
 

This is a difficult problem to fix in the general case. At one point I think 
there was a Solr listserv discussion where I tried to brainstorm general-case 
solutions, but they were all shot down by people who knew more about Solr than 
me. :) I can't find the archive of that discussion now, though. 

 Fix the dismax/edismax stopwords mm issue
 -

 Key: SOLR-3085
 URL: https://issues.apache.org/jira/browse/SOLR-3085
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Reporter: Jan Høydahl
  Labels: MinimumShouldMatch, dismax, stopwords
 Fix For: 4.0


 As discussed here http://search-lucene.com/m/Wr7iz1a95jx and here 
 http://search-lucene.com/m/Yne042qEyCq1 and here 
 http://search-lucene.com/m/RfAp82nSsla DisMax has an issue with stopwords if 
 not all fields used in QF have exactly the same stopword lists.
 The typical solution is to not use stopwords, to harmonize stopword lists 
 across all fields in your QF, or to relax the MM to a lower percentage. 
 Sometimes these are not acceptable workarounds, and we should find a better 
 solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-12-21 Thread Jonathan Rochkind (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174179#comment-13174179
 ] 

Jonathan Rochkind commented on SOLR-2242:
-

I would find this feature valuable even if it simply did not work at all 
on a distributed index. (Refusing to return a value rather than 
returning a known incorrect value would seem like the right way to go).  
Because my index is not distributed, and I would find this feature 
valuable, heh.

I don't know if Solr currently has any policies against committing 
features that can't work on distributed, but personally my 'vote' would 
be doing that here, with clear documentation that it doesn't work on 
distributed (and the hope that future enhancements may make it more 
feasible to do so, as Erick suggests may possibly maybe happen).



 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.0

 Attachments: NumFacetTermsFacetsTest.java, 
 SOLR-2242-notworkingtest.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.shard.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=<name of field> you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int>
     <int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int>
     <int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int>
     <int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int>
     <int name="649.99">1</int><int name="2199.0">1</int>
   </lst>
 </lst>
 {code} 
 Several people use this to get the group.field count (the # of groups).
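
 For readers on a non-distributed index who want the same number without the 
 patch, a rough client-side sketch in Python follows. It assumes a response 
 shaped like Solr's standard facet_fields XML and simply counts the returned 
 facet values, so it only works when facet.limit=-1 and facet.mincount=1 are 
 set as described above. The helper names are hypothetical, and 
 facet.numFacetTerms is the parameter proposed by this patch, not one 
 available in a stock Solr release.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def facet_query_string(field, num_facet_terms=None):
    """Build the query string for a facet request over one field.

    num_facet_terms is only sent when given; it corresponds to the
    facet.numFacetTerms parameter proposed in the SOLR-2242 patch.
    """
    params = [
        ("q", "*:*"),
        ("indent", "true"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "1"),
        ("facet.limit", "-1"),
    ]
    if num_facet_terms is not None:
        params.append(("facet.numFacetTerms", str(num_facet_terms)))
    return urlencode(params)


def count_distinct_values(facet_fields_xml, field):
    """Count the facet values listed for `field` in a facet_fields fragment.

    The numFacetTerms entry, if the patch added one, is not itself a
    facet value and is skipped.
    """
    root = ET.fromstring(facet_fields_xml)
    for lst in root.iter("lst"):
        if lst.get("name") == field:
            return sum(1 for el in lst.findall("int")
                       if el.get("name") != "numFacetTerms")
    raise KeyError(field)
```

 Counting on the client means transferring every facet value, which is the 
 cost the server-side count in the patch is meant to avoid.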
