Re: Confused by queries
Hello.

That is indeed an excellent article, thanks for pointing me at it. With a title like that, it is no wonder that I was unable to google it on my own.

It is probably the exception in this rule that has been confusing me:

    If a BooleanQuery contains no MUST BooleanClauses, then a document is only considered a match against the BooleanQuery if one or more of the SHOULD BooleanClauses is a match.

So

    +group:id +keyword:text

and

    (+group:id) +keyword:text

mean completely different things.

I have mostly been using the reference at http://lucene.apache.org/core/3_6_0/queryparsersyntax.html and it does not mention this distinction. Quite the contrary, actually, as it says that grouping can be used to eliminate confusion, thereby suggesting that the usual rules of Boolean algebra apply.

Thanks again,
Anders.

On 23.01.2013 02:20, Erick Erickson wrote:
> Solr/Lucene does not implement strict boolean logic. Here's an excellent blog discussing this: http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/
>
> Best
> Erick
>
> On Tue, Jan 22, 2013 at 7:25 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
>> Well, depends on what you indexed.
>>
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>>
>> On Jan 22, 2013 5:48 PM, Anders Melchiorsen m...@spoon.kalibalik.dk wrote:
>>> Thanks, though I am still confused. How about this one:
>>>
>>>     manu:apple = 1 hit
>>>     +name:video = 2 hits
>>>     manu:apple +name:video = 2 hits
>>>
>>> Solr ignores the manu:apple part completely?
>>>
>>> Cheers,
>>> Anders.
Confused by queries
Hello!

With the example server of Solr 4.0.0 (with *.xml indexed), I get these results:

    *:* = 32 hits
    name:ipod = 3 hits
    -name:ipod = 29 hits

That is all fine, but for these next queries, I would expect to get 32 hits (i.e. everything), or at least the same number of hits for both queries:

    name:ipod OR -name:ipod = 0 hits
    (name:ipod) OR (-name:ipod) = 3 hits

As my expectations are not met, I must be missing something?

Thanks,
Anders.
Re: Confused by queries
Thanks, though I am still confused. How about this one:

    manu:apple = 1 hit
    +name:video = 2 hits
    manu:apple +name:video = 2 hits

Solr ignores the manu:apple part completely?

Cheers,
Anders.

On 22/01/13 23.16, Jack Krupansky wrote:
> The first query:
>
>     name:ipod OR -name:ipod = 0 hits
>
> The OR and - are actually at the same level of the BooleanQuery, so the - overrides the OR, making it equivalent to:
>
>     name:ipod -name:ipod = 0 hits
>
> For the second query:
>
>     (name:ipod) OR (-name:ipod) = 3 hits
>
> Pure negative queries are supported only at the top level, so the (-name:ipod) matches nothing, and the query is equivalent to:
>
>     (name:ipod) = 3 hits
>
> You can simply insert a *:* to ensure that it is not a pure negative query inside the parentheses:
>
>     (name:ipod) OR (*:* -name:ipod)
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Anders Melchiorsen
> Sent: Tuesday, January 22, 2013 4:59 PM
> To: solr-user@lucene.apache.org
> Subject: Confused by queries
>
> Hello!
>
> With the example server of Solr 4.0.0 (with *.xml indexed), I get these results:
>
>     *:* = 32 hits
>     name:ipod = 3 hits
>     -name:ipod = 29 hits
>
> That is all fine, but for these next queries, I would expect to get 32 hits (i.e. everything), or at least the same number of hits for both queries:
>
>     name:ipod OR -name:ipod = 0 hits
>     (name:ipod) OR (-name:ipod) = 3 hits
>
> As my expectations are not met, I must be missing something?
>
> Thanks,
> Anders.
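The hit counts in this thread follow from how MUST, SHOULD and MUST_NOT clauses combine. The sketch below is a toy model in Python, not Lucene code: the classes Term, MatchAll and Bool, the helper hits, and the four-document index are all made up for illustration, and the mapping from query strings to clauses follows Jack Krupansky's explanation rather than the actual parser.

```python
from dataclasses import dataclass

MUST, SHOULD, MUST_NOT = "MUST", "SHOULD", "MUST_NOT"


@dataclass
class Term:
    """Matches a document whose field contains the given token."""
    field: str
    value: str

    def matches(self, doc):
        return self.value in doc.get(self.field, set())


class MatchAll:
    """Stand-in for the *:* query."""

    def matches(self, doc):
        return True


@dataclass
class Bool:
    """Toy BooleanQuery: a list of (occur, query) clauses."""
    clauses: list

    def matches(self, doc):
        must = [q for occur, q in self.clauses if occur == MUST]
        should = [q for occur, q in self.clauses if occur == SHOULD]
        must_not = [q for occur, q in self.clauses if occur == MUST_NOT]
        if any(q.matches(doc) for q in must_not):
            return False
        if must:
            # With a MUST clause present, SHOULD clauses no longer decide matching.
            return all(q.matches(doc) for q in must)
        # No MUST clauses: at least one SHOULD clause has to match.
        # A purely negative query has none, so it matches nothing.
        return any(q.matches(doc) for q in should)


def hits(query, docs):
    return sum(1 for doc in docs if query.matches(doc))


# Hypothetical four-document index (not the Solr example data).
DOCS = [
    {"name": {"ipod", "video"}, "manu": {"apple"}},
    {"name": {"ipod", "dock"}, "manu": {"belkin"}},
    {"name": {"video", "card"}, "manu": {"ati"}},
    {"name": {"hard", "drive"}, "manu": {"maxtor"}},
]

ipod = Term("name", "ipod")

# name:ipod OR -name:ipod  ->  one SHOULD and one MUST_NOT at the same level
flat = Bool([(SHOULD, ipod), (MUST_NOT, ipod)])

# (name:ipod) OR (-name:ipod)  ->  the second group is purely negative
grouped = Bool([(SHOULD, Bool([(SHOULD, ipod)])),
                (SHOULD, Bool([(MUST_NOT, ipod)]))])

# (name:ipod) OR (*:* -name:ipod)  ->  *:* gives the group something to match
fixed = Bool([(SHOULD, ipod),
              (SHOULD, Bool([(SHOULD, MatchAll()), (MUST_NOT, ipod)]))])

# manu:apple +name:video  ->  the SHOULD clause is optional next to a MUST
mixed = Bool([(SHOULD, Term("manu", "apple")), (MUST, Term("name", "video"))])

for label, query in [("flat", flat), ("grouped", grouped),
                     ("fixed", fixed), ("mixed", mixed)]:
    print(label, hits(query, DOCS))
```

In this model the flat query matches nothing, the grouped query matches exactly the name:ipod documents, the *:* variant matches everything, and manu:apple +name:video matches the same documents as +name:video alone. The model leaves out the top-level special case for pure negative queries that Jack mentions, so a bare -name:ipod also returns 0 here.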
Which schema changes are incompatible?
Hello.

I read the FAQ entry about rebuilding the index,

http://wiki.apache.org/solr/FAQ#How_can_I_rebuild_my_index_from_scratch_if_I_change_my_schema.3F

but it is not clear about when this is needed. So I wonder, do I need to do it after adding a field, removing a field, changing a field type, or changing the indexed/stored/multiValued properties? What happens if I don't do it, will Solr die?

Also, the FAQ entry notes that one can delete all documents, change the schema.xml file, and then reload the core. Would it be possible to instead change schema.xml, reload the core, and then rebuild the index -- in effect slowly replacing the old documents, but never ending up with a completely empty index? I realize that some weird search results could happen during such a rebuild, but that may be preferable to having no search results at all.

(I also realize that we need more Solr servers, to be able to do these updates without taking down the search service. But currently we have just one.)

Thanks,
Anders.
Overlapping zipcodes
We are in a situation where we are trying to match up documents based on a number of zipcodes. In our case, zipcodes are just integers, so that hopefully simplifies things.

So, we might have a document listing a number of zipcodes:

    1200-1450,2000,5000-5999

and we want to do a search of

    1100-1300,8000

and have it match the document.

How can this be done using Solr?

Thanks,
Anders.
Re: Overlapping zipcodes
Yeah, that takes care of the query side, but how can we index a list like that? It seems wrong to create a multivalued zipcode field and populate it with each individual number in all the ranges.

Regards,
Anders.

On Mon, 21 Sep 2009 19:05:01 +0530, Avlesh Singh avl...@gmail.com wrote:
> Range queries?
>
> Cheers
> Avlesh
>
> On Mon, Sep 21, 2009 at 2:57 PM, Anders Melchiorsen m...@spoon.kalibalik.dk wrote:
>> We are in a situation where we are trying to match up documents based on a number of zipcodes. In our case, zipcodes are just integers, so that hopefully simplifies things.
>>
>> So, we might have a document listing a number of zipcodes:
>>
>>     1200-1450,2000,5000-5999
>>
>> and we want to do a search of
>>
>>     1100-1300,8000
>>
>> and have it match the document.
>>
>> How can this be done using Solr?
>>
>> Thanks,
>> Anders.
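The matching rule asked for here is interval overlap: a document matches when any of its ranges overlaps any of the searched ranges. The sketch below shows that rule in plain Python. It is not an answer given in the thread. The last helper assumes, purely for illustration, that each range is indexed as its own record with two hypothetical integer fields, zip_start and zip_end; with several ranges packed into multivalued fields on one document, a start from one range and an end from another could combine into a false match.

```python
def parse_ranges(spec):
    """Turn '1200-1450,2000' into [(1200, 1450), (2000, 2000)]."""
    ranges = []
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges


def overlaps(a, b):
    """Two closed intervals overlap when each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def any_overlap(doc_spec, query_spec):
    """True if any document range overlaps any searched range."""
    doc_ranges = parse_ranges(doc_spec)
    return any(overlaps(d, q) for q in parse_ranges(query_spec) for d in doc_ranges)


def range_clause(low, high, start_field="zip_start", end_field="zip_end"):
    """Query clause for one searched range; the field names are hypothetical."""
    return f"({start_field}:[* TO {high}] AND {end_field}:[{low} TO *])"


print(any_overlap("1200-1450,2000,5000-5999", "1100-1300,8000"))
print(" OR ".join(range_clause(lo, hi) for lo, hi in parse_ranges("1100-1300,8000")))
```

Storing only the two endpoints keeps the index small however wide a range is, which avoids populating a field with every individual number.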
Re: HTML decoder is splitting tokens
Hello.

Thanks for the hints. Still some trouble, though.

I added just the HTMLStripCharFilterFactory because, according to the documentation, it should also replace HTML entities. It did, but still left a space after the entity, so I got two tokens from G&uuml;nther. That seems like a bug?

Adding MappingCharFilterFactory in front of the HTML stripper (so that the latter will not see the entity) does work as expected. That is, until I try strings like "use &lt;p&gt; to mark a paragraph", where the HTML stripper will then remove parts of the actual text. So this approach will not work.

Finally, I was happy that I could now use an arbitrary tokenizer with HTML input. The PatternTokenizer, however, seems to be using character offsets corresponding to the output of the char filters, and so the highlighting markers end up in the wrong place. Is that a bug, or a configuration issue?

Cheers,
Anders.

Koji Sekiguchi wrote:
> Hi Anders,
>
> Sorry, I don't know whether this is a bug or a feature, but I'd like to show an alternative way if you'd like.
>
> In Solr trunk, HTMLStripWhitespaceTokenizerFactory is marked as deprecated. Instead, using HTMLStripCharFilterFactory with an arbitrary TokenizerFactory is encouraged. And I'd recommend using MappingCharFilterFactory to convert character references to real characters. That is, you have:
>
>     <fieldType name="textHtml" class="solr.TextField">
>       <analyzer>
>         <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
>         <charFilter class="solr.HTMLStripCharFilterFactory"/>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       </analyzer>
>     </fieldType>
>
> where the contents of mapping.txt are:
>
>     "&uuml;" => "ü"
>     "&auml;" => "ä"
>     "&iuml;" => "ï"
>     "&euml;" => "ë"
>     "&ouml;" => "ö"
>     :
>     :
>
> Then run analysis.jsp and see the result.
>
> Thank you,
> Koji
>
> Anders Melchiorsen wrote:
>> Hi.
>>
>> When indexing the string G&uuml;nther with HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens, Gü and nther.
>>
>> Is this a bug, or am I doing something wrong?
>>
>> (Using a Solr nightly from 2009-05-29)
>>
>> Anders.
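The ordering problem described in this message can be reproduced outside Solr. The sketch below uses Python's standard library as a stand-in for the two char filters: html.unescape plays the role of the entity mapping and a crude regular expression plays the role of the HTML stripper. Neither is the Solr implementation, and the function names are made up for the illustration.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")  # crude tag matcher, enough for the illustration


def decode_then_strip(text):
    """Entities first, markup second: the order tried in the message above."""
    return TAG.sub("", html.unescape(text))


def strip_then_decode(text):
    """Markup first, entities second."""
    return html.unescape(TAG.sub("", text))


sample = "use &lt;p&gt; to mark a paragraph"
print(repr(decode_then_strip(sample)))
print(repr(strip_then_decode(sample)))
print(strip_then_decode("<b>G&uuml;nther</b>").split())
```

Decoding first turns the escaped &lt;p&gt; into a real tag, which the stripper then removes along with part of the text. Stripping first keeps it, and the entity inside a word still decodes to a single token.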
HTML decoder is splitting tokens
Hi.

When indexing the string G&uuml;nther with HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens, Gü and nther.

Is this a bug, or am I doing something wrong?

(Using a Solr nightly from 2009-05-29)

Anders.
Re: Highlight arbitrary text
On Tue, 21 Jul 2009 14:25:52 +0200, Anders Melchiorsen wrote:
> On Fri, 17 Jul 2009 16:04:24 +0200, Anders Melchiorsen wrote:
>> However, in the normal highlighter, I am using usePhraseHighlighter and highlightMultiTerm and it seems that there is no way to turn these on in FieldAnalysisRequestHandler?
>
> In case these options are not available with the FieldAnalysisRequestHandler, would it be simple to implement them with a plugin? The highlightMultiTerm is absolutely needed, as we use a lot of prefix searches.

I tried following the FieldAnalysisRequestHandler code, but I could not find a place to plug in wildcard searching. Is it supposed to be simple (like enabling a single option somewhere), or will it need a bunch of new code?

In related news, the highlighter is not working quite correctly, because I use the PatternTokenizer for the indexed fields, and HTMLStripWhitespaceTokenizerFactory obviously gives slightly different results on the presentation field.

So, I tried creating my own plugin:

    import java.io.Reader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.solr.analysis.PatternTokenizerFactory;

    public class HTMLStripPatternTokenizerFactory extends PatternTokenizerFactory {
        // Strip HTML from the input before the pattern tokenizer sees it.
        public TokenStream create(Reader input) {
            return super.create(new org.apache.solr.analysis.HTMLStripReader(input));
        }
    }

It seems to work, but is that the proper way to mix the HTML stripper and the Pattern tokenizer? Obviously, I would prefer not having to maintain a plugin, even if it is a tiny one.

- Anders
Re: Highlight arbitrary text
On Fri, 17 Jul 2009 16:04:24 +0200, Anders Melchiorsen m...@cup.kalibalik.dk wrote:
> On Thu, 16 Jul 2009 10:56:38 -0400, Erik Hatcher e...@ehatchersolutions.com wrote:
>> One trick worth noting is the FieldAnalysisRequestHandler can provide offsets from external text, which could be used for client-side highlighting (see the showmatch parameter too).
>
> Thanks. I tried doing this, and it almost works. However, in the normal highlighter, I am using usePhraseHighlighter and highlightMultiTerm and it seems that there is no way to turn these on in FieldAnalysisRequestHandler?

In case these options are not available with the FieldAnalysisRequestHandler, would it be simple to implement them with a plugin? The highlightMultiTerm is absolutely needed, as we use a lot of prefix searches.

Thanks,
Anders.
Re: Highlight arbitrary text
On Thu, 16 Jul 2009 10:56:38 -0400, Erik Hatcher e...@ehatchersolutions.com wrote:
> One trick worth noting is the FieldAnalysisRequestHandler can provide offsets from external text, which could be used for client-side highlighting (see the showmatch parameter too).

Thanks. I tried doing this, and it almost works. However, in the normal highlighter, I am using usePhraseHighlighter and highlightMultiTerm and it seems that there is no way to turn these on in FieldAnalysisRequestHandler?

Anders.
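Client-side highlighting from offsets, as suggested above, comes down to wrapping (start, end) character spans of the presentation text in markers. The sketch below shows that step in Python. It does not call the FieldAnalysisRequestHandler: prefix_offsets is a hypothetical local stand-in for offsets that would come from the server, and it only imitates a prefix search on word boundaries. It assumes the offsets refer to exactly the text being marked up.

```python
import re


def prefix_offsets(text, prefix):
    """(start, end) offsets of every word that begins with `prefix`."""
    prefix = prefix.lower()
    return [(m.start(), m.end())
            for m in re.finditer(r"\w+", text)
            if m.group().lower().startswith(prefix)]


def highlight(text, offsets, pre="<em>", post="</em>"):
    """Wrap each (start, end) span of `text` in markers, skipping overlaps."""
    out, pos = [], 0
    for start, end in sorted(offsets):
        if start < pos:
            continue  # overlaps a span that was already emitted
        out.append(text[pos:start])
        out.append(pre + text[start:end] + post)
        pos = end
    out.append(text[pos:])
    return "".join(out)


text = "Solr highlights arbitrary text"
print(highlight(text, prefix_offsets(text, "high")))
```

If the offsets were computed on text that had already passed through a char filter or HTML stripper, they would have to be mapped back to the original text before this step, or the markers land in the wrong place.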
Highlight arbitrary text
Is it possible to have Solr highlight an arbitrary text that is posted at request time?

Currently, we are storing an unindexed HTML field in Solr, just to have it highlighted. We would prefer to generate the HTML from the database at presentation time, in order to keep the Solr index smaller and faster.

Thanks,
Anders.