Re: Confused by queries

2013-01-24 Thread Anders Melchiorsen

Hello.

That is indeed an excellent article, thanks for pointing me at it. With
a title like that, it is no wonder that I was unable to google it on my
own.

It is probably the exception in this rule that has been confusing me:

If a BooleanQuery contains no MUST BooleanClauses, then a
document is only considered a match against the BooleanQuery
if one or more of the SHOULD BooleanClauses is a match.

So +group:id +keyword:text and (+group:id) +keyword:text mean
completely different things.
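
To make the difference concrete, here is a minimal sketch of the rule quoted above. It is a toy model, not Lucene code: the class name, the helper methods, and the idea of a document as a set of "field:value" strings are all made up for illustration. It also assumes that, with the default OR operator, the parenthesized group is added to the outer query as a SHOULD clause.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.Predicate;

// Toy model of the BooleanQuery matching rule; hypothetical, not Lucene's implementation.
final class BooleanClauseModel {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final Predicate<Set<String>> query;

        Clause(Occur occur, Predicate<Set<String>> query) {
            this.occur = occur;
            this.query = query;
        }
    }

    // A document is modelled as the set of "field:value" terms it contains.
    static Predicate<Set<String>> term(String t) {
        return doc -> doc.contains(t);
    }

    static Predicate<Set<String>> bool(Clause... clauses) {
        return doc -> {
            boolean hasMust = false;
            boolean anyShouldHit = false;
            for (Clause c : clauses) {
                boolean hit = c.query.test(doc);
                if (c.occur == Occur.MUST) {
                    if (!hit) return false;
                    hasMust = true;
                } else if (c.occur == Occur.MUST_NOT) {
                    if (hit) return false;
                } else {
                    anyShouldHit |= hit;
                }
            }
            // With a MUST clause present, SHOULD clauses are optional;
            // without one, at least one SHOULD clause has to match.
            return hasMust || anyShouldHit;
        };
    }

    // +group:id +keyword:text
    static Predicate<Set<String>> flat() {
        return bool(new Clause(Occur.MUST, term("group:id")),
                    new Clause(Occur.MUST, term("keyword:text")));
    }

    // (+group:id) +keyword:text, assuming the group becomes a SHOULD clause
    static Predicate<Set<String>> grouped() {
        Predicate<Set<String>> group = bool(new Clause(Occur.MUST, term("group:id")));
        return bool(new Clause(Occur.SHOULD, group),
                    new Clause(Occur.MUST, term("keyword:text")));
    }

    public static void main(String[] args) {
        Set<String> doc = Collections.singleton("keyword:text");
        System.out.println(flat().test(doc));    // group:id is required here
        System.out.println(grouped().test(doc)); // group:id is only optional here
    }
}
```

In this model a document that only has keyword:text is rejected by the flat query but accepted by the grouped one. The same rule would explain why manu:apple +name:video returns the same hits as +name:video alone.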

I have mostly been using the reference at
http://lucene.apache.org/core/3_6_0/queryparsersyntax.html and it does
not mention this distinction. Quite the contrary, actually, as it says
that grouping can be used to eliminate confusion, thereby suggesting that
the usual rules of Boolean algebra apply.


Thanks again,
Anders.


On 23.01.2013 02:20, Erick Erickson wrote:

Solr/Lucene does not implement strict boolean logic. Here's an
excellent blog discussing this:

http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/

Best
Erick

On Tue, Jan 22, 2013 at 7:25 PM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:

Well, depends on what you indexed.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 22, 2013 5:48 PM, Anders Melchiorsen m...@spoon.kalibalik.dk wrote:


Thanks, though I am still confused.

How about this one:

manu:apple = 1 hit
+name:video = 2 hits

manu:apple +name:video = 2 hits

Solr ignores the manu:apple part completely?


Cheers,
Anders.


On 22/01/13 23.16, Jack Krupansky wrote:


The first query:

   name:ipod OR -name:ipod = 0 hits

The OR and - are actually at the same level of the BooleanQuery, so
the - overrides the OR so it's equivalent to:

   name:ipod -name:ipod = 0 hits

For the second query:

   (name:ipod) OR (-name:ipod) = 3 hits

Pure negative queries are supported only at the top level, so the
(-name:ipod) matches nothing, so the query is equivalent to:

   (name:ipod) = 3 hits

You can simply insert a *:* to assure that it is not a pure negative
query inside the parentheses:

   (name:ipod) OR (*:* -name:ipod)

-- Jack Krupansky

-Original Message- From: Anders Melchiorsen
Sent: Tuesday, January 22, 2013 4:59 PM
To: solr-user@lucene.apache.org
Subject: Confused by queries

Hello!

With the example server of Solr 4.0.0 (with *.xml indexed), I get these
results:

*:* = 32 hits
name:ipod = 3 hits
-name:ipod = 29 hits

That is all fine, but for these next queries, I would expect to get 32
hits (i.e. everything), or at least the same number of hits for both
queries:

name:ipod OR -name:ipod = 0 hits
(name:ipod) OR (-name:ipod) = 3 hits

As my expectations are not met, I must be missing something?


Thanks,
Anders.








Confused by queries

2013-01-22 Thread Anders Melchiorsen

Hello!

With the example server of Solr 4.0.0 (with *.xml indexed), I get these 
results:


*:* = 32 hits
name:ipod = 3 hits
-name:ipod = 29 hits

That is all fine, but for these next queries, I would expect to get 32 
hits (i.e. everything), or at least the same number of hits for both 
queries:


name:ipod OR -name:ipod = 0 hits
(name:ipod) OR (-name:ipod) = 3 hits

As my expectations are not met, I must be missing something?


Thanks,
Anders.



Re: Confused by queries

2013-01-22 Thread Anders Melchiorsen

Thanks, though I am still confused.

How about this one:

manu:apple = 1 hit
+name:video = 2 hits

manu:apple +name:video = 2 hits

Solr ignores the manu:apple part completely?


Cheers,
Anders.


On 22/01/13 23.16, Jack Krupansky wrote:

The first query:

   name:ipod OR -name:ipod = 0 hits

The OR and - are actually at the same level of the BooleanQuery, 
so the - overrides the OR so it's equivalent to:


   name:ipod -name:ipod = 0 hits

For the second query:

   (name:ipod) OR (-name:ipod) = 3 hits

Pure negative queries are supported only at the top level, so the 
(-name:ipod) matches nothing, so the query is equivalent to:


   (name:ipod) = 3 hits

You can simply insert a *:* to assure that it is not a pure negative 
query inside the parentheses:


   (name:ipod) OR (*:* -name:ipod)

-- Jack Krupansky
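
The explanation above can be sketched with plain set arithmetic. This is a hypothetical model, not Solr code: documents are the integers 0..31, three of them are assumed to match name:ipod, and a nested boolean query is taken to be the union of its positive clauses minus its negative ones. The special handling of a pure negative query at the top level is not modelled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy set model of nested boolean queries; class and method names are made up.
final class NegativeClauseModel {
    // Union of the positive (SHOULD) sets minus every negative (MUST_NOT) set.
    // With no positive clause there is nothing to subtract from.
    static Set<Integer> bool(List<Set<Integer>> should, List<Set<Integer>> mustNot) {
        Set<Integer> result = new HashSet<>();
        for (Set<Integer> s : should) result.addAll(s);
        for (Set<Integer> s : mustNot) result.removeAll(s);
        return result;
    }

    static Set<Integer> range(int fromInclusive, int toExclusive) {
        Set<Integer> ids = new HashSet<>();
        for (int i = fromInclusive; i < toExclusive; i++) ids.add(i);
        return ids;
    }

    static final Set<Integer> ALL = range(0, 32);  // *:*
    static final Set<Integer> IPOD = range(0, 3);  // name:ipod

    // name:ipod -name:ipod
    static int flat() {
        return bool(Arrays.asList(IPOD), Arrays.asList(IPOD)).size();
    }

    // (name:ipod) OR (-name:ipod)
    static int grouped() {
        Set<Integer> left = bool(Arrays.asList(IPOD), Collections.emptyList());
        Set<Integer> right = bool(Collections.emptyList(), Arrays.asList(IPOD));
        return bool(Arrays.asList(left, right), Collections.emptyList()).size();
    }

    // (name:ipod) OR (*:* -name:ipod)
    static int fixed() {
        Set<Integer> left = bool(Arrays.asList(IPOD), Collections.emptyList());
        Set<Integer> right = bool(Arrays.asList(ALL), Arrays.asList(IPOD));
        return bool(Arrays.asList(left, right), Collections.emptyList()).size();
    }

    public static void main(String[] args) {
        System.out.println(flat() + " " + grouped() + " " + fixed());
    }
}
```

Under these assumptions the three queries give 0, 3 and 32 hits, which lines up with the numbers reported in the thread and with the suggested *:* workaround.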

-Original Message- From: Anders Melchiorsen
Sent: Tuesday, January 22, 2013 4:59 PM
To: solr-user@lucene.apache.org
Subject: Confused by queries

Hello!

With the example server of Solr 4.0.0 (with *.xml indexed), I get these
results:

*:* = 32 hits
name:ipod = 3 hits
-name:ipod = 29 hits

That is all fine, but for these next queries, I would expect to get 32
hits (i.e. everything), or at least the same number of hits for both
queries:

name:ipod OR -name:ipod = 0 hits
(name:ipod) OR (-name:ipod) = 3 hits

As my expectations are not met, I must be missing something?


Thanks,
Anders.




Which schema changes are incompatible?

2010-01-28 Thread Anders Melchiorsen
Hello.

I read the FAQ entry about rebuilding the index,

 
http://wiki.apache.org/solr/FAQ#How_can_I_rebuild_my_index_from_scratch_if_I_change_my_schema.3F

but it is not clear about when this is needed. So I wonder, do I need to do
it after adding a field, removing a field, changing a field type, or
changing indexed/stored/multiValued properties? What happens if I don't do
it, will Solr die?

Also, the FAQ entry notes that one can delete all documents, change the
schema.xml file, and then reload the core. Would it be possible to instead
change schema.xml, reload the core, and then rebuild the index -- in effect
slowly deleting the old documents, but never ending up with a completely
empty index? I realize that some weird search results could happen during
such a rebuild, but that may be preferable to having no search results at
all.

(I also realize that we need more Solr servers, to be able to do these
updates without taking down the search service. But, currently we have just
one)


Thanks,
Anders.



Overlapping zipcodes

2009-09-21 Thread Anders Melchiorsen
We are in a situation where we are trying to match up documents based on a
number of zipcodes.

In our case, zipcodes are just integers, so that hopefully simplifies
things.


So, we might have a document listing a number of zipcodes:

  1200-1450,2000,5000-5999

and we want to do a search of 1100-1300,8000 and have it match the
document.


How can this be done using Solr?


Thanks,
Anders.
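
For reference, the matching rule being asked for can be sketched outside of Solr. The class below is purely illustrative (the names are made up and it says nothing about how to index such a list): it parses a comma-separated list of zipcodes and zipcode ranges, and treats a document as a match when any of its ranges overlaps any range in the query.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: interval-overlap matching for zipcode lists.
final class ZipRanges {
    // Parses "1200-1450,2000,5000-5999" into inclusive [low, high] pairs.
    static List<int[]> parse(String spec) {
        List<int[]> ranges = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] bounds = part.trim().split("-");
            int low = Integer.parseInt(bounds[0].trim());
            // A single zipcode is a range of length one.
            int high = bounds.length > 1 ? Integer.parseInt(bounds[1].trim()) : low;
            ranges.add(new int[] { low, high });
        }
        return ranges;
    }

    // Two inclusive ranges overlap when each starts before the other ends.
    static boolean overlaps(String documentSpec, String querySpec) {
        for (int[] d : parse(documentSpec)) {
            for (int[] q : parse(querySpec)) {
                if (d[0] <= q[1] && q[0] <= d[1]) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(overlaps("1200-1450,2000,5000-5999", "1100-1300,8000"));
    }
}
```

With the example from the message, the document range 1200-1450 overlaps the query range 1100-1300, so the document counts as a match.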



Re: Overlapping zipcodes

2009-09-21 Thread Anders Melchiorsen
Yeah, that takes care of the query side, but how can we index a list like
that?

It seems wrong to create a multivalue zipcode field and populate it with
each individual number in all the ranges.


Regards,
Anders.


On Mon, 21 Sep 2009 19:05:01 +0530, Avlesh Singh avl...@gmail.com wrote:
 Range queries?

 Cheers
 Avlesh

 On Mon, Sep 21, 2009 at 2:57 PM, Anders Melchiorsen
 m...@spoon.kalibalik.dk
 wrote:

 We are in a situation where we are trying to match up documents based on a
 number of zipcodes.

 In our case, zipcodes are just integers, so that hopefully simplifies
 things.


 So, we might have a document listing a number of zipcodes:

  1200-1450,2000,5000-5999

 and we want to do a search of 1100-1300,8000 and have it match the
 document.


 How can this be done using Solr?


 Thanks,
 Anders.




Re: HTML decoder is splitting tokens

2009-08-27 Thread Anders Melchiorsen
Hello.

Thanks for the hints. Still some trouble, though.

I added just the HTMLStripCharFilterFactory because, according to
documentation, it should also replace HTML entities. It did, but still
left a space after the entity, so I got two tokens from "G&uuml;nther".
That seems like a bug?

Adding MappingCharFilterFactory in front of the HTML stripper (so that the
latter will not see the entity) does work as expected. That is, until I
try strings like "use &lt;p&gt; to mark a paragraph", where the HTML
stripper will then remove parts of the actual text. So this approach will
not work.
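
The ordering problem can be reproduced with two simplistic stand-ins. These are not the Solr char filters: the entity decoder below knows only three entities and the tag stripper is a naive regular expression, both made up for illustration.

```java
// Hypothetical stand-ins for an entity-mapping step and an HTML-stripping step.
final class CharFilterOrder {
    // Decodes only the three entities used in this example.
    static String decodeEntities(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&uuml;", "\u00fc");
    }

    // Removes anything that looks like a tag.
    static String stripTags(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    // Mapping first, stripping second: an escaped "<p>" becomes a real tag and is lost.
    static String decodeThenStrip(String s) {
        return stripTags(decodeEntities(s));
    }

    // Stripping first, mapping second: the escaped "<p>" survives as text.
    static String stripThenDecode(String s) {
        return decodeEntities(stripTags(s));
    }

    public static void main(String[] args) {
        String text = "use &lt;p&gt; to mark a paragraph";
        System.out.println(decodeThenStrip(text));
        System.out.println(stripThenDecode(text));
    }
}
```

In this sketch, decoding before stripping loses the literal <p> from the text, while stripping before decoding keeps it. Both orders turn G&uuml;nther into a single word here, so the token split seen with the real HTML stripper is a separate issue.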


Finally, I was happy that I could now use an arbitrary tokenizer with HTML
input. The PatternTokenizer, however, seems to be using character offsets
corresponding to the output of the char filters, and so the highlighting
markers end up at the wrong place. Is that a bug, or a configuration
issue?
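
To illustrate what correct offsets would look like, here is a small sketch of offset correction. It is an assumption-laden toy, not the Solr or Lucene mechanism: tags are stripped by a naive scan, tokens are split on whitespace, and each token's offsets are mapped back to positions in the original HTML through a lookup list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy offset correction; names and approach are illustrative only.
final class OffsetCorrection {
    // Returns [start, end) offsets into the ORIGINAL html for each token
    // found in the tag-stripped text.
    static List<int[]> tokenOffsets(String html) {
        StringBuilder filtered = new StringBuilder();
        // origin.get(i) is the index in html of the i-th filtered character.
        List<Integer> origin = new ArrayList<>();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                filtered.append(c);
                origin.add(i);
            }
        }
        List<int[]> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(filtered);
        while (m.find()) {
            int start = origin.get(m.start());
            int end = origin.get(m.end() - 1) + 1;
            offsets.add(new int[] { start, end });
        }
        return offsets;
    }

    public static void main(String[] args) {
        String html = "<b>Hello</b> world";
        for (int[] o : tokenOffsets(html)) {
            System.out.println(html.substring(o[0], o[1]));
        }
    }
}
```

A highlighter that used the filtered offsets directly would place its markers too early in the original HTML; mapping them back gives spans that cover the intended words. The naive scan treats every "<" as the start of a tag, so it is only suitable as an illustration.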


Cheers,
Anders.


Koji Sekiguchi wrote:
 Hi Anders,

 Sorry, I don't know whether this is a bug or a feature, but
 I'd like to show an alternate way if you'd like.

 In Solr trunk, HTMLStripWhitespaceTokenizerFactory is
 marked as deprecated. Instead, using HTMLStripCharFilterFactory with
 an arbitrary TokenizerFactory is encouraged.
 And I'd recommend that you use MappingCharFilterFactory
 to convert character references to real characters.
 That is, you have:

 <fieldType name="textHtml" class="solr.TextField">
   <analyzer>
     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>

 where the contents of mapping.txt are:

 "&uuml;" => "ü"
 "&auml;" => "ä"
 "&iuml;" => "ï"
 "&euml;" => "ë"
 "&ouml;" => "ö"
 : :

 Then run analysis.jsp and see the result.

 Thank you,

 Koji


 Anders Melchiorsen wrote:
 Hi.

 When indexing the string "G&uuml;nther" with
 HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens,
 "Gü" and "nther".

 Is this a bug, or am I doing something wrong?

 (Using a Solr nightly from 2009-05-29)


 Anders.










HTML decoder is splitting tokens

2009-08-26 Thread Anders Melchiorsen
Hi.

When indexing the string "G&uuml;nther" with
HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens,
"Gü" and "nther".

Is this a bug, or am I doing something wrong?

(Using a Solr nightly from 2009-05-29)


Anders.




Re: Highlight arbitrary text

2009-07-23 Thread Anders Melchiorsen
On Tue, 21 Jul 2009 14:25:52 +0200, Anders Melchiorsen wrote:

 On Fri, 17 Jul 2009 16:04:24 +0200, Anders Melchiorsen wrote:

 However, in the normal highlighter, I am using usePhraseHighlighter and
 highlightMultiTerm and it seems that there is no way to turn these on in
 FieldAnalysisRequestHandler?

 In case these options are not available with the FieldAnalysisRequestHandler,
 would it be simple to implement them with a plugin? The highlightMultiTerm
 is absolutely needed, as we use a lot of prefix searches.

I tried following the FieldAnalysisRequestHandler code, but I could not find
a place to plug in wildcard searching. Is it supposed to be simple (like
enabling a single option somewhere), or will it need a bunch of new code?



In related news, the highlighter is not working quite correctly, because
I use the PatternTokenizer for the indexed fields, and
HTMLStripWhiteSpaceTokenizer obviously gives slightly different results on
the presentation field.

So, I tried creating my own plugin:

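import java.io.Reader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.HTMLStripReader;
import org.apache.solr.analysis.PatternTokenizerFactory;

// Strips HTML from the input before handing it to the pattern tokenizer.
public class HTMLStripPatternTokenizerFactory extends PatternTokenizerFactory {
    @Override
    public TokenStream create(Reader input) {
        return super.create(new HTMLStripReader(input));
    }
}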

It seems to work, but is that the proper way to mix the HTML stripper and
the Pattern tokenizer? Obviously, I would prefer not having to maintain a
plugin, even if it is a tiny one.


- Anders



Re: Highlight arbitrary text

2009-07-21 Thread Anders Melchiorsen
On Fri, 17 Jul 2009 16:04:24 +0200, Anders Melchiorsen
m...@cup.kalibalik.dk wrote:
 On Thu, 16 Jul 2009 10:56:38 -0400, Erik Hatcher
 e...@ehatchersolutions.com wrote:

 One trick worth noting is the FieldAnalysisRequestHandler can provide
 offsets from external text, which could be used for client-side
 highlighting (see the showmatch parameter too).

 Thanks. I tried doing this, and it almost works.

 However, in the normal highlighter, I am using usePhraseHighlighter and
 highlightMultiTerm and it seems that there is no way to turn these on in
 FieldAnalysisRequestHandler?

In case these options are not available with the FieldAnalysisRequestHandler,
would it be simple to implement them with a plugin? The highlightMultiTerm
is absolutely needed, as we use a lot of prefix searches.


Thanks,
Anders.



Re: Highlight arbitrary text

2009-07-17 Thread Anders Melchiorsen
On Thu, 16 Jul 2009 10:56:38 -0400, Erik Hatcher
e...@ehatchersolutions.com wrote:

 One trick worth noting is the FieldAnalysisRequestHandler can provide
 offsets from external text, which could be used for client-side
 highlighting (see the showmatch parameter too).

Thanks. I tried doing this, and it almost works.

However, in the normal highlighter, I am using usePhraseHighlighter and
highlightMultiTerm and it seems that there is no way to turn these on in
FieldAnalysisRequestHandler?


Anders.
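
For context, the client-side half of the trick mentioned above could look roughly like this. It is a sketch under assumptions: the offsets are taken to be sorted, non-overlapping [start, end) pairs that the client has already extracted from some analysis response, and the class name and the <em> markers are arbitrary choices.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative client-side highlighter driven by precomputed offsets.
final class OffsetHighlighter {
    // Wraps each [start, end) span of text in <em>...</em>.
    // Assumes offsets are sorted and do not overlap.
    static String highlight(String text, List<int[]> offsets) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int[] o : offsets) {
            out.append(text, pos, o[0]);   // unhighlighted text before the match
            out.append("<em>").append(text, o[0], o[1]).append("</em>");
            pos = o[1];
        }
        out.append(text.substring(pos));   // remainder after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        List<int[]> offsets = Arrays.asList(new int[] { 5, 15 });
        System.out.println(highlight("solr highlights text", offsets));
    }
}
```

This only covers marking up text once the offsets are known. Phrase and multi-term (prefix) matching would still have to be decided on the server side.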




Highlight arbitrary text

2009-07-15 Thread Anders Melchiorsen
Is it possible to have Solr highlight an arbitrary text that is posted at
request time?

Currently, we are storing an unindexed HTML field in Solr, just to have it
highlighted. We would prefer to generate the HTML from the database at
presentation time, in order to keep the Solr index smaller and faster.


Thanks,
Anders.