Re: QueryParser and compound words

2003-03-13 Thread Tatu Saloranta
On Thursday 13 March 2003 00:52, Magnus Johansson wrote:
> Tatu Saloranta wrote:
...
> >But same happens during indexing; fotbollsmatch should be properly
> >split and stemmed to "fotboll" and "match" terms, right?
>
> Yes but the word fotbollsmatch was never indexed in this example. Only
> the word fotboll.
> I want a query for fotbollsmatch to match a document containing the word
> fotboll.

OK, I think I finally understand what you meant. :-)

So, basically, in your case you would prefer the query:

fotbollsmatch

to expand to (after stemming etc):

fotboll match

and not

"fotboll match"

So that matching just one of the words would be enough for a hit (whether
that means "either of", "just the first word" or "just the last word").
It should be possible to implement this by subclassing the default
QueryParser and modifying its behavior slightly.

In QueryParser you should be able to override the default handling of terms,
so that whenever a single token (in this case "fotbollsmatch") expands to
multiple Terms, you do not construct a PhraseQuery but just a BooleanQuery
made of TermQueries (look at getFieldQuery(); it handles basic search
terms). You may need some simple heuristics for detecting white space that
indicates a "normal" phrase, which probably should still be handled using
PhraseQuery.
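
A rough sketch of that decision logic, in plain Java rather than against the
real QueryParser API. The class and method names are made up for illustration,
the token list stands in for whatever your analyzer produces, and the result is
just a query string instead of real Query objects:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the heuristic; not part of Lucene.
class CompoundQuerySketch {

    // rawText: one query clause as the user typed it.
    // tokens:  the terms an analyzer produced for that clause (assumed input).
    // Returns a quoted phrase for "normal" phrases, otherwise optional terms.
    static String buildQuery(String rawText, List<String> tokens) {
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            return tokens.get(0);              // plain single-term query
        }
        String joined = String.join(" ", tokens);
        // White space in the raw text means the user typed several words,
        // so keep phrase semantics; otherwise one word was split by analysis.
        return containsWhitespace(rawText.trim()) ? "\"" + joined + "\"" : joined;
    }

    private static boolean containsWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("fotboll", "match");
        System.out.println(buildQuery("fotbollsmatch", tokens));
        System.out.println(buildQuery("fotboll match", tokens));
    }
}
```

In a real subclass the same test would sit inside the overridden
getFieldQuery(), choosing between a BooleanQuery of TermQueries and a
PhraseQuery instead of building strings.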

Of course this is all assuming you still want that functionality. :-)
And if you do, it would be a good idea to contribute the patch back in case
someone else finds it useful later on (I think many non-English languages
have the concept of compound words; German and Finnish at least do).

-+ Tatu +-


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Range of Score Values?

2003-03-13 Thread Rishabh Bajpai
 

Hi,

I am getting a long value between 1 (included) and 0 (excluded, I think), and that
makes sense to me logically as well: I wouldn't know what a value greater than 1 would
mean, and why should a term that has a score of 0 be returned in the first place? But
just to be sure, I wanted to check the range of values one can get for the score.

Also, from a user-experience perspective, how should one present this score on the
rendered page? The value itself makes little sense to the end user, so should I
convert it to a percentage (of what?), or to stars, etc.? Which is the better option?
Any suggestions or pointers are welcome.
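
To make the two display options concrete, here is a minimal sketch. It assumes
the score really does lie in (0, 1] as observed above, and reads "percentage" as
a percentage of the maximum score 1.0. The class and method names are
hypothetical and nothing here is Lucene API:

```java
// Hypothetical helper for rendering a relevance score; not part of Lucene.
class ScoreDisplaySketch {

    // Whole-number percentage of the assumed maximum score 1.0.
    static int toPercent(float score) {
        return Math.round(clamp(score) * 100f);
    }

    // Number of stars out of maxStars, rounded up so that every
    // returned hit shows at least one star.
    static int toStars(float score, int maxStars) {
        int stars = (int) Math.ceil(clamp(score) * maxStars);
        return Math.max(1, stars);
    }

    // Guards against values outside the assumed 0..1 range.
    private static float clamp(float score) {
        return Math.max(0f, Math.min(1f, score));
    }

    public static void main(String[] args) {
        float score = 0.42f;
        System.out.println(toPercent(score) + "%");
        System.out.println(toStars(score, 5) + " of 5 stars");
    }
}
```

Stars hide small differences between neighbouring hits, which may be an
advantage if the raw number means little to the end user.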

-rb






RE: Searching for hyphenated terms

2003-03-13 Thread Sieretzki, Dionne R, SOLGV
Thanks for the tips.  I'll give your code a try today!

Dionne



RE: Searching for hyphenated terms

2003-03-13 Thread Rob Outar
I had similar problems that were solved with this Analyzer:

 
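import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

// The class name is only a placeholder; the method body is unchanged.
public class WholeFieldLowerCaseAnalyzer extends Analyzer {
    public TokenStream tokenStream(String field, final Reader reader) {
        // Do not tokenize any field: every character counts as a token
        // character, so the field value is not split at hyphens or dots.
        TokenStream t = new CharTokenizer(reader) {
            protected boolean isTokenChar(char c) {
                return true;
            }
        };
        // Lowercase the token so that searches are case insensitive.
        t = new LowerCaseFilter(t);
        return t;
    }
}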
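
To show what this does to the values from the original question, here is a
small plain-Java model of the same idea (whole value kept as one token, then
lowercased). It is only an illustration: the class and method names are
invented, no Lucene classes are used, and any token length limit in the real
tokenizer is ignored.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the analyzer above; not Lucene code.
class WholeFieldModel {

    // The whole field value becomes exactly one lowercased token.
    static List<String> analyze(String fieldValue) {
        return Collections.singletonList(fieldValue.toLowerCase(Locale.ROOT));
    }

    // An exact-term match succeeds only if both sides analyze to the same token.
    static boolean matches(String indexedValue, String queryText) {
        return analyze(indexedValue).equals(analyze(queryText));
    }

    public static void main(String[] args) {
        System.out.println(analyze("AAA-ADOG"));
        System.out.println(matches("AAA-ADOG", "aaa-adog"));
        System.out.println(matches("AAA-ADOG", "ADOG"));
    }
}
```

The hyphenated value matches regardless of case, but a part of it such as
ADOG no longer matches on its own, so the same analyzer has to be used for
both indexing and searching.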

Thanks,
 
Rob 





Re: Searching for hyphenated terms

2003-03-13 Thread Otis Gospodnetic
Make a custom Analyzer.  They are super simple to write.
Take pieces of WhitespaceAnalyzer and the Standard one.
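
As a rough picture of what such a combination would produce (split on white
space only, then lowercase), here is a plain-Java model. It does not use the
Lucene classes themselves, and the class and method names are made up; in
Lucene the equivalent would presumably be a whitespace tokenizer followed by
a lowercase filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of "whitespace tokenizing + lowercasing"; not Lucene code.
class WhitespaceLowerCaseModel {

    // Splits on runs of white space only, so hyphens and dots stay
    // inside the token, then lowercases each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("AAA-ADOG Dog.2 ADOG-I40"));
    }
}
```

With this scheme AAA-ADOG is indexed as the single term aaa-adog, so it
would match a hyphenated query term but not a search for adog alone.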

Otis




Searching for hyphenated terms

2003-03-13 Thread Sieretzki, Dionne R, SOLGV
I have seen some previous postings about "Escape woes" and "Hyphens not matching", but 
I haven't seen any resolutions to an issue I've been trying to work out.  

I don't want my search field to be case sensitive, so I used StandardAnalyzer.  The 
search field also has corresponding entries that may or may not contain hyphens or 
other special characters. If the field is not tokenized, very few search terms result 
in matches.  It appears that terms are only matched if a wildcard is used, such as:

Entered: ADOG  / Actual Query is: adog / No match on an exact term
Entered: ADOG* / Actual Query is: ADOG* /  Match found
Entered: AAA-ADOG / Actual Query is: aaa -adog / No match 
Entered: "AAA-ADOG" / Actual Query is: "aaa adog" / No match 
Entered: AAA?ADOG /  Actual Query is: aaa?adog / Match found
Entered: DOG.2  / Actual Query is: dog.2 / No match 
Entered: DOG?2 / Actual Query is: DOG?2 /  Match found


If the field is tokenized, then even more mixed results are produced.

Entered: ADOG / Actual Query is: adog / Match found for exact term
Entered: ADOG* / Actual Query is: ADOG* / No match
Entered: AAA-ADOG / Actual Query is: aaa -adog / Match found
Entered: "AAA-ADOG" / Actual Query is: "aaa adog" / Match found
Entered: DOG.2 / Actual Query is: adog.2  / Match found
Entered: AAA-DOG-BBB / Actual Query is: aaa -dog -bbb / No match
Entered: " AAA-DOG-BBB" / Actual Query is: "aaa dog bbb" / No match
Entered: ADOG-I40 / Actual Query is: adog -i40 / Incorrect matches
Entered: "ADOG-I40" / Actual Query is: adog-i40 / Match found for exact term


Can anyone recommend the right Analyzer to use that isn't case sensitive and matches 
on both hyphenated and non-hyphenated terms?

