RE: Using multiple analysers within a query

2004-11-24 Thread Kauler, Leto S
Hi again,

Thanks to everyone who replied.  The PerFieldAnalyzerWrapper was a good
suggestion, and one I had overlooked, but for our particular
requirements it wouldn't quite work, so I went with overriding
getFieldQuery().

You were right, Paul.  In 1.4.2 a whole heap of QueryParser changes were
made, mostly removing the analyzer parameter from methods.

In the end I built my changes on top of the NewMultiFieldQueryParser
which was shared here recently and works wonders -- thanks Bill Janssen
and sergiu gordea.  I added support for slops and boosts so they are built
together with the multi-fields array, and then overrode getFieldQuery to
check the queryText for a leading marker character (= for example) and, if
found, remove it and switch to a non-tokenising analyser.
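A minimal sketch of that dispatch, written without Lucene so it stands alone. It assumes `=` as the marker; the class and method names are hypothetical, and `analyse()` is only a rough stand-in for StandardAnalyzer (lowercase, split on non-alphanumerics). In a real QueryParser subclass the same branch would sit inside the overridden getFieldQuery() and build Query objects rather than strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, Lucene-free sketch of the marker-character dispatch.
final class MarkerDispatchSketch {
    // Assumed marker meaning "do not tokenise this clause".
    static final char KEYWORD_MARKER = '=';

    // Crude stand-in for StandardAnalyzer: lowercase, split on non-alphanumerics.
    static List<String> analyse(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Mirrors the decision an overridden getFieldQuery() would make.
    static String fieldQuery(String field, String queryText) {
        if (!queryText.isEmpty() && queryText.charAt(0) == KEYWORD_MARKER) {
            // Keyword path: strip the marker and keep the rest verbatim as one term.
            return field + ":" + queryText.substring(1);
        }
        List<String> tokens = analyse(queryText);
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0);
        }
        // Several tokens become optional clauses, as in (title:hello title:world).
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(tokens.get(i));
        }
        return sb.append(')').toString();
    }
}
```

With this sketch, `fieldQuery("title", "Hello World")` yields `(title:hello title:world)`, while `fieldQuery("path", "=Resources\\Live\\1")` keeps the path intact as a single `path:` term.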

Then I found that because that analyser always returns a single token
(a TermQuery), any spaces in the text were passed through into the final
query string, causing problems.  So also in getFieldQuery I check whether
the text needs breaking up and converting into a PhraseQuery.

Seems to work, just needs thorough testing.  If anyone would like a copy
I could post it up here.

Regards, --Leto
(excuse the disclaimer...)




CONFIDENTIALITY NOTICE AND DISCLAIMER

Information in this transmission is intended only for the person(s) to whom it 
is addressed and may contain privileged and/or confidential information. If you 
are not the intended recipient, any disclosure, copying or dissemination of the 
information is unauthorised and you should delete/destroy all copies and notify 
the sender. No liability is accepted for any unauthorised use of the 
information contained in this transmission.

This disclaimer has been automatically added.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Using multiple analysers within a query

2004-11-24 Thread Kauler, Leto S
Actually, just realised a PhraseQuery is incorrect...
I only want a single TermQuery but it just needs to be quoted, d'oh.
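A sketch of the quoting idea, assuming the goal is only the rendered query string: an un-tokenised term that contains whitespace is wrapped in double quotes so it reads as one unit instead of separate clauses. The helper name is hypothetical, and it deliberately does not escape embedded quote characters.

```java
// Hypothetical helper: render field:term for a single, un-tokenised term.
final class QuotedTermSketch {
    static String render(String field, String term) {
        // Quote only when the term contains whitespace; otherwise emit it bare.
        boolean hasWhitespace = term.chars().anyMatch(Character::isWhitespace);
        return field + ":" + (hasWhitespace ? "\"" + term + "\"" : term);
    }
}
```

For example, `render("path", "Live Site")` gives `path:"Live Site"`, and a term without spaces is left unquoted.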


-Original Message-
Then I found that because that analyser always returns a single token
(TermQuery) it would send through spaces into the final query string,
causing problems.  So also in getFieldQuery I check if it needs breaking
up and converting into a PhraseQuery.




Re: Using multiple analysers within a query

2004-11-22 Thread Paul Elschot
On Monday 22 November 2004 05:02, Kauler, Leto S wrote:
 Hi Lucene list,
 
 We have the need for analysed and 'not analysed/not tokenised' clauses
 within one query.  Imagine an unparsed query like:
 
 +title:Hello World +path:Resources\Live\1
 
 In the above example we would want the first clause to use
 StandardAnalyser and the second to use an analyser which returns the
 term as a single token.  So a parsed result might look like:
 
 +(title:hello title:world) +path:Resources\Live\1
 
 Would anyone have any suggestions on how this could be done?  I was
 thinking maybe the QueryParser would have to be changed/extended to
 accept a separator other than colon :, something like = for example
 to indicate this clause is not to be tokenised.  Or perhaps this can all
 be done using a single analyser?

Overriding QueryParser.getFieldQuery() might work for you.
It is given the field and the query text so an analyzer can be chosen
depending on the field.
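A Lucene-free sketch of choosing an analyser from the field name, with a fallback for fields that are not listed (the same default-plus-overrides shape that PerFieldAnalyzerWrapper uses). The two tokenisers are simplified stand-ins for real analyzers, and every name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-field analyser chooser.
final class FieldAnalyserChooser {
    // Keyword-style: the whole text is one token.
    static final Function<String, List<String>> KEYWORD =
        text -> text.isEmpty() ? Collections.<String>emptyList() : Collections.singletonList(text);

    // Tokenising stand-in: lowercase and split on non-alphanumerics.
    static final Function<String, List<String>> TOKENISING = text -> {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    };

    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    FieldAnalyserChooser(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // Register an override for one field; returns this for chaining.
    FieldAnalyserChooser use(String field, Function<String, List<String>> analyser) {
        perField.put(field, analyser);
        return this;
    }

    // Pick the field's analyser, or the fallback, and tokenise the text.
    List<String> tokens(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }
}
```

Configured as `new FieldAnalyserChooser(TOKENISING).use("path", KEYWORD)`, the `title` field is tokenised while `path` values pass through as one token.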
In case you don't use the latest cvs head, it may be worthwhile to
have a look. Some of the getFieldQuery methods have been
deprecated, but I don't know when.

Regards,
Paul.





Re: Using multiple analysers within a query

2004-11-22 Thread Morus Walter
Kauler, Leto S writes:
 
 Would anyone have any suggestions on how this could be done?  I was
 thinking maybe the QueryParser would have to be changed/extended to
 accept a separator other than colon :, something like = for example
 to indicate this clause is not to be tokenised.  

I suggested that in a recent discussion and Erik Hatcher objected that
it isn't a good idea to require that users know which field to query
in which way. I guess he is right.
If your query isn't entered by users, you shouldn't use query parser in
most cases anyway.

 Or perhaps this can all
 be done using a single analyser?
 
Look at PerFieldAnalyzerWrapper. 
You will probably have to write a keyword analyzer (unless you can use
whitespace analyzer in your case).
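To illustrate when a whitespace analyzer can stand in for a keyword analyzer, here is a sketch using simplified stand-ins rather than Lucene's classes: keyword-style analysis emits the whole value as one token, while whitespace-style analysis splits only on whitespace and leaves case, punctuation and backslashes alone. For a value like Resources\Live\1 the two agree; they diverge as soon as a value contains a space.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins contrasting keyword-style and whitespace-style analysis.
final class KeywordVsWhitespace {
    // Whole input as a single token (nothing for empty input).
    static List<String> keyword(String text) {
        return text.isEmpty() ? Collections.<String>emptyList() : Collections.singletonList(text);
    }

    // Split on whitespace only; no lowercasing, other characters kept.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (Character.isWhitespace(ch)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(ch);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

So the whitespace option is only safe if the un-tokenised field never holds values containing spaces.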

HTH
Morus




Re: Using multiple analysers within a query

2004-11-22 Thread Erik Hatcher
On Nov 22, 2004, at 2:56 AM, Morus Walter wrote:
 Kauler, Leto S writes:
  Would anyone have any suggestions on how this could be done?  I was
  thinking maybe the QueryParser would have to be changed/extended to
  accept a separator other than colon :, something like = for example
  to indicate this clause is not to be tokenised.

 I suggested that in a recent discussion and Erik Hatcher objected that
 it isn't a good idea to require that users know which field to query
 in which way. I guess he is right.

QueryParser is a one-size fits (?) all sort of beast.  It has plenty of
negatives, no question.

 If your query isn't entered by users, you shouldn't use query parser in
 most cases anyway.

I'd go even further and say in all cases.

  Or perhaps this can all
  be done using a single analyser?

 Look at PerFieldAnalyzerWrapper.
 You will probably have to write a keyword analyzer (unless you can use
 whitespace analyzer in your case).

We should probably add a KeywordAnalyzer to Lucene's core at some point.

Erik


Re: Using multiple analysers within a query

2004-11-22 Thread Morus Walter
Erik Hatcher writes:

  If your query isn't entered by users, you shouldn't use query parser in
  most cases anyway.
 
 I'd go even further and say in all cases.
 
If you use Lucene as a search server, you have to provide the query somehow.
E.g. we have a PHP application that sends queries to a Lucene search
servlet.
In this case it's justifiable to serialize the query into query parser
syntax on the client side and have query parser read the query again on
the server side.
I don't recall any problems with the approach, since we clean up the user
input before constructing the query.
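When the query is serialised on the client, values have to be escaped for QueryParser syntax; backslash is itself QueryParser's escape character, so the backslashes in a value such as Resources\Live\1 need escaping before it is sent. A sketch of such an escaper, shown in Java rather than PHP for consistency with the rest of the thread. The character set is taken from the special characters listed in Lucene's query syntax documentation and the helper names are hypothetical, so check the set against the QueryParser version actually deployed.

```java
// Hypothetical client-side escaper for QueryParser syntax.
final class QuerySyntaxEscaper {
    // Characters treated specially by the query syntax (backslash first).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every special character with a backslash.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            if (SPECIAL.indexOf(ch) >= 0) {
                sb.append('\\');
            }
            sb.append(ch);
        }
        return sb.toString();
    }

    // Build one clause such as +path:value, escaping only the value.
    static String clause(String occur, String field, String value) {
        return occur + field + ":" + escape(value);
    }
}
```

For example, `clause("+", "path", "Resources\\Live\\1")` produces `+path:Resources\\Live\\1` with each backslash doubled, which the server-side QueryParser should read back as the original path.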

Morus




Re: Using multiple analysers within a query

2004-11-22 Thread Erik Hatcher
On Nov 22, 2004, at 9:17 AM, Morus Walter wrote:
 Erik Hatcher writes:
   If your query isn't entered by users, you shouldn't use query parser in
   most cases anyway.

  I'd go even further and say in all cases.

 If you use Lucene as a search server, you have to provide the query somehow.
 E.g. we have a PHP application that sends queries to a Lucene search
 servlet.
 In this case it's justifiable to serialize the query into query parser
 syntax on the client side and have query parser read the query again on
 the server side.

Ah, good point!  I hadn't considered this scenario.

Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Using multiple analysers within a query

2004-11-21 Thread Kauler, Leto S
Hi Lucene list,

We have the need for analysed and 'not analysed/not tokenised' clauses
within one query.  Imagine an unparsed query like:

+title:Hello World +path:Resources\Live\1

In the above example we would want the first clause to use
StandardAnalyser and the second to use an analyser which returns the
term as a single token.  So a parsed result might look like:

+(title:hello title:world) +path:Resources\Live\1

Would anyone have any suggestions on how this could be done?  I was
thinking maybe the QueryParser would have to be changed/extended to
accept a separator other than colon :, something like = for example
to indicate this clause is not to be tokenised.  Or perhaps this can all
be done using a single analyser?

Regards (and excuse the disclaimer),
--Leto
