Re: Keyword search with space and wildcard
Unfortunately for me, I have done a good bit of yacc in the past. I don't really want to have to look in the class unless it is absolutely necessary! :-)

> Well, like Eric mentioned, if you can just ignore the JavaCC syntax and look at the embedded Java code, it's not all that complicated (especially if you have done something with yacc/bison, to know how parser generators generally work)... but granted, at first it looks a bit alien. :-)

I did a couple of basic tests using the WildcardQuery like below and it seemed to work. I have not tried an example using the PrefixQuery and I doubt that I will.

> Now, I haven't tested this, but I would think that for just building a single Query that searches the wildcard phrase "some th*" in field "my_field", you'd just do:
>
>     Query q = new WildcardQuery(new Term("my_field", "some th*"));
>
> and feed that to whichever search object you need. In this particular case you could also use PrefixQuery instead; if so, you need to strip out the trailing "*" (since that's implied when constructing a PrefixQuery).

Because of the nature of my application I'm guaranteed to have at least one search term, but the majority of the time it will be more than one. So in my particular case, I'll almost always get a BooleanQuery back from QueryParser. I'm currently working on an implementation of Eric's suggestion.

> Eric described the method of combining queries that should work as far as I could see. If you do not want to rely on QueryParser to return a BooleanQuery, you can also just build your own BooleanQuery, and wrap sub-queries as BooleanClauses. That's a bit more work but should work as well.

Thanks to everyone for the help. It is much appreciated.

-Brian

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Keyword search with space and wildcard
On Tuesday 02 September 2003 09:04, Brian Campbell wrote:
> Great. Is there an example anywhere on how I might be able to build such a
> Query? QueryParser isn't really all that simple since it's built with
> JavaCC.

Well, like Eric mentioned, if you can just ignore the JavaCC syntax and look at the embedded Java code, it's not all that complicated (especially if you have done something with yacc/bison, to know how parser generators generally work)... but granted, at first it looks a bit alien. :-)

Now, I haven't tested this, but I would think that for just building a single Query that searches the wildcard phrase "some th*" in field "my_field", you'd just do:

    Query q = new WildcardQuery(new Term("my_field", "some th*"));

and feed that to whichever search object you need. In this particular case you could also use PrefixQuery instead; if so, you need to strip out the trailing "*" (since that's implied when constructing a PrefixQuery).

> What might be ideal for me is if I can continue to use the highlevel
> interface to build the main query (ie use it to parse my query string and
> return me some kind of Query - BooleanQuery, TermQuery, etc) and then build
> a WildcardQuery by hand and "combine" the two together? For example, is it
> as simple as calling Query.combine() to combine the two? Is there a better
> way? Is there a documented example like this? Thanks!

Eric described the method of combining queries that should work as far as I could see. If you do not want to rely on QueryParser to return a BooleanQuery, you can also just build your own BooleanQuery, and wrap sub-queries as BooleanClauses. That's a bit more work but should work as well.

Good luck and let us know if that works,

-+ Tatu +-
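A small sketch of the choice described above between a prefix search and a general wildcard search. It uses only the Java standard library so that it runs without Lucene on the classpath; the class and method names (PrefixOrWildcard, isPrefixPattern, prefixOf) are made up for illustration and are not part of Lucene. The comments mark where the constructors mentioned in the thread would be called, on the assumption that "?" and "*" are the only wildcard characters involved.

```java
/** Illustrative helper (not part of Lucene): decides how a user-supplied term could be turned into a query. */
final class PrefixOrWildcard {

    /** True when the term's only wildcard is a single trailing '*', e.g. "some th*". */
    static boolean isPrefixPattern(String term) {
        return term.length() > 1
                && term.indexOf('*') == term.length() - 1
                && term.indexOf('?') < 0;
    }

    /** Strips the trailing '*', which a prefix search implies anyway. */
    static String prefixOf(String term) {
        if (!isPrefixPattern(term)) {
            throw new IllegalArgumentException("not a plain prefix pattern: " + term);
        }
        return term.substring(0, term.length() - 1);
    }

    public static void main(String[] args) {
        String[] terms = {"some th*", "so?e th*", "some*th", "some thing"};
        for (String term : terms) {
            if (isPrefixPattern(term)) {
                // With Lucene: new PrefixQuery(new Term("my_field", prefixOf(term)))
                System.out.println(term + " -> prefix search on \"" + prefixOf(term) + "\"");
            } else if (term.indexOf('*') >= 0 || term.indexOf('?') >= 0) {
                // With Lucene: new WildcardQuery(new Term("my_field", term))
                System.out.println(term + " -> wildcard search");
            } else {
                // With Lucene: new TermQuery(new Term("my_field", term))
                System.out.println(term + " -> exact term search");
            }
        }
    }
}
```

Only "some th*" takes the prefix branch here; the two terms with a wildcard elsewhere fall through to the wildcard branch, and "some thing" is treated as an exact term.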
RE: Keyword search with space and wildcard
Not sure about documented examples. I often find the unit tests (in src/test of Lucene's CVS) very useful as examples, but I didn't see any for what you are looking for.

Basically, the query parser builds up a vector of BooleanClause objects, then loops over those, calling add(BooleanClause) on a BooleanQuery object. I agree JavaCC isn't really simple to follow, but there is a lot of plain Java in there that does the parts you are interested in, and if you build the .java file and ignore the token parsing stuff, you can look at it in your favorite Java IDE.

What you can do is cast the query you get from QueryParser to a BooleanQuery (that is the only type of Query that QueryParser will return), then create your WildcardQuery or any other queries you need that you didn't get in the query string, and add them as clauses to the BooleanQuery using add(Query query, boolean required, boolean prohibited).

I don't know how Query.combine() works (never used it), but the javadoc comment leads me to believe it is not what you are looking for, and a bit of poking around in the sources gives me the same impression.

Eric

-----Original Message-----
From: Brian Campbell [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 02, 2003 11:05 AM
To: [EMAIL PROTECTED]
Subject: Re: Keyword search with space and wildcard

Great. Is there an example anywhere on how I might be able to build such a Query? QueryParser isn't really all that simple since it's built with JavaCC.

What might be ideal for me is if I can continue to use the highlevel interface to build the main query (ie use it to parse my query string and return me some kind of Query - BooleanQuery, TermQuery, etc) and then build a WildcardQuery by hand and "combine" the two together? For example, is it as simple as calling Query.combine() to combine the two? Is there a better way? Is there a documented example like this? Thanks!

-Brian

> This can be done, AFAIK.
>
> This is one thing that many people seem unaware of: you don't HAVE to
> use QueryParser to build queries. In your case it seems like you should
> be able to construct query you want if you either by-pass QueryParser,
> or create a dummy analyzer (one that does no tokenization but returns
> all input as one token).
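To make the required/prohibited flags in add(Query query, boolean required, boolean prohibited) a little more concrete, here is a toy model written against the Java standard library only. It is not Lucene code: BooleanCombineSketch and its Clause class are invented for illustration, the sub-queries are plain predicates over strings, and the matching rule (every required clause must hit, no prohibited clause may hit, and at least one non-prohibited clause must hit) reflects my understanding of how such clauses combine rather than anything verified against the Lucene sources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy stand-in (not Lucene) for a boolean query made of required/optional/prohibited clauses. */
final class BooleanCombineSketch {

    /** One sub-query plus its two flags, mirroring add(query, required, prohibited). */
    static final class Clause {
        final Predicate<String> query;
        final boolean required;
        final boolean prohibited;

        Clause(Predicate<String> query, boolean required, boolean prohibited) {
            this.query = query;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    private final List<Clause> clauses = new ArrayList<>();

    /** Adds a sub-query; a clause cannot be both required and prohibited. */
    BooleanCombineSketch add(Predicate<String> query, boolean required, boolean prohibited) {
        if (required && prohibited) {
            throw new IllegalArgumentException("clause cannot be both required and prohibited");
        }
        clauses.add(new Clause(query, required, prohibited));
        return this;
    }

    /** Applies the assumed matching rule to one document (here just a string). */
    boolean matches(String doc) {
        boolean anyHit = false;
        for (Clause c : clauses) {
            boolean hit = c.query.test(doc);
            if (c.prohibited) {
                if (hit) return false;       // a prohibited clause vetoes the document
            } else {
                if (c.required && !hit) return false;
                anyHit |= hit;
            }
        }
        return anyHit;
    }

    public static void main(String[] args) {
        // Stand-in for whatever QueryParser returned for the user's query string.
        Predicate<String> parsed = doc -> doc.endsWith("2003");
        // Stand-in for a hand-built WildcardQuery on "Hello w*".
        Predicate<String> wildcard = doc -> doc.startsWith("Hello w");

        BooleanCombineSketch combined = new BooleanCombineSketch()
                .add(parsed, true, false)
                .add(wildcard, true, false);

        for (String doc : new String[] {"Hello world 2003", "Hello world 2002", "Goodbye world 2003"}) {
            System.out.println(doc + " -> " + combined.matches(doc));
        }
    }
}
```

With both clauses marked required, only "Hello world 2003" matches. Marking the hand-built clause as prohibited instead would exclude every document starting with "Hello w".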
Re: Keyword search with space and wildcard
Great. Is there an example anywhere on how I might be able to build such a Query? QueryParser isn't really all that simple since it's built with JavaCC.

What might be ideal for me is if I can continue to use the high-level interface to build the main query (i.e. use it to parse my query string and return me some kind of Query - BooleanQuery, TermQuery, etc.) and then build a WildcardQuery by hand and "combine" the two together. For example, is it as simple as calling Query.combine() to combine the two? Is there a better way? Is there a documented example like this? Thanks!

-Brian

> This can be done, AFAIK.
>
> This is one thing that many people seem unaware of: you don't HAVE to
> use QueryParser to build queries. In your case it seems like you should
> be able to construct query you want if you either by-pass QueryParser,
> or create a dummy analyzer (one that does no tokenization but returns
> all input as one token).
Re: Keyword search with space and wildcard
On Friday 29 August 2003 10:02, Terry Steichen wrote:
> I agree. One problem, however, that new (and not-so-new) Lucene users face
> is a learning curve when they want to get past the simplest and most
> obvious uses of Lucene. For example, I don't think any of the docs mention
> the fact that you can't combine a phrase and a wildcard query. Other
> things that are obviously quite well understood by many members of the
> list, are still less-than-clear to others. For example, I found (and still
> find) it a bit difficult to find concrete examples/advice of how to get
> good benefit from filters.
>
> My whole point is that this is a *very* powerful and flexible technology.
> But I think it's often very difficult for those most experienced in using
> Lucene to fully appreciate how it looks from the "newbie" point of view.

I agree completely. Perhaps I worded my reply badly; I didn't mean to sound hostile towards new users at all -- after all I consider myself to be one (I just happened to work on simple improvements to QueryParser and learnt how it works).

I wish documentation was more complete; perhaps some section could list common workarounds or insights. And perhaps incompatibility of phrase and wild card queries could be added to document that lists current limitations.

I guess the reason I think it's valuable to document the flexibility of query construction is that I have been working on something similar (although working with database queries) in a system I'm working on, and I have also seen systems that have query syntax that's too intertwined with backend implementation (for example, while Hibernate is a good ORM, its queries don't seem to have backend independent intermediate representation... which makes it hard to develop different kinds of backends). So, it's useful to know that there are 2 levels of interfaces to Lucene's query functionality.

-+ Tatu +-
Re: Keyword search with space and wildcard
Tatu,

I agree. One problem, however, that new (and not-so-new) Lucene users face is a learning curve when they want to get past the simplest and most obvious uses of Lucene. For example, I don't think any of the docs mention the fact that you can't combine a phrase and a wildcard query. Other things that are obviously quite well understood by many members of the list, are still less-than-clear to others. For example, I found (and still find) it a bit difficult to find concrete examples/advice of how to get good benefit from filters.

My whole point is that this is a *very* powerful and flexible technology. But I think it's often very difficult for those most experienced in using Lucene to fully appreciate how it looks from the "newbie" point of view.

Just my $0.02.

Regards,

Terry

----- Original Message -----
From: "Tatu Saloranta" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, August 29, 2003 11:14 AM
Subject: Re: Keyword search with space and wildcard

> On Thursday 28 August 2003 21:54, Brian Campbell wrote:
> > Basically, yes, I am trying to put a wildcard in a phrase. My field (a
> > Keyword) is the name of a project. It can be 40 characters long (I'm
> > basically indexing some database columns). Since it is a Keyword and not a
> > Text field, it doesn't get tokenized (I do this on purpose) and must match
> > up exactly. I would like for users to be able to search on partial phrases
> > such as "Hello w*" and match up to "Hello world" and "Hello washington",
> > etc. Is this not possible? Is it documented anywhere?
>
> This can be done, AFAIK.
>
> This is one thing that many people seem unaware of: you don't HAVE to use
> QueryParser to build queries. In your case it seems like you should be able
> to construct query you want if you either by-pass QueryParser, or create
> a dummy analyzer (one that does no tokenization but returns all input as
> one token).
>
> Since QueryParser is fairly simple class, you should be able to see how wild
> card queries are constructed. You can not (and need not) create a phrase
> query since it does not allow wild cards (like someone pointed out), but
> since the whole phrase is just one token for keyword fields, you can use
> normal wild card query (or prefix for cases like "Hello w*").
>
> It would be nice if FAQ could point out that QueryParser is higher-level
> interface to query part, but it is possible and sometimes necessary to do
> your own query construction. I think it's very cool Lucene queries were
> properly modularized this way -- too many open source projects have
> components too tightly coupled.
>
> -+ Tatu +-
Re: Keyword search with space and wildcard
On Thursday 28 August 2003 21:54, Brian Campbell wrote:
> Basically, yes, I am trying to put a wildcard in a phrase. My field (a
> Keyword) is the name of a project. It can be 40 characters long (I'm
> basically indexing some database columns). Since it is a Keyword and not a
> Text field, it doesn't get tokenized (I do this on purpose) and must match
> up exactly. I would like for users to be able to search on partial phrases
> such as "Hello w*" and match up to "Hello world" and "Hello washington",
> etc. Is this not possible? Is it documented anywhere?

This can be done, AFAIK.

This is one thing that many people seem unaware of: you don't HAVE to use QueryParser to build queries. In your case it seems like you should be able to construct the query you want if you either bypass QueryParser, or create a dummy analyzer (one that does no tokenization but returns all input as one token).

Since QueryParser is a fairly simple class, you should be able to see how wildcard queries are constructed. You cannot (and need not) create a phrase query, since it does not allow wildcards (like someone pointed out), but since the whole phrase is just one token for keyword fields, you can use a normal wildcard query (or a prefix query for cases like "Hello w*").

It would be nice if the FAQ could point out that QueryParser is a higher-level interface to the query part, but that it is possible and sometimes necessary to do your own query construction. I think it's very cool that Lucene queries were properly modularized this way -- too many open source projects have components too tightly coupled.

-+ Tatu +-
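The point that a keyword value is a single token, so a wildcard pattern is matched against the whole string, can be illustrated with a rough stand-alone sketch. This is a toy model using only the Java standard library, not Lucene's implementation: the class KeywordWildcardSketch and its methods are invented for illustration, and it assumes "*" means any run of characters, "?" means exactly one character, and matching is case-sensitive because the value is stored untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not Lucene) of wildcard matching against untokenized keyword values. */
final class KeywordWildcardSketch {

    /** True if the pattern matches the entire value; '*' = any run of characters, '?' = exactly one. */
    static boolean matches(String pattern, String value) {
        int p = 0;          // position in pattern
        int v = 0;          // position in value
        int star = -1;      // position of the last '*' seen in the pattern
        int mark = 0;       // value position to retry from after that '*'
        while (v < value.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;
                mark = v;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == value.charAt(v))) {
                p++;
                v++;
            } else if (star >= 0) {
                // Backtrack: let the last '*' swallow one more character.
                p = star + 1;
                v = ++mark;
            } else {
                return false;
            }
        }
        // Any trailing '*' in the pattern can match the empty string.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    /** Returns the keyword values that match the pattern, in their original order. */
    static List<String> search(String pattern, List<String> keywordValues) {
        List<String> hits = new ArrayList<>();
        for (String value : keywordValues) {
            if (matches(pattern, value)) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> projectNames =
                Arrays.asList("Hello world", "Hello washington", "Hello there", "hello world");
        System.out.println(search("Hello w*", projectNames));
    }
}
```

Running main prints [Hello world, Hello washington]: the space is just another character inside the single token, and the lowercase "hello world" is skipped because nothing normalizes case in this model.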
Re: Keyword search with space and wildcard
Basically, yes, I am trying to put a wildcard in a phrase. My field (a Keyword) is the name of a project. It can be 40 characters long (I'm basically indexing some database columns). Since it is a Keyword and not a Text field, it doesn't get tokenized (I do this on purpose) and must match up exactly. I would like for users to be able to search on partial phrases such as "Hello w*" and match up to "Hello world" and "Hello washington", etc. Is this not possible? Is it documented anywhere?

Thanks.

-Brian

From: "Terry Steichen" <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Subject: Re: Keyword search with space and wildcard
Date: Thu, 28 Aug 2003 22:29:44 -0400

If I understand your issue correctly, I think what you're experiencing is the fact that you can have a phrase query "hello world", or a wildcard query +hell* +wor*, but you can't mix the two together. As far as I've found, that's a basic limitation you just have to live with. (Of course, if someone on the list can show me where I'm wrong, I'll be delighted.)

You can add boosting to any kind of term (such as wor*^10 or "world order"^10), but I don't think you can mix wildcards and phrases.

HTH,

Terry

----- Original Message -----
From: "Brian Campbell" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, August 28, 2003 4:45 PM
Subject: Keyword search with space and wildcard

> I've created an index that has a Keyword field in it. I'm trying to do a
> search on that field where my term has a space and the wildcard character in
> it. For example, I'll issue the following search: project_name:"Hello w*".
> I have an entry in the project_name field of "Hello world". I would
> expect to get a hit on this but I don't. Is this not the way Lucene
> behaves? Am I doing something wrong? Thanks.
>
> -Brian
Re: Keyword search with space and wildcard
If I understand your issue correctly, I think what you're experiencing is the fact that you can have a phrase query "hello world", or a wildcard query +hell* +wor*, but you can't mix the two together. As far as I've found, that's a basic limitation you just have to live with. (Of course, if someone on the list can show me where I'm wrong, I'll be delighted.)

You can add boosting to any kind of term (such as wor*^10 or "world order"^10), but I don't think you can mix wildcards and phrases.

HTH,

Terry

----- Original Message -----
From: "Brian Campbell" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, August 28, 2003 4:45 PM
Subject: Keyword search with space and wildcard

> I've created an index that has a Keyword field in it. I'm trying to do a
> search on that field where my term has a space and the wildcard character in
> it. For example, I'll issue the following search: project_name:"Hello w*".
> I have an entry in the project_name field of "Hello world". I would
> expect to get a hit on this but I don't. Is this not the way Lucene
> behaves? Am I doing something wrong? Thanks.
>
> -Brian
Re: Keyword search with space and wildcard
Brian,

This seems akin to the Phrase Searching problem that I encountered (haven't heard anything back from my posting yet) - which goes as follows: I try to do the phrase search "center* form" but the system seems to simply ignore the wildcard (throws it away) when processing the search - so I get only results for "center form".

My guess is the parser is processing your search simply as if it were "hello w".

I've got no solution - was hoping to hear something from the list.

Joe

----- Original Message -----
From: "Brian Campbell" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, August 28, 2003 4:45 PM
Subject: Keyword search with space and wildcard

> I've created an index that has a Keyword field in it. I'm trying to do a
> search on that field where my term has a space and the wildcard character in
> it. For example, I'll issue the following search: project_name:"Hello w*".
> I have an entry in the project_name field of "Hello world". I would
> expect to get a hit on this but I don't. Is this not the way Lucene
> behaves? Am I doing something wrong? Thanks.
>
> -Brian
Keyword search with space and wildcard
I've created an index that has a Keyword field in it. I'm trying to do a search on that field where my term has a space and the wildcard character in it. For example, I'll issue the following search: project_name:"Hello w*". I have an entry in the project_name field of "Hello world". I would expect to get a hit on this but I don't. Is this not the way Lucene behaves? Am I doing something wrong?

Thanks.

-Brian