Re: Open-ended range queries
On Jun 10, 2004, at 10:37 PM, Terry Steichen wrote:
> Speaking for myself, only a small number of my code modules currently treat null as the open-ended range query term parameter. If the syntax change from 'null' --> '*' was deemed otherwise desirable and the syntax transition made very clearly, I could personally adjust to it without too much difficulty.
>
> I agree that the proposed '*' syntax does seem more logical. If a change to that syntax were made such that the old null syntax for the upper bound was retained for backward compatibility, such a transition would be completely painless.

Just to clarify, since Terry's response implies this is not understood: there is *nothing* special about null currently. It is simply being treated as term text. So adding special * handling would NOT change how null currently works. In June of 2002 (!) null and NULL (and nULL, Null, etc.) were removed as being special, from what I see in the diff.

Furthermore, to achieve the proposed * handling, you can do this yourself now by subclassing QueryParser and overriding getRangeQuery:

    protected Query getRangeQuery(String field, Analyzer analyzer,
                                  String part1, String part2,
                                  boolean inclusive)
        throws ParseException {
      // "*" on either side means "no bound": pass null instead of a Term
      return new RangeQuery(
          "*".equals(part1) ? null : new Term(field, part1),
          "*".equals(part2) ? null : new Term(field, part2),
          inclusive);
    }

(A little more is needed if you want to keep the date range handling.) Note that you cannot do field:[* TO *] to make it wide open; RangeQuery does not allow this.

My proposal is this (_after_ 1.4 goes final):

- Add the above logic to QueryParser.
- Modify RangeQuery.toString to output the * when the term is null, and also if the start term is "" (RangeQuery's constructor modifies the beginning term to "" if it is null).

If there are no objections to this plan, I'll add this as a Bugzilla issue as a reminder. I don't want to touch 1.4's codebase; there is no point in adding a feature at this stage that can already be achieved with the simple code above.
Erik

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
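The effect of mapping * to "no bound" can be tried out without Lucene on the classpath. What follows is only a rough sketch in plain Java: OpenRangeDemo, toBound and matches are made-up names, terms are assumed to compare by String.compareTo, and throwing IllegalArgumentException for a doubly open range is an assumption standing in for "RangeQuery does not allow this".

```java
class OpenRangeDemo {
    // Hypothetical stand-in for the proposed rule: "*" means "no bound" (null).
    public static String toBound(String part) {
        return "*".equals(part) ? null : part;
    }

    // Hypothetical in-memory range test; a null bound is treated as open.
    public static boolean matches(String value, String part1, String part2,
                                  boolean inclusive) {
        String lower = toBound(part1);
        String upper = toBound(part2);
        if (lower == null && upper == null) {
            // Assumption: reject [* TO *], since a wide-open range is not allowed.
            throw new IllegalArgumentException("at least one bound must be set");
        }
        // An open lower bound always compares as "above", an open upper as "below".
        int lo = (lower == null) ? 1 : value.compareTo(lower);
        int hi = (upper == null) ? -1 : value.compareTo(upper);
        return inclusive ? (lo >= 0 && hi <= 0) : (lo > 0 && hi < 0);
    }

    public static void main(String[] args) {
        // "*" as the upper bound is truly open, even for terms that sort after "null".
        System.out.println(matches("zebra", "apple", "*", true));
        // The literal text "null" is just another term and cuts the range off.
        System.out.println(matches("zebra", "apple", "null", true));
        // "*" works as the lower bound as well.
        System.out.println(matches("aardvark", "*", "apple", false));
    }
}
```

The second call is the case that matters for backward compatibility: a query written with null keeps behaving as a plain term comparison, whether or not * gets special treatment.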
Re: Open-ended range queries
At one point it definitely supported null for either term. I think that has been removed/forgotten in the later revisions of the QueryParser...

Scott

On Jun 10, 2004, at 1:24 PM, Erik Hatcher wrote:
> On Jun 10, 2004, at 2:13 PM, Terry Steichen wrote:
>> Actually, QueryParser does support open-ended ranges like: [term TO null]. Doesn't work for the lower end of the range (though that's usually less of a problem).
>
> It supports null? Are you sure? If so, I'm very confused about it because I don't see where in the grammar it has any special handling like that. Could you show an example that demonstrates this?
>
> Erik
Re: Open-ended range queries
Well, I'm using 1.4 RC3 and the null range upper limit works just fine for searches in two of my fields; one is in the form of a canonical date (e.g., 20040610) and the other is in the form of a padded word count (e.g., 01500 for 1500). The syntax would be pub_date:[20040501 TO null] (dates later than April 30, 2004) and s_words:[01000 TO null] (articles with 1000 or more words).

Regards,

Terry

PS: This use of null has worked this way since at least 1.2. As I recall, way back when, null also worked as the first term limit (but no longer does).

----- Original Message -----
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, June 10, 2004 2:24 PM
Subject: Re: Open-ended range queries

> On Jun 10, 2004, at 2:13 PM, Terry Steichen wrote:
>> Actually, QueryParser does support open-ended ranges like: [term TO null]. Doesn't work for the lower end of the range (though that's usually less of a problem).
>
> It supports null? Are you sure? If so, I'm very confused about it because I don't see where in the grammar it has any special handling like that. Could you show an example that demonstrates this?
>
> Erik
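The padded word count in Terry's message matters because range bounds are compared as text. Here is a small sketch in plain Java of why the padding is needed; PaddedTermDemo and pad are illustrative names only, and the width of 5 is simply taken from the 01500 example.

```java
import java.util.Locale;

class PaddedTermDemo {
    // Hypothetical helper: left-pad a non-negative count with zeros to a fixed width.
    public static String pad(int count, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", count);
    }

    public static void main(String[] args) {
        // Unpadded, "900" sorts after "1500" because '9' is greater than '1'.
        System.out.println("900".compareTo("1500") > 0);
        // Padded to the same width, text order agrees with numeric order.
        System.out.println(pad(900, 5).compareTo(pad(1500, 5)) < 0);
        // The padded form of 1500 used in the s_words example.
        System.out.println(pad(1500, 5));
    }
}
```

With every value padded to the same width, a range such as s_words:[01000 TO 02000] selects the same documents a numeric comparison would.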
Re: Open-ended range queries
On Jun 10, 2004, at 4:07 PM, Terry Steichen wrote:
> Well, I'm using 1.4 RC3 and the null range upper limit works just fine for searches in two of my fields; one is in the form of a canonical date (e.g., 20040610) and the other is in the form of a padded word count (e.g., 01500 for 1500). The syntax would be pub_date:[20040501 TO null] (dates later than April 30, 2004) and s_words:[01000 TO null] (articles with 1000 or more words).

Ah. It works for you because you have numeric values, and lexically "null" is greater than any of them. It is still being used as a lexical term value, and is not truly making the end open-ended. This is why null doesn't work at the beginning for you either. It's just being treated as text, just like your numbers are.

> PS: This use of null has worked this way since at least 1.2. As I recall, way back when, null also worked as the first term limit (but no longer does).

If so, then something serious broke. I don't have the time to check the CVS logs on this, but I cannot imagine that we removed something like this. If anyone cares to dig up the diff where we removed/broke this, I'd be grateful.

Erik
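The lexical-ordering explanation is easy to check in isolation. Below is a minimal sketch in plain Java, assuming terms are ordered the way String.compareTo orders them; LexicalRangeDemo and inRange are made-up names for illustration and are not Lucene classes.

```java
class LexicalRangeDemo {
    // Hypothetical helper: inclusive range test using plain string ordering.
    public static boolean inRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Digits sort before lowercase letters, so "null" behaves like a very
        // high upper bound for purely numeric terms.
        System.out.println(inRange("20040610", "20040501", "null"));
        // Used as the lower bound, "null" sits above every numeric term,
        // so no numeric term can fall inside the range.
        System.out.println(inRange("20040610", "null", "20041231"));
        // For alphabetic fields the trick breaks: "zebra" sorts after "null".
        System.out.println(inRange("zebra", "apple", "null"));
    }
}
```

So [term TO null] only looks open-ended for fields whose values all sort below the text null, which happens to be true for date strings and padded numbers.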
Re: Open-ended range queries
It looks to me like Revision 1.18 broke it.

On Jun 10, 2004, at 3:26 PM, Erik Hatcher wrote:
> On Jun 10, 2004, at 4:07 PM, Terry Steichen wrote:
>> Well, I'm using 1.4 RC3 and the null range upper limit works just fine for searches in two of my fields; one is in the form of a canonical date (e.g., 20040610) and the other is in the form of a padded word count (e.g., 01500 for 1500). The syntax would be pub_date:[20040501 TO null] (dates later than April 30, 2004) and s_words:[01000 TO null] (articles with 1000 or more words).
>
> Ah. It works for you because you have numeric values, and lexically "null" is greater than any of them. It is still being used as a lexical term value, and is not truly making the end open-ended. This is why null doesn't work at the beginning for you either. It's just being treated as text, just like your numbers are.
>
>> PS: This use of null has worked this way since at least 1.2. As I recall, way back when, null also worked as the first term limit (but no longer does).
>
> If so, then something serious broke. I don't have the time to check the CVS logs on this, but I cannot imagine that we removed something like this. If anyone cares to dig up the diff where we removed/broke this, I'd be grateful.
>
> Erik
Re: Open-ended range queries
Well, I do like the *, but apparently there are some people that are using this with the null...

Scott

On Jun 10, 2004, at 7:15 PM, Erik Hatcher wrote:
> On Jun 10, 2004, at 4:54 PM, Scott ganyo wrote:
>> It looks to me like Revision 1.18 broke it.
>
> It seems this could be it:
>
> revision 1.18
> date: 2002/06/25 00:05:31; author: briangoetz; state: Exp; lines: +62 -33
> Support for new range query syntax. The delimiter is " TO ", but is optional for backward compatibility with previous syntax. If the range arguments match the format supported by DateFormat.getDateInstance(DateFormat.SHORT), then they will be converted into the appropriate date strings a la DateField.
> Added Field.Keyword constructor for Date-valued arguments.
> Optimized DateField.timeToString function.
>
> But geez, June 2002 and no one has complained since?
>
> Given that this is so outdated, I'm not sure what the right course of action is. There are lots more Lucene users now than there were then. Would adding NULL back be what folks want? What about simply an asterisk to denote open ended-ness? [* TO term] or [term TO *]
>
> For completeness, here is the diff:
>
> % cvs diff -u -r 1.17 -r 1.18 QueryParser.jj
> Index: QueryParser.jj
> ===
> RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/queryParser/QueryParser.jj,v
> retrieving revision 1.17
> retrieving revision 1.18
> diff -u -r1.17 -r1.18
> --- QueryParser.jj 20 May 2002 15:45:43 - 1.17
> +++ QueryParser.jj 25 Jun 2002 00:05:31 - 1.18
> @@ -65,8 +65,11 @@
>  import java.util.Vector;
>  import java.io.*;
> +import java.text.*;
> +import java.util.*;
>  import org.apache.lucene.index.Term;
>  import org.apache.lucene.analysis.*;
> +import org.apache.lucene.document.*;
>  import org.apache.lucene.search.*;
>  /**
> @@ -218,35 +221,30 @@
>    private Query getRangeQuery(String field,
>                                Analyzer analyzer,
> -                              String queryText,
> +                              String part1,
> +                              String part2,
>                                boolean inclusive)
>    {
> -    // Use the analyzer to get all the tokens. There should be 1 or 2.
> -    TokenStream source = analyzer.tokenStream(field,
> -                                              new StringReader(queryText));
> -    Term[] terms = new Term[2];
> -    org.apache.lucene.analysis.Token t;
> +    boolean isDate = false, isNumber = false;
> -    for (int i = 0; i < 2; i++)
> -    {
> -      try
> -      {
> -        t = source.next();
> -      }
> -      catch (IOException e)
> -      {
> -        t = null;
> -      }
> -      if (t != null)
> -      {
> -        String text = t.termText();
> -        if (!text.equalsIgnoreCase("NULL"))
> -        {
> -          terms[i] = new Term(field, text);
> -        }
> -      }
> +    try {
> +      DateFormat df = DateFormat.getDateInstance(DateFormat.SHORT);
> +      df.setLenient(true);
> +      Date d1 = df.parse(part1);
> +      Date d2 = df.parse(part2);
> +      part1 = DateField.dateToString(d1);
> +      part2 = DateField.dateToString(d2);
> +      isDate = true;
>      }
> -    return new RangeQuery(terms[0], terms[1], inclusive);
> +    catch (Exception e) { }
> +
> +    if (!isDate) {
> +      // @@@ Add number support
> +    }
> +
> +    return new RangeQuery(new Term(field, part1),
> +                          new Term(field, part2),
> +                          inclusive);
>    }
>    public static void main(String[] args) throws Exception {
> @@ -282,7 +280,7 @@
>  | <#_WHITESPACE: ( " " | "\t" ) >
>  }
> -<DEFAULT> SKIP : {
> +<DEFAULT, RangeIn, RangeEx> SKIP : {
>    <<_WHITESPACE>>
>  }
> @@ -303,14 +301,28 @@
>  | <PREFIXTERM: <_TERM_START_CHAR> (<_TERM_CHAR>)* "*" >
>  | <WILDTERM: <_TERM_START_CHAR>
>               (<_TERM_CHAR> | ( [ "*", "?" ] ))* >
> -| <RANGEIN: "[" ( ~[ "]" ] )+ "]">
> -| <RANGEEX: "{" ( ~[ "}" ] )+ "}">
> +| <RANGEIN_START: "[" > : RangeIn
> +| <RANGEEX_START: "{" > : RangeEx
>  }
>  <Boost> TOKEN : {
>  <NUMBER: (<_NUM_CHAR>)+ ( "." (<_NUM_CHAR>)+ )? > : DEFAULT
>  }
> +<RangeIn> TOKEN : {
> +<RANGEIN_TO: "TO">
> +| <RANGEIN_END: "]"> : DEFAULT
> +| <RANGEIN_QUOTED: "\"" (~["\""])+ "\"">
> +| <RANGEIN_GOOP: (~[ " ", "]" ])+ >
> +}
> +
> +<RangeEx> TOKEN : {
> +<RANGEEX_TO: "TO">
> +| <RANGEEX_END: "}"> : DEFAULT
> +| <RANGEEX_QUOTED: "\"" (~["\""])+ "\"">
> +| <RANGEEX_GOOP: (~[ " ", "}" ])+ >
> +}
> +
>  // * Query ::= ( Clause )*
>  // * Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
> @@ -387,7 +399,7 @@
>  Query Term(String field) : {
> -  Token term, boost=null, slop=null;
> +  Token term, boost=null, slop=null, goop1, goop2;
>    boolean prefix = false;
>    boolean wildcard = false;
>    boolean fuzzy = false;
> @@ -415,12 +427,29 @@
>         else
>           q = getFieldQuery(field, analyzer, term.image);
>       }
> -     | ( term=<RANGEIN> { rangein=true; } | term=<RANGEEX> )
> +     | ( <RANGEIN_START> ( goop1=<RANGEIN_GOOP>|goop1=<RANGEIN_QUOTED> )
> +         [ <RANGEIN_TO> ] ( goop2=<RANGEIN_GOOP>|goop2=<RANGEIN_QUOTED> )
> +         <RANGEIN_END> )
> +       [ <CARAT> boost=<NUMBER> ]
> +        {
> +          if (goop1.kind == RANGEIN_QUOTED)
> +            goop1.image = goop1.image.substring(1,
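The date branch added in revision 1.18 is all-or-nothing: both range parts have to parse as short-format dates, otherwise both are used as literal term text. This can be sketched without Lucene as follows. The example is hedged: RangeDateDemo and bothAreDates are invented names, Locale.US is pinned here only to make the result reproducible (the diff uses the default locale), and the DateField conversion step is left out.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Locale;

class RangeDateDemo {
    // Returns true only when BOTH parts parse as short-format dates,
    // echoing the single try/catch around both parse calls in the diff.
    public static boolean bothAreDates(String part1, String part2) {
        // Assumption: US locale, so the short format looks like 6/10/04.
        DateFormat df = DateFormat.getDateInstance(DateFormat.SHORT, Locale.US);
        df.setLenient(true);
        try {
            df.parse(part1);
            df.parse(part2);
            return true;
        } catch (ParseException e) {
            // Either part failing means the range is treated as plain text.
            return false;
        }
    }

    public static void main(String[] args) {
        // Two short-format dates: the date branch would be taken.
        System.out.println(bothAreDates("5/1/04", "6/10/04"));
        // An index-style date string and the word null: plain text terms.
        System.out.println(bothAreDates("20040501", "null"));
        // One real date is not enough; the second parse fails.
        System.out.println(bothAreDates("5/1/04", "null"));
    }
}
```

Nothing in this path looks at the word null, which fits the observation in the thread that after 1.18 it is handled like any other term text.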
Re: Open-ended range queries
Speaking for myself, only a small number of my code modules currently treat null as the open-ended range query term parameter. If the syntax change from 'null' --> '*' was deemed otherwise desirable and the syntax transition made very clearly, I could personally adjust to it without too much difficulty.

I agree that the proposed '*' syntax does seem more logical. If a change to that syntax were made such that the old null syntax for the upper bound was retained for backward compatibility, such a transition would be completely painless.

Regards,

Terry

----- Original Message -----
From: Scott ganyo [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, June 10, 2004 8:57 PM
Subject: Re: Open-ended range queries

> Well, I do like the *, but apparently there are some people that are using this with the null...
>
> Scott
>
> On Jun 10, 2004, at 7:15 PM, Erik Hatcher wrote:
>> On Jun 10, 2004, at 4:54 PM, Scott ganyo wrote:
>>> It looks to me like Revision 1.18 broke it.
>>
>> It seems this could be it:
>>
>> revision 1.18
>> date: 2002/06/25 00:05:31; author: briangoetz; state: Exp; lines: +62 -33
>> Support for new range query syntax. The delimiter is " TO ", but is optional for backward compatibility with previous syntax. If the range arguments match the format supported by DateFormat.getDateInstance(DateFormat.SHORT), then they will be converted into the appropriate date strings a la DateField.
>> Added Field.Keyword constructor for Date-valued arguments.
>> Optimized DateField.timeToString function.
>>
>> But geez, June 2002 and no one has complained since?
>>
>> Given that this is so outdated, I'm not sure what the right course of action is. There are lots more Lucene users now than there were then. Would adding NULL back be what folks want? What about simply an asterisk to denote open ended-ness?