Re: Dash Confusion in QueryParser - Bug? Feature?
Dear Victor,

I applied the change based on the patch, and I now get t-shirt in the search query. I rebuilt the search index using the modified lucene-1.3-rc2.jar and ran the search with the modified jar as well. The search field was specified as indexed, tokenized and stored. When I run the search, I do not get any results. I also tried using the modified jar to create the search query and ran it against index files that were built by the original lucene-1.3-rc2.jar. That did not return any results either. Could you tell me which part I did wrong?

Thanks,
Jianshuo

On Mon, 24 Nov 2003 11:15:38 +1100, Victor Hadianto wrote:

Hi, You missed another change in the file; if you follow that thread, I later attached a patch that changes another file (standard tokenizer). Hang on, let me try to find the patch for you. http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=764036 You also need to change the standard tokenizer. Hope this helps. /victor

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Dash Confusion in QueryParser - Bug? Feature?
Odd ... This is working fine for us. You have to use the patched Lucene both to build the index and to run the search, and make sure to use the same analyser in both places. Failing that, email me the code that you use to build/search the index and I'll have a look.

HTH,
victor

- Original Message -
From: Jianshuo Niu [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 7:29 AM
Subject: Re: Dash Confusion in QueryParser - Bug? Feature?

Dear Victor: I applied the change based on the patch, and I now get t-shirt in the search query. I rebuilt the search index using the modified lucene-1.3-rc2.jar and ran the search with the modified jar as well. The search field was specified as indexed, tokenized and stored. When I run the search, I do not get any results. I also tried using the modified jar to create the search query and ran it against index files that were built by the original lucene-1.3-rc2.jar. That did not return any results either. Could you tell me which part I did wrong? Thanks Jianshuo
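The advice above (same patched code and same analyser for both indexing and searching) can be illustrated with a minimal sketch. This is a toy model, not Lucene code: `splitOnDash` and `keepDash` are hypothetical stand-ins for an unpatched and a patched tokenizer, and `matches` is a hypothetical all-terms-present check.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: these two tokenizers are hypothetical stand-ins for an
// unpatched and a patched analyzer; neither is Lucene code.
final class AnalyzerMismatchSketch {

    // Stand-in for the unpatched behavior: split on anything that is not a letter or digit.
    static List<String> splitOnDash(String text) {
        return tokenize(text, "[^a-z0-9]+");
    }

    // Stand-in for the patched behavior: a dash inside a word stays in the token.
    static List<String> keepDash(String text) {
        return tokenize(text, "[^a-z0-9-]+");
    }

    private static List<String> tokenize(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split(separatorRegex)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A document "matches" when every query token is present in the indexed token set.
    static boolean matches(List<String> indexedTokens, List<String> queryTokens) {
        Set<String> index = new HashSet<>(indexedTokens);
        return !queryTokens.isEmpty() && index.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String doc = "Blue t-shirt, size M";
        String query = "t-shirt";
        System.out.println(matches(keepDash(doc), keepDash(query)));       // true
        System.out.println(matches(splitOnDash(doc), splitOnDash(query))); // true
        System.out.println(matches(splitOnDash(doc), keepDash(query)));    // false
    }
}
```

In this model the search only fails when the index and the query were tokenized differently, which is consistent with the symptom of an index built by one jar being searched through another.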
Re: Dash Confusion in QueryParser - Bug? Feature?
Dear Victor,

Finally, I got search results. I had made a mistake when creating the index files.

Thanks a lot,
Jianshuo

On Tue, 25 Nov 2003 09:24:23 +1100, Victor Hadianto wrote:

Odd ... This is working fine for us. You have to use the patched Lucene both to build the index and to run the search, and make sure to use the same analyser in both places. Failing that, email me the code that you use to build/search the index and I'll have a look. HTH, victor
Re: Dash Confusion in QueryParser - Bug? Feature?
Hi,

You missed another change in the file; if you follow that thread, I later attached a patch that changes another file (standard tokenizer). Hang on, let me try to find the patch for you.

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=764036

You also need to change the standard tokenizer. Hope this helps.

/victor

- Original Message -
From: Jianshuo Niu [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, November 22, 2003 9:34 AM
Subject: Re: Dash Confusion in QueryParser - Bug? Feature?

Dear Victor: I read your post on the Lucene bug list. I tried the change you suggested, but it just changed t-shirts to shirt. I downloaded the lucene1.3-rc1 source, changed the above line in QueryParser.jj, and recompiled the source. After the change, the query I got is: +(name:shirt) Before the change, the query was: -(name:shirt) I have the following two questions: 1. Did I get the result I am supposed to get? 2. In your post, you mentioned only a one-line change: <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" ) > Is this the only line that needs to change? Thank you for your time and help, Jianshuo
Re: Dash Confusion in QueryParser - Bug? Feature?
Dear Victor,

I read your post on the Lucene bug list. I tried the change you suggested, but it just changed t-shirts to shirt. I downloaded the lucene1.3-rc1 source, changed the above line in QueryParser.jj, and recompiled the source. After the change, the query I got is:

+(name:shirt)

Before the change, the query was:

-(name:shirt)

I have the following two questions:

1. Did I get the result I am supposed to get?
2. In your post, you mentioned only a one-line change:

<#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" ) >

Is this the only line that needs to change?

Thank you for your time and help,
Jianshuo

On Wed, 15 Oct 2003 10:51:28 +1000, Victor Hadianto wrote:

On Tuesday, October 14, 2003, at 08:38 PM, Victor Hadianto wrote:
I believe this is the same problem that I had the other day. If you search the mailing list for t-shirt you should get some threads discussing this problem.

Haha! Better search for shirt, not t-shirt :))

If the QueryParser implemented the solution that I suggested then t-shirt will get you the correct hits :)

/vh
Re: Dash Confusion in QueryParser - Bug? Feature?
On Friday, November 21, 2003, at 02:34 PM, Jianshuo Niu wrote: I read your post on lucene bug list. However, I try the change you suggested, but it just changed t-shirts to shirt. What Analyzer are you using?
Re: Dash Confusion in QueryParser - Bug? Feature?
On Wednesday, October 15, 2003, at 10:24 AM, Michael Giles wrote:

So how do we move this issue forward? I can't think of a single case where a - with no whitespace on either side (e.g. t-shirt, Wal-Mart) should be interpreted as a NOT command. Is there a feeling that changing the interpretation of such cases is a break in compatibility? I agree that it will change behavior, but I think that it will change it for the better (i.e. fix it). The current behavior is really broken (and very frustrating for a user trying to search).

I looked at the patch here: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23838

I'm not entirely satisfied with it. I'm of the opinion that we should only change QueryParser to fix the behavior of operators nestled within text with no surrounding whitespace. The provided patch only works with the - character, but what about Wal+Mart? Shouldn't we keep that together also and hand it to the analyzer?

I'm not convinced at all that we should change the StandardTokenizer to not split on dash. If only QueryParser were fixed and handed Wal-Mart to the StandardAnalyzer, it would be split the same way as during indexing, and searches would return the expected hits.

Thoughts? I'd like to see this fixed, but in a way that makes the most general sense.

Thanks,
Erik
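The rule proposed above (an operator character counts as an operator only when it is not nestled inside text) might look roughly like the following. This is a minimal sketch under stated assumptions, not the Bugzilla patch and not Lucene's grammar: `splitClauses` and the REQUIRED/PROHIBITED/OPTIONAL labels are hypothetical names, and quoted phrases, field prefixes and escapes are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a hypothetical clause splitter, not Lucene's QueryParser grammar.
final class EmbeddedOperatorSketch {

    // Returns clauses as "MODIFIER:text", where MODIFIER is REQUIRED, PROHIBITED or OPTIONAL.
    static List<String> splitClauses(String query) {
        List<String> clauses = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue;
            }
            String modifier = "OPTIONAL";
            String text = chunk;
            // Only a '+' or '-' at the very start of a chunk (right after whitespace
            // or at the start of the query) is treated as an operator.
            if (chunk.length() > 1 && chunk.charAt(0) == '+') {
                modifier = "REQUIRED";
                text = chunk.substring(1);
            } else if (chunk.length() > 1 && chunk.charAt(0) == '-') {
                modifier = "PROHIBITED";
                text = chunk.substring(1);
            }
            // Any '+' or '-' left inside the text stays there for the analyzer to handle.
            clauses.add(modifier + ":" + text);
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(splitClauses("Wal-Mart"));         // [OPTIONAL:Wal-Mart]
        System.out.println(splitClauses("Wal+Mart -Target")); // [OPTIONAL:Wal+Mart, PROHIBITED:Target]
    }
}
```

Under this rule the term text would reach the analyzer intact, so it would be split at query time the same way it was split at index time, without changing the tokenizer.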
Re: Dash Confusion in QueryParser - Bug? Feature?
Victor Hadianto wrote: If the QueryParser implemented the solution that I suggested then t-shirt will get you the correct hits :) Well, what's the problem? I saw a couple of +1s, so why is your patch not added? Ulrich
Re: Dash Confusion in QueryParser - Bug? Feature?
I agree that the current behavior is broken and will gladly patch it myself. I'm CC'ing lucene-dev to see if there are any objections. If there are no objections, I'll apply this patch in a couple of days.

Erik

On Wednesday, October 15, 2003, at 10:24 AM, Michael Giles wrote:

So how do we move this issue forward? I can't think of a single case where a - with no whitespace on either side (e.g. t-shirt, Wal-Mart) should be interpreted as a NOT command. Is there a feeling that changing the interpretation of such cases is a break in compatibility? I agree that it will change behavior, but I think that it will change it for the better (i.e. fix it). The current behavior is really broken (and very frustrating for a user trying to search).

-Mike

At 10:08 AM 10/15/2003, you wrote:

--- Ulrich Mayring [EMAIL PROTECTED] wrote:
Victor Hadianto wrote: If the QueryParser implemented the solution that I suggested then t-shirt will get you the correct hits :)
Well, what's the problem? I saw a couple of +1s, so why is your patch not added?

1. +1s were from non-developers
2. The change looked like it would not be backwards compatible. (see the original email from Victor)

It is also better if patches are added to Bugzilla.

Otis
Re: Dash Confusion in QueryParser - Bug? Feature?
So what do we need to do to resolve this? Has the discussion stopped because this is the user list and not dev, or did it move over to the dev list?

-Mike

At 03:49 AM 10/13/2003, you wrote:

Michael Giles wrote:
He is probably using the StandardAnalyzer. I was about to write the exact same email (but using Wal-Mart as an example on this page - http://www.benchmark.com/cgi-bin/suid/~bcmlp/newsletter.cgi?mode=showyear=2003date=2003-10-07). I index and search with the same analyzer (Standard), but when I search for Wal-Mart, I don't find a match. I DO find a match if I search for "Wal-Mart" (in quotes) or Wal Mart (no hyphen). This seems like a bug.

I'm not sure whether it has to do with the Analyzer; the same thing happens with the Snowball Analyzers as well.

Ulrich
Re: Dash Confusion in QueryParser - Bug? Feature?
- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, October 15, 2003 10:01 AM
Subject: Re: Dash Confusion in QueryParser - Bug? Feature?

On Saturday, October 11, 2003, at 09:44 AM, Michael Giles wrote:
He is probably using the StandardAnalyzer. I was about to write the exact same email (but using Wal-Mart as an example on this page - http://www.benchmark.com/cgi-bin/suid/~bcmlp/newsletter.cgi?mode=showyear=2003date=2003-10-07). I index and search with the same analyzer (Standard), but when I search for Wal-Mart, I don't find a match. I DO find a match if I search for "Wal-Mart" (in quotes) or Wal Mart (no hyphen). This seems like a bug.

Sorry for the delay. I've been meaning to reply to this.

When you index using StandardAnalyzer, you are indexing it to two terms, "wal" and "mart" (without the quotes). QueryParser does its own (weird?) stuff to strings passed to it. Here's how it breaks down:

    String[] queries = {"Wal-Mart", "\"Wal-Mart\"", "Wal Mart"};
    for (int i = 0; i < queries.length; i++) {
        String query = queries[i];
        Query q = QueryParser.parse(query, "contents", new StandardAnalyzer());
        System.out.println(query + " = " + q);
    }

    Wal-Mart = contents:wal -contents:mart
    "Wal-Mart" = contents:"wal mart"
    Wal Mart = contents:wal contents:mart

Notice all three are completely different queries. The Wal-Mart one is excluding mart, making it miss documents you expect. The second one is a phrase query, which is basically what you're after. The third one is matching any documents with wal or mart in them, regardless of whether they are side-by-side.

Is this a bug? Nah... just the nature of the QueryParser beast. It would be a non-backwards-compatible change to change how QueryParser deals with a dash. That is the main issue here, with it interpreting it as a NOT operator. But it seems logical to me that it shouldn't do so when it's mashed against a word like this, and should leave it to the analyzer to deal with.

I believe this is the same problem that I had the other day. If you search the mailing list for t-shirt you should get some threads discussing this problem. In fact, why don't I give it here:

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]by=threadfrom=317960

Cheers,
victor
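To make the breakdown quoted above easier to follow without a Lucene installation, here is a toy model that imitates the three outputs. It is a sketch under stated assumptions, not Lucene's QueryParser: `describe` is a hypothetical helper, the field name "contents" is taken from the quoted example, and the "analysis" is reduced to lowercasing and splitting on non-alphanumerics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: it imitates the three outputs quoted above and is not
// Lucene's QueryParser. The field name "contents" is taken from the example.
final class DashAsNotSketch {

    static String describe(String query) {
        List<String> clauses = new ArrayList<>();
        String prefix = ""; // "-" when the next clause is prohibited
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '-') {
                // Any dash ends the current term and negates the next one,
                // even when it sits inside a word.
                prefix = "-";
                i++;
            } else if (c == '"') {
                int end = query.indexOf('"', i + 1);
                if (end < 0) {
                    end = query.length();
                }
                // The quoted text is "analyzed": lowercased and split on non-alphanumerics.
                String phrase = String.join(" ",
                        query.substring(i + 1, end).toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
                clauses.add(prefix + "contents:\"" + phrase + "\"");
                prefix = "";
                i = end + 1;
            } else {
                int start = i;
                while (i < query.length()
                        && !Character.isWhitespace(query.charAt(i))
                        && query.charAt(i) != '-'
                        && query.charAt(i) != '"') {
                    i++;
                }
                clauses.add(prefix + "contents:" + query.substring(start, i).toLowerCase(Locale.ROOT));
                prefix = "";
            }
        }
        return String.join(" ", clauses);
    }
}
```

Calling `describe` on Wal-Mart, on the quoted phrase, and on Wal Mart reproduces the three query strings shown in the message. The point of the model is that the dash is consumed before the analyzer ever sees the text, which is why index-time and query-time treatment of Wal-Mart diverge.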
Dash Confusion in QueryParser - Bug? Feature?
Hello,

when I search for "MS-Word" I get all the documents that contain exactly that word, which is good. If, however, I search for MS-Word (without the quotes), then the MultiFieldQueryParser restructures the query to MS -Word and I consequently get all documents that contain MS and not Word.

Why does the MultiFieldQueryParser insert the extra blank here? Are there use cases where this would make sense? Or is it a bug?

Ulrich
Re: Dash Confusion in QueryParser - Bug? Feature?
On Friday, October 10, 2003, at 04:30 AM, Ulrich Mayring wrote:
when I search for "MS-Word" I get all the documents that contain exactly that word, which is good. If, however, I search for MS-Word (without the quotes), then the MultiFieldQueryParser restructures the query to MS -Word and I consequently get all documents that contain MS and not Word.

What Analyzer are you using?