Re: Dash Confusion in QueryParser - Bug? Feature?

2003-11-24 Thread Jianshuo Niu
Dear Victor:

I applied the changes from the patch, and t-shirt now appears in the search query.
I rebuilt the search index with the modified lucene-1.3-rc2.jar and ran the search with
the modified jar as well.
The search field was specified as indexed, tokenized and stored.
When I run the search, I do not get any results. I also tried using the modified jar
to create the search query and searching the index files that were built by the
original lucene-1.3-rc2.jar. That did not return any results either. Could you tell me
which part I did wrong?

Thanks

Jianshuo

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Dash Confusion in QueryParser - Bug? Feature?

2003-11-24 Thread Victor Hadianto
Odd ...

This is working fine for us. You have to use the patched Lucene both to
build the index and to run the search, and make sure to use the same analyser
for both. Failing that, email me the code that you use to build/search the
index and I'll have a look.

HTH,

victor
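
The advice above (build and search with the same analyser) can be illustrated without Lucene at all. What follows is a minimal plain-Java sketch, not Lucene code: the class name AnalyzerMismatchSketch, the two toy analysers (keepHyphen, splitHyphen) and the in-memory index are hypothetical names made up for this illustration. It only assumes that one analyser keeps "t-shirt" as a single term while the other splits it at the dash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

public class AnalyzerMismatchSketch {
    /** Toy analyser (assumption): lowercases and splits on whitespace only. */
    static List<String> keepHyphen(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    /** Toy analyser (assumption): lowercases and splits on any non-alphanumeric character. */
    static List<String> splitHyphen(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Builds a term -> document-id map using the given analyser. */
    static Map<String, Set<Integer>> index(List<String> docs,
                                           Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    /** Returns the ids of documents that contain every analysed query term. */
    static Set<Integer> search(Map<String, Set<Integer>> postings, String query,
                               Function<String, List<String>> analyzer) {
        Set<Integer> hits = null;
        for (String term : analyzer.apply(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (hits == null) {
                hits = new TreeSet<>(docs);
            } else {
                hits.retainAll(docs);
            }
        }
        return hits == null ? new TreeSet<Integer>() : hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Blue t-shirt", "Dress shirt");
        Map<String, Set<Integer>> idx = index(docs, AnalyzerMismatchSketch::splitHyphen);
        // Same analyser on both sides: the query terms line up with the indexed terms.
        System.out.println(search(idx, "t-shirt", AnalyzerMismatchSketch::splitHyphen));
        // Different analyser at query time: "t-shirt" was never indexed as one term.
        System.out.println(search(idx, "t-shirt", AnalyzerMismatchSketch::keepHyphen));
    }
}
```

With the index built by the splitting analyser, the first search finds the t-shirt document and the second finds nothing, which is the same symptom as searching an index built by one jar with a query produced by another.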


Re: Dash Confusion in QueryParser - Bug? Feature?

2003-11-24 Thread Jianshuo Niu
Dear Victor:

Finally, I got search results. I had made a mistake when creating the index files.

Thanks a lot

Jianshuo


Re: Dash Confusion in QueryParser - Bug? Feature?

2003-11-23 Thread Victor Hadianto
Hi,

You missed another change: if you follow that thread, I later attached a
patch that changes another file (the standard tokenizer). Hang on, let me
try to find the patch for you.

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=764036

You also need to change the standard tokenizer.

Hope this helps.

/victor




Re: Dash Confusion in QueryParser - Bug? Feature?

2003-11-21 Thread Jianshuo Niu
Dear Victor:
 
I read your post on the Lucene bug list. However, when I tried the change you
suggested, it just changed t-shirts to shirt.

I downloaded the lucene1.3-rc1 source, changed the above line in
QueryParser.jj, and recompiled the source. After the change, the query I
got is:

+(name:shirt)

Before the change, the query was:

-(name:shirt)

I have the following two questions:

1. Is this the result I am supposed to get?
2. In your post, you mentioned only a one-line change:
   <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" ) >
   Is this the only line that needs to change?

Thank you for your time and help


Jianshuo
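
A rough way to see what the one-line _TERM_CHAR change discussed in this message does is to model the lexer with a regular expression. This is only a sketch under stated assumptions, not the JavaCC grammar itself: the class TermCharSketch and the method lex are hypothetical, and the set of special characters is a simplified subset of the real query syntax. A term may not start with an operator character, and the second argument lists the extra characters that are allowed inside a term once it has started.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermCharSketch {
    // Characters that may start a term: anything except whitespace and a
    // simplified subset of the query syntax characters (assumption).
    private static final String START = "[^\\s+\\-!():\"]";

    /**
     * Splits a query into terms and single operator characters.
     * extraTermChars lists punctuation that is allowed inside a term
     * (but still not as its first character), e.g. "-" or "-+".
     */
    static List<String> lex(String query, String extraTermChars) {
        StringBuilder extra = new StringBuilder();
        for (char c : extraTermChars.toCharArray()) {
            extra.append('\\').append(c); // escape each punctuation character
        }
        String termChar = extra.length() == 0
                ? START
                : "(?:" + START + "|[" + extra + "])";
        // Either a whole term, or one leftover non-space character (an operator).
        Pattern token = Pattern.compile(START + termChar + "*|\\S");
        List<String> out = new ArrayList<>();
        Matcher m = token.matcher(query);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("t-shirt", ""));   // dash ends the term
        System.out.println(lex("t-shirt", "-"));  // dash allowed inside a term
        System.out.println(lex("-shirt", "-"));   // leading dash is still an operator
        System.out.println(lex("Wal+Mart", "-")); // plus is not covered by "-" alone
    }
}
```

In this model, allowing the dash inside a term keeps t-shirt in one piece for the analyzer, while a dash at the start of a clause is still read as an operator. Whether the analyzer then keeps or splits the term is a separate question, which is why the tokenizer matters as well.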


On Wed, 15 Oct 2003 10:51:28 +1000, Victor Hadianto wrote:

 On Tuesday, October 14, 2003, at 08:38  PM, Victor Hadianto wrote:
  I believe this is the same problem that I had the other day. If you
  search
  the mailing list for t-shirt you should get some threads discussing
  this
  problem.

 Haha!  Better search for shirt, not t-shirt :))
 
 If the QueryParser implemented the solution that I suggested then t-shirt
 will get you the correct hits :)
 
 
 /vh






Re: Dash Confusion in QueryParser - Bug? Feature?

2003-11-21 Thread Erik Hatcher
On Friday, November 21, 2003, at 02:34  PM, Jianshuo Niu wrote:
> I read  your post on lucene bug list. However, I try the change you
> suggested, but it just changed t-shirts to shirt.

What Analyzer are you using?



Re: Dash Confusion in QueryParser - Bug? Feature?

2003-10-20 Thread Erik Hatcher
On Wednesday, October 15, 2003, at 10:24  AM, Michael Giles wrote:
> So how do we move this issue forward?  I can't think of a single case
> where a - with no whitespace on either side (e.g. t-shirt, Wal-Mart)
> should be interpreted as a NOT command.  Is there a feeling that
> changing the interpretation of such cases is a break in compatibility?
> I agree that it will change behavior, but I think that it will change
> it for the better (i.e. fix it).  The current behavior is really
> broken (and very frustrating for a user trying to search).

I looked at the patch here:

	http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23838

I'm not entirely satisfied with it.  I'm of the opinion that we should 
only change QueryParser to fix the behavior of operators nestled within 
text with no surrounding whitespace.  The provided patch only works 
with the - character, but what about Wal+Mart?  Shouldn't we keep 
that together also and hand it to the analyzer?

I'm not convinced at all that we should change the StandardTokenizer to 
not split on dash.  If only QueryParser was fixed and handed Wal-Mart 
to the StandardAnalyzer, it would be split the same way as during 
indexing and searches would return the expected hits.

Thoughts?  I'd like to see this fixed, but in a way that makes the most 
general sense.

Thanks,
Erik
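
The proposal in this message (change only how the parser treats operators that sit inside a word, and let the analyzer split the text the same way it does at index time) can be sketched in plain Java. This is a toy model under stated assumptions, not Lucene's QueryParser: the class InnerOperatorSketch, its parse method and the toy analyzer are hypothetical, and the output format merely imitates the query strings shown elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class InnerOperatorSketch {
    /** Toy analyzer (assumption): lowercase, split on anything that is not a letter or digit. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /**
     * Turns a query string into a printable query.
     * keepInnerOperators = false models the behaviour complained about:
     * a + or - glued to the preceding word still starts a new clause.
     * keepInnerOperators = true treats + and - as operators only at the
     * start of a whitespace-delimited clause.
     */
    static String parse(String field, String query, boolean keepInnerOperators) {
        if (!keepInnerOperators) {
            query = query.replaceAll("(?<=\\S)([+-])", " $1");
        }
        StringBuilder out = new StringBuilder();
        for (String clause : query.trim().split("\\s+")) {
            String op = "";
            if (clause.startsWith("-") || clause.startsWith("+")) {
                op = clause.substring(0, 1);
                clause = clause.substring(1);
            }
            List<String> terms = analyze(clause);
            if (terms.isEmpty()) {
                continue; // nothing left after analysis
            }
            // Several terms from one clause are kept together as a phrase.
            String body = terms.size() == 1
                    ? terms.get(0)
                    : "\"" + String.join(" ", terms) + "\"";
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(op).append(field).append(':').append(body);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(parse("contents", "Wal-Mart", false));
        System.out.println(parse("contents", "Wal-Mart", true));
        System.out.println(parse("contents", "Wal+Mart", true));
    }
}
```

In this model the tokenizer is left alone: with the inner operators kept, Wal-Mart and Wal+Mart both reach the analyzer whole and come out as the same two adjacent terms that indexing would have produced.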


Re: Dash Confusion in QueryParser - Bug? Feature?

2003-10-15 Thread Ulrich Mayring
Victor Hadianto wrote:
> If the QueryParser implemented the solution that I suggested then t-shirt
> will get you the correct hits :)

Well, what's the problem? I saw a couple of +1s, so why is your patch
not added?

Ulrich





Re: Dash Confusion in QueryParser - Bug? Feature?

2003-10-15 Thread Erik Hatcher
I agree that the current behavior is broken and will gladly patch it 
myself.  I'm CC'ing lucene-dev to see if there are any objections.  If 
there are no objections, I'll apply this patch in a couple of days.

	Erik

On Wednesday, October 15, 2003, at 10:24  AM, Michael Giles wrote:

So how do we move this issue forward?  I can't think of a single case
where a - with no whitespace on either side (e.g. t-shirt, Wal-Mart)
should be interpreted as a NOT command.  Is there a feeling that
changing the interpretation of such cases is a break in compatibility?
I agree that it will change behavior, but I think that it will change
it for the better (i.e. fix it).  The current behavior is really
broken (and very frustrating for a user trying to search).

-Mike

At 10:08 AM 10/15/2003, you wrote:

--- Ulrich Mayring [EMAIL PROTECTED] wrote:
 Victor Hadianto wrote:

  If the QueryParser implemented the solution that I suggested then
  t-shirt will get you the correct hits :)

 Well, what's the problem? I saw a couple of +1s, so why is your patch
 not added?

1. +1s were from non-developers
2. The change looked like it would not be backwards compatible. (see
the original email from Victor)
It is also better if patches are added to Bugzilla.

Otis



Re: Dash Confusion in QueryParser - Bug? Feature?

2003-10-14 Thread Michael Giles
So what do we need to do to resolve this?  Has the discussion stopped 
because this is the user list and not dev or did it move over to the 
dev list?

-Mike

At 03:49 AM 10/13/2003, you wrote:
> Michael Giles wrote:
>> He is probably using the StandardAnalyzer.  I was about to write the
>> exact same email (but using Wal-Mart as an example on this page -
>> http://www.benchmark.com/cgi-bin/suid/~bcmlp/newsletter.cgi?mode=showyear=2003date=2003-10-07).
>> I index and search with the same analyzer (Standard), but when I search
>> for Wal-Mart, I don't find a match.  I DO find a match if I search for
>> "Wal-Mart" or "Wal Mart" (no hyphen).  This seems like a bug.
>
> I'm not sure whether it has to do with the Analyzer, the thing happens
> with the Snowball Analyzers as well.
>
> Ulrich





Re: Dash Confusion in QueryParser - Bug? Feature?

2003-10-14 Thread Victor Hadianto
- Original Message - 
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, October 15, 2003 10:01 AM
Subject: Re: Dash Confusion in QueryParser - Bug? Feature?


 On Saturday, October 11, 2003, at 09:44  AM, Michael Giles wrote:
  He is probably using the StandardAnalyzer.  I was about to write the
  exact same email (but using Wal-Mart as an example on this page -
  http://www.benchmark.com/cgi-bin/suid/~bcmlp/
  newsletter.cgi?mode=showyear=2003date=2003-10-07). I index and
  search with the same analyzer (Standard), but when I search for
  Wal-Mart, I don't find a match.  I DO find a match if I search for
  "Wal-Mart" or "Wal Mart" (no hyphen).  This seems like a bug.

 Sorry for the delay.  I've been meaning to reply to this.

 When you index using StandardAnalyzer, you are indexing it as two terms,
 "wal" and "mart" (without the quotes).  QueryParser does its own
 (weird?) stuff to strings passed to it.  Here's how it breaks down:

  String[] queries = {"Wal-Mart", "\"Wal-Mart\"", "Wal Mart"};
  for (int i = 0; i < queries.length; i++) {
    String query = queries[i];
    Query q = QueryParser.parse(query, "contents", new StandardAnalyzer());
    System.out.println(query + " = " + q);
  }

 Wal-Mart = contents:wal -contents:mart
 "Wal-Mart" = contents:"wal mart"
 Wal Mart = contents:wal contents:mart

 Notice all three are completely different queries.  The Wal-Mart one is
 excluding "mart", making it miss documents you expect.  The second one
 is a phrase query, which is basically what you're after.  The third one
 is matching any documents with "wal" or "mart" in them regardless of
 whether they are side-by-side.

 Is this a bug?  Nah... just the nature of the QueryParser beast.  It
 would be a non-backwards-compatible change to change how QueryParser
 deals with a dash. That is the main issue here with it interpreting it
 as a NOT operator.  But it seems logical to me that it shouldn't do so
 when it's mashed against a word like this, and should leave it to the
 analyzer to deal with.

I believe this is the same problem that I had the other day. If you search
the mailing list for t-shirt you should get some threads discussing this
problem.

In fact, why don't I give it here:

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]by=threadfrom=317960


Cheers,

victor





Dash Confusion in QueryParser - Bug? Feature?

2003-10-10 Thread Ulrich Mayring
Hello,

When I search for "MS-Word" I get all the documents that contain exactly
that word, which is good. If, however, I search for MS-Word (without the
quotes), then the MultiFieldQueryParser restructures the query to "MS
-Word" and I consequently get all documents that contain "MS" and not
"Word".

Why does the MultiFieldQueryParser insert the extra blank here? Are
there use cases where this would make sense? Or is it a bug?

Ulrich





Re: Dash Confusion in QueryParser - Bug? Feature?

2003-10-10 Thread Erik Hatcher
On Friday, October 10, 2003, at 04:30  AM, Ulrich Mayring wrote:
> when I search for "MS-Word" I get all the documents that contain
> exactly that word, which is good. If, however, I search for MS-Word
> (without the quotes), then the MultiFieldQueryParser restructures the
> query to "MS -Word" and I consequently get all documents that contain
> "MS" and not "Word".

What Analyzer are you using?
