Re: search question

2004-12-23 Thread roy-lucene-user
Erik,

They both use the StandardAnalyzer; however, looking at the toString() output
makes everything clearer.  When a string contains an email address such as
[EMAIL PROTECTED], 1.2 splits it like so: first.last domain.com

In 1.4, however, the address does not get split.

So now we just check whether an index was built using 1.2 or 1.4 and have
added some checks to handle each case.
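
A rough sketch of what such a version check could look like follows. It is plain Java with no Lucene calls: the class, the `IndexFlavor` enum, `termsFor`, and the sample address are all hypothetical, and the splitting rule only mimics the "first.last domain.com" behaviour reported here rather than reproducing StandardAnalyzer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of the workaround: pick query terms according to how the
// index was tokenized. Not Lucene code; all names here are hypothetical.
class EmailTokenSketch {

    // Hypothetical marker for how the index treats e-mail addresses.
    enum IndexFlavor { SPLITS_EMAIL, KEEPS_EMAIL }

    // Returns the terms to search for, given how the index was built.
    static List<String> termsFor(String email, IndexFlavor flavor) {
        if (flavor == IndexFlavor.SPLITS_EMAIL) {
            // Mimics the reported older behaviour: "first.last domain.com".
            return Arrays.asList(email.split("@"));
        }
        // Mimics the reported newer behaviour: the address stays one token.
        return Collections.singletonList(email);
    }

    public static void main(String[] args) {
        String email = "jane.doe@example.com"; // made-up address
        System.out.println(termsFor(email, IndexFlavor.SPLITS_EMAIL));
        System.out.println(termsFor(email, IndexFlavor.KEEPS_EMAIL));
    }
}
```

The point of the sketch is only that the query terms have to line up with whatever tokens the index actually contains.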

Thanks for the guidance.

Roy.

On Wed, 22 Dec 2004 18:41:44 -0500, Erik Hatcher wrote
 What does toString() return for each of those queries?  Are you 
 using the same analyzer in both cases?
 
   Erik


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



search question

2004-12-22 Thread roy-lucene-user
Hi guys,

We have an index with some fields containing email addresses.  Doing a search 
for an email address with this format: [EMAIL PROTECTED], does not bring up any 
results with lucene 1.4.

The query: Field1:[EMAIL PROTECTED]

However it returns results with 1.2.  Any ideas?

Roy.




Re: search question

2004-12-22 Thread Erik Hatcher
What does toString() return for each of those queries?  Are you using 
the same analyzer in both cases?

Erik
On Dec 22, 2004, at 5:44 PM, [EMAIL PROTECTED] wrote:
Hi guys,
We have an index with some fields containing email addresses.  Doing a 
search for an email address with this format: [EMAIL PROTECTED], 
does not bring up any results with lucene 1.4.

The query: Field1:[EMAIL PROTECTED]
However it returns results with 1.2.  Any ideas?
Roy.


Re: Index and Search question in Lucene.

2004-08-21 Thread Ernesto De Santis
Hi Dimitri

Which analyzer do you use?

You need to be careful with Keyword fields and analyzers. When you
index a Document, fields that are not tokenized (tokenized = false), such
as Keyword fields, are not analyzed.
At search time you need to parse the query with your analyzer, but without
analyzing the untokenized fields, like your filename.
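
To illustrate why that matters, here is a minimal sketch in plain Java. It does not use Lucene; `analyze` is a made-up stand-in that only lowercases, and `keywordMatches` is a hypothetical helper that models an untokenized field as a single exact term.

```java
import java.util.Locale;

// Toy illustration, not Lucene code: an untokenized field is stored as one
// exact term, so the query term must not be altered by an analyzer.
class KeywordFieldSketch {

    // Stand-in for an analyzer step that lowercases terms (an assumption;
    // real analyzers may do more or less than this).
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // An untokenized field matches only when the term is identical.
    static boolean keywordMatches(String indexedValue, String queryTerm) {
        return indexedValue.equals(queryTerm);
    }

    public static void main(String[] args) {
        String indexed = "/my/path/MyFile.ext"; // stored as-is at index time
        System.out.println(keywordMatches(indexed, analyze("/my/path/MyFile.ext"))); // false
        System.out.println(keywordMatches(indexed, "/my/path/MyFile.ext"));          // true
    }
}
```

In this toy model the analyzed query term no longer equals the stored value, which is the kind of mismatch the advice above is meant to avoid.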

 I can do a search as this
 +contents:SomeWord  +filename:SomePath
 

The syntax is right, but if you search with +filename:somepath, you will find
only that file.

For example, 
+content:version +filename:/my/path/myfile.ext

can only find myfile.ext, and if that file doesn't contain "version", it
will find nothing. This is because you use +. The + marks the term as
required.

You can see the query syntax on the Lucene site.

http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.searchtoc=faq#q5

good luck.

Bye
Ernesto.


On Sun, 15 Aug 2004 at 17:13, Dmitrii PapaGeorgio wrote:
 Ok so when I index a file such as below
 
 Document doc = new Document();
 doc.Add(Field.Text("contents", new StreamReader(dataDir)));
 doc.Add(Field.Keyword("filename", dataDir));
 
 I can do a search as this
 +contents:SomeWord  +filename:SomePath
 
 Correct?
 



RE: index and search question

2004-08-09 Thread Aviran
yes

-Original Message-
From: Dmitrii PapaGeorgio [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 16, 2004 9:23 AM
To: [EMAIL PROTECTED]
Subject: index and search question


Ok so when I index a file such as below

Document doc = new Document();
doc.Add(Field.Text("contents", new StreamReader(dataDir)));
doc.Add(Field.Keyword("filename", dataDir));

I can do a search as this
+contents:SomeWord  +filename:SomePath

Correct?




index and search question

2004-08-08 Thread Dmitrii PapaGeorgio
Ok so when I index a file such as below
Document doc = new Document();
doc.Add(Field.Text("contents", new StreamReader(dataDir)));
doc.Add(Field.Keyword("filename", dataDir));
I can do a search as this
+contents:SomeWord  +filename:SomePath
Correct?


Index and Search question in Lucene.

2004-08-07 Thread Dmitrii PapaGeorgio
Ok so when I index a file such as below
Document doc = new Document();
doc.Add(Field.Text("contents", new StreamReader(dataDir)));
doc.Add(Field.Keyword("filename", dataDir));
I can do a search as this
+contents:SomeWord  +filename:SomePath
Correct?


index and search question

2004-06-20 Thread Dmitrii PapaGeorgio
Let's say I index documents using this
 Document doc = new Document();
 doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
 doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));
And want to do a search like this
file1:Word file2:Word2
Basically doing a search using multiple segments, file1 and file2 in the 
same query, how would this be possible?



Re: index and search question

2004-06-20 Thread Incze Lajos
On Sun, Jun 20, 2004 at 09:46:42AM +, Dmitrii PapaGeorgio wrote:
 Let's say I index documents using this
 
  Document doc = new Document();
  doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
  doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));
 
 And want to do a search like this
 
 file1:Word file2:Word2
 
 Basically doing a search using multiple segments, file1 and file2 in the 
 same query, how would this be possible?

Just as you wrote. If you use the QueryParser, you can search with

file1:Word file2:Word2      or e.g.
+file1:Word +file2:Word2    etc.

Or you can build a boolean query programmatically (if I understood
your question).
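
The difference between the two query forms can be sketched with a small in-memory model. This is plain Java, not Lucene's BooleanQuery; `Clause`, `docMatches`, and `sampleDoc` are hypothetical names, and the matching rule is a simplification (no prohibited clauses, no scoring).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of optional vs. required clauses; not Lucene's BooleanQuery.
class BooleanClauseSketch {

    // One "field:term" clause; required corresponds to a leading '+'.
    static final class Clause {
        final String field;
        final String term;
        final boolean required;

        Clause(String field, String term, boolean required) {
            this.field = field;
            this.term = term;
            this.required = required;
        }

        boolean hits(Map<String, Set<String>> doc) {
            Set<String> terms = doc.get(field);
            return terms != null && terms.contains(term);
        }
    }

    // Simplified rule: every required clause must match; when no clause is
    // required, at least one optional clause must match.
    static boolean docMatches(Map<String, Set<String>> doc, List<Clause> clauses) {
        boolean anyRequired = false;
        boolean anyOptionalHit = false;
        for (Clause c : clauses) {
            boolean hit = c.hits(doc);
            if (c.required) {
                anyRequired = true;
                if (!hit) {
                    return false;
                }
            } else if (hit) {
                anyOptionalHit = true;
            }
        }
        return anyRequired || anyOptionalHit;
    }

    // Hypothetical document: file1 contains "word", file2 contains "other".
    static Map<String, Set<String>> sampleDoc() {
        Map<String, Set<String>> doc = new HashMap<>();
        doc.put("file1", new HashSet<>(Arrays.asList("word")));
        doc.put("file2", new HashSet<>(Arrays.asList("other")));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> doc = sampleDoc();
        // file1:word file2:word2 (both optional)
        System.out.println(docMatches(doc, Arrays.asList(
                new Clause("file1", "word", false),
                new Clause("file2", "word2", false)))); // true
        // +file1:word +file2:word2 (both required)
        System.out.println(docMatches(doc, Arrays.asList(
                new Clause("file1", "word", true),
                new Clause("file2", "word2", true)))); // false
    }
}
```

With optional clauses one matching field is enough; with the + form both fields have to match.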

incze




RE: Search Question - not returning desired results

2003-11-26 Thread Pleasant, Tracy
Thanks this helps a lot :)

 



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 26, 2003 4:58 AM
To: Lucene Users List
Subject: Re: Search Question - not returning desired results


On Tuesday, November 25, 2003, at 12:11  PM, Pleasant, Tracy wrote:

 The documents I have indexed contain information regarding file names 
 also.

 For instance 'return_results.pl' or something like that may be in the 
 document fields.

 I am not understanding Lucene's way of searching:

 1. If I search for 'return_results', the search does not return 
 anything
 2. If I search for 'results' or 'return', the search does not return 
 anything
 3. If I search for 'results.pl', the search does return the document 
 containing 'return_results.pl'
 4. If I search for 'results~', the search does return the document 
 containing 'return_results.pl'
 5. If I search for 'return_results~', the search does not return 
 anything

 What is going on?

 I want it to return the document in all of the situations.

 I also don't want to have to use '~' all the time.

We sure do have a recurring theme lately :)  Analysis!

Please refer to my article at java.net:

http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html

Look at the AnalysisDemo code.  Copy it over and try it out on the text 
you're using and the Analyzer you're using.  The bracketed text that 
comes out are the tokens that you can search on.  It is very very 
important to understand this process and to really know what terms come 
out of text you hand it - otherwise it is a mystery why some things can 
be found and some things cannot despite your expectations to the 
contrary.
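
As a rough illustration of why the emitted tokens matter, the sketch below runs two toy tokenizers over the same text and prints the tokens in brackets. It is plain Java and makes no claim about what StandardAnalyzer or any other Lucene analyzer actually emits; `whitespaceTokens`, `letterDigitTokens`, and `bracketed` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy tokenizers for illustration only; these are not Lucene's analyzers.
class AnalysisSketch {

    // Splits on whitespace only, so punctuation stays inside tokens.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Splits on anything that is not an ASCII letter or digit, then lowercases.
    static List<String> letterDigitTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Formats tokens in a bracketed style, one pair of brackets per token.
    static String bracketed(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('[').append(t).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "see return_results.pl";
        System.out.println(bracketed(whitespaceTokens(text)));  // [see] [return_results.pl]
        System.out.println(bracketed(letterDigitTokens(text))); // [see] [return] [results] [pl]
    }
}
```

A search for "results" could only match under the second tokenizer, because only there does "results" exist as a term of its own.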

A follow-up to the Analysis is querying - and QueryParser has its own 
set of quirks and caveats related to how things are tokenized/analyzed. 
  And, I've got just the follow-up article for you handy...


http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

If you digest both of these articles (analysis one first please) then I 
think a lot of questions that get asked on this list will be implicitly 
answered.  Understanding analysis is key.

Erik





RE: Search Question - not returning desired results

2003-11-26 Thread Pleasant, Tracy
Erik,

I think there may be a typo in the website.

When I run the AnalyzerDemo :

Analzying xyz corporation - [EMAIL PROTECTED]
org.apache.lucene.analysis.standard.StandardAnalyzer:
[xyz] [corporation] [EMAIL PROTECTED] 

Your website says:

org.apache.lucene.analysis.standard.StandardAnalyzer:
[xyz] [corporation] [EMAIL PROTECTED] [com] 

When I run it, it keeps the entire email '[EMAIL PROTECTED]'
but according to your website it separates the '[EMAIL PROTECTED]' from the
'com'

Is there a difference between the versions of Lucene? I'm using 1.3rc2.

Plus I think what I want is a StandardAnalyzer with a little tweaking.
The simple one was fine until I realized that it doesn't handle numbers,
which I need as part of my search since numbers are important for what
I'm doing. The Standard one handles numbers but I need it to behave a little
differently, of course. Thanks for the site.




Re: Search Question - not returning desired results

2003-11-26 Thread Erik Hatcher
On Wednesday, November 26, 2003, at 11:33  AM, Pleasant, Tracy wrote:
Your website says:

org.apache.lucene.analysis.standard.StandardAnalyzer:
[xyz] [corporation] [EMAIL PROTECTED] [com]
When I run it, it keeps the entire email '[EMAIL PROTECTED]'
but according to your website it separates the '[EMAIL PROTECTED]' from the
'com'
Is there a difference between the versions of Lucene? I'm using 1.3rc2.

Yes, I fixed the bug in the StandardTokenizer that caused e-mail 
addresses to get split, but I fixed it after the article was written.  
Good eye!



Search Question - not returning desired results

2003-11-25 Thread Pleasant, Tracy

The documents I have indexed contain information regarding file names also.

For instance 'return_results.pl' or something like that may be in the document fields.

I am not understanding Lucene's way of searching:

1. If I search for 'return_results', the search does not return anything
2. If I search for 'results' or 'return', the search does not return anything
3. If I search for 'results.pl', the search does return the document containing 
'return_results.pl' 
4. If I search for 'results~', the search does return the document containing 
'return_results.pl' 
5. If I search for 'return_results~', the search does not return anything

What is going on? 

I want it to return the document in all of the situations.

I also don't want to have to use '~' all the time.






Re: Search Question

2003-11-25 Thread Dror Matalon
No, but if you use the standard analyzer searching red* will return
documents with read_car

On Tue, Nov 25, 2003 at 12:00:01PM -0500, Pleasant, Tracy wrote:
 
  If I have words within a document like 
  
  red_car
  
  If I search for 'red' would it return documents containing 'red_car'? 
 
  
 
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com




RE: Search Question

2003-11-25 Thread Pleasant, Tracy
 How come a search for 'red_car*' returns nothing?

 I am using the standard analyzer, too. 




RE: Search Question

2003-11-25 Thread Pleasant, Tracy
Searching 'red_*' also returns nothing.








Re: Search Question - not returning desired results

2003-11-25 Thread Otis Gospodnetic
You have to look at Analyzers.
Figure out which one you are using and why, and see whether you should be
using a different one or even writing your own.
Some of the Analyzers break input on certain characters (e.g. . or _ or
...), which sounds like the problem here.

I think Erik's java.net article about Lucene may explain some of these
things.
You could also look at Lucene's unit tests to understand Analyzers
better.

Otis





RE: Search Question

2003-11-25 Thread Otis Gospodnetic
Because '_' was probably removed from your input before it was indexed.

I suggest reading up on Analyzers and Tokenizers.

Otis

--- Pleasant, Tracy [EMAIL PROTECTED] wrote:
 Also searching 'red_*' returns nothing, also.
 
 
 
 
 



Re: Search Question

2003-11-25 Thread Dror Matalon
On Tue, Nov 25, 2003 at 12:30:50PM -0500, Pleasant, Tracy wrote:
  How come if I search for 'red_car*' it returns nothing.

Looks like Lucene interprets '_' as a stop word. So 'red_car' is
actually "red car" with the double quotes. I'm not sure why it does
that.

 
  I am using standard analyzer, too. 
 
 -Original Message-
 From: Dror Matalon [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, November 25, 2003 12:22 PM
 To: Lucene Users List
 Subject: Re: Search Question
 
 
 No, but if you use the standard analyzer searching red* will return
 documents with read_car

I meant red_car in here.

 
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com




keyword search question

2002-10-18 Thread Richard Gregor
Hi,

is it possible to use meta tags in HTML pages for keyword search using 
Lucene? That means I would like to search documents not using 
full-text search, but according to keywords specified in the pages.

Example of meta tag:
<meta name="keywords" content="create WWW, WWW for you, web design studio, WWW presentation">

Thanks,
R.
--
Sun Microsystems Czech s.r.o.   
Evropská 33E   
160 00 Praha 6 - Dejvice

Tel.:   +420-2-3300-9246
Fax.:   +420-2-3300-9299
mail.:  [EMAIL PROTECTED]



Search question

2002-04-17 Thread Aruna Raghavan

Hi,
I am looking for ways to cancel a search in response to a cancel from a user
interface. I don't see anything like a timeout on the Searcher.search()
method. Is there a way to terminate a search request?
Aruna Raghavan
Senior Software Engineer
OPIN Systems SPC





Re: Search question

2002-04-17 Thread Ype Kingma

Aruna,

Hi,
I am looking for ways to cancel a search in response to a cancel from a user
interface. I don't see any thing like a timeout on the Searcher.search()
method. Is there a way to terminate a search request?

You can use the low-level search API with a collector that checks for
cancellation and throws an appropriate error when it occurs.
If the cancel is detected by another thread, you could
have it interrupt the thread running the collector.
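
A minimal sketch of that idea follows, in plain Java without any Lucene classes. `HitCallback` stands in for the collector callback (presumably Lucene's HitCollector in a real application, which is an assumption here), and `SearchCancelledException`, `CancellableCollector`, and `runSearch` are hypothetical names; the loop merely simulates a search feeding hits to the collector.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of a cancellable hit collector; it mimics the shape of a
// collect(doc, score) callback but uses no Lucene classes.
class CancellableSearchSketch {

    // Hypothetical unchecked exception used to unwind out of the search loop.
    static final class SearchCancelledException extends RuntimeException {
        SearchCancelledException() {
            super("search cancelled");
        }
    }

    // Hypothetical callback interface standing in for the real collector.
    interface HitCallback {
        void collect(int doc, float score);
    }

    // Counts hits and aborts as soon as the shared flag is set.
    static final class CancellableCollector implements HitCallback {
        private final AtomicBoolean cancelled;
        int collected = 0;

        CancellableCollector(AtomicBoolean cancelled) {
            this.cancelled = cancelled;
        }

        @Override
        public void collect(int doc, float score) {
            if (cancelled.get()) {
                throw new SearchCancelledException();
            }
            collected++;
        }
    }

    // Stand-in for the search loop: feeds doc ids to the callback and sets
    // the flag once cancelAfter hits were seen, simulating a user cancel.
    static int runSearch(int totalDocs, int cancelAfter) {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        CancellableCollector collector = new CancellableCollector(cancelled);
        try {
            for (int doc = 0; doc < totalDocs; doc++) {
                if (doc == cancelAfter) {
                    cancelled.set(true);
                }
                collector.collect(doc, 1.0f);
            }
        } catch (SearchCancelledException e) {
            // Cancelled: fall through and report the partial count.
        }
        return collector.collected;
    }

    public static void main(String[] args) {
        System.out.println(runSearch(100, 10)); // 10
        System.out.println(runSearch(5, 100));  // 5
    }
}
```

In a real UI the flag would be set from the thread handling the cancel button, which is why the sketch uses an AtomicBoolean rather than a plain field.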

However, since searching is quite fast I found no need to interrupt search().
I check for user cancel during retrieval of search results
and also just before starting the query in the next database.

Regards,
Ype
