Re: search question
Erik,

They both use the StandardAnalyzer. However, looking at the toString() output makes everything clearer. When a string contains an email address such as [EMAIL PROTECTED], in 1.2 it gets split like so:

first.last domain.com

In 1.4, however, it does not get split. So now we just check whether an index was built using 1.2 or 1.4 and have some checks thrown in.

Thanks for the guidance.

Roy

On Wed, 22 Dec 2004 18:41:44 -0500, Erik Hatcher wrote:
> What does toString() return for each of those queries? Are you using the same analyzer in both cases?
>
> Erik

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
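The mismatch Roy describes can be reproduced without Lucene at all. The sketch below is a toy under stated assumptions, not Lucene's StandardAnalyzer: `splitEmail` and `keepEmail` are hypothetical stand-ins for the two behaviours mentioned in this thread, and the address is made up. It only shows why terms produced one way at index time cannot be found by a query term produced the other way.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class EmailTokenMismatch {
    // Hypothetical stand-in for the splitting behaviour: the address is cut at '@'.
    static List<String> splitEmail(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("@"));
    }

    // Hypothetical stand-in for the non-splitting behaviour: the address stays one token.
    static List<String> keepEmail(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT));
    }

    // A query matches only if every query term is present among the indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        Set<String> index = new HashSet<>(indexedTerms);
        return index.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String address = "first.last@domain.com"; // made-up example address
        List<String> splitIndex = splitEmail(address);

        System.out.println(splitIndex);                                // [first.last, domain.com]
        System.out.println(matches(splitIndex, splitEmail(address)));  // true: same analysis on both sides
        System.out.println(matches(splitIndex, keepEmail(address)));   // false: query term was never indexed
    }
}
```

Comparing the analyzed query (what toString() shows) with the terms actually stored in the index is the quickest way to spot this kind of mismatch.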
search question
Hi guys,

We have an index with some fields containing email addresses. Doing a search for an email address with this format: [EMAIL PROTECTED] does not bring up any results with Lucene 1.4. The query:

Field1:[EMAIL PROTECTED]

However, it returns results with 1.2. Any ideas?

Roy
Re: search question
What does toString() return for each of those queries? Are you using the same analyzer in both cases?

Erik

On Dec 22, 2004, at 5:44 PM, [EMAIL PROTECTED] wrote:
> Hi guys,
>
> We have an index with some fields containing email addresses. Doing a search for an email address with this format: [EMAIL PROTECTED] does not bring up any results with Lucene 1.4. The query:
>
> Field1:[EMAIL PROTECTED]
>
> However, it returns results with 1.2. Any ideas?
>
> Roy
Re: Index and Search question in Lucene.
Hi Dmitrii,

Which analyzer do you use? You need to be careful with Keyword fields and analyzers. When you index a Document, the fields that have tokenized = false, like Keyword, are not analyzed. At search time you need to parse the query with your analyzer, but without analyzing the untokenized fields, like your filename.

> I can do a search as this +contents:SomeWord +filename:SomePath

The syntax is right, but if you search for +filename:somepath, you will find only that file. For example,

+content:version +filename:/my/path/myfile.ext

can only find myfile.ext, and if that file does not contain "version", it will not find anything. This is because you use +, which marks the term as required. You can see the query syntax on the Lucene site:

http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.searchtoc=faq#q5

Good luck. Bye,
Ernesto

On Sun, 15 Aug 2004 at 17:13, Dmitrii PapaGeorgio wrote:
> Ok so when I index a file such as below
>
> Document doc = new Document();
> doc.Add(Field.Text("contents", new StreamReader(dataDir)));
> doc.Add(Field.Keyword("filename", dataDir));
>
> I can do a search as this
>
> +contents:SomeWord +filename:SomePath
>
> Correct?
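Ernesto's two points (untokenized fields hold one exact term, and + makes a clause required) can be sketched in plain Java. This is a toy model under stated assumptions, not Lucene code: `analyze`, `indexDocument` and `matchesAll` are hypothetical helpers, and the sample text is made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class KeywordFieldSketch {
    // Toy "analyzer": lower-case, then split on anything that is not a-z or 0-9.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // One document: field name -> set of indexed terms.
    static Map<String, Set<String>> indexDocument(String contents, String filename) {
        Map<String, Set<String>> doc = new HashMap<>();
        doc.put("contents", analyze(contents));                       // tokenized field
        doc.put("filename", new HashSet<>(Arrays.asList(filename)));  // untokenized: one exact term
        return doc;
    }

    // Every clause {field, term} is required, as if each were prefixed with '+'.
    static boolean matchesAll(Map<String, Set<String>> doc, String[][] clauses) {
        for (String[] clause : clauses) {
            Set<String> terms = doc.get(clause[0]);
            if (terms == null || !terms.contains(clause[1])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> doc =
            indexDocument("Release notes for Version 2", "/my/path/myfile.ext");

        // Both required clauses hit: the analyzed word and the exact, unanalyzed path.
        System.out.println(matchesAll(doc, new String[][] {
            {"contents", "version"}, {"filename", "/my/path/myfile.ext"}}));  // true

        // The path matches, but the required word is missing from the contents.
        System.out.println(matchesAll(doc, new String[][] {
            {"contents", "missing"}, {"filename", "/my/path/myfile.ext"}}));  // false

        // An analyzed fragment of the path is not the term stored in the keyword field.
        System.out.println(matchesAll(doc, new String[][] {
            {"filename", "myfile"}}));                                        // false
    }
}
```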
RE: index and search question
yes

-----Original Message-----
From: Dmitrii PapaGeorgio [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 16, 2004 9:23 AM
To: [EMAIL PROTECTED]
Subject: index and search question

Ok so when I index a file such as below

Document doc = new Document();
doc.Add(Field.Text("contents", new StreamReader(dataDir)));
doc.Add(Field.Keyword("filename", dataDir));

I can do a search as this

+contents:SomeWord +filename:SomePath

Correct?
index and search question
Ok so when I index a file such as below

Document doc = new Document();
doc.Add(Field.Text("contents", new StreamReader(dataDir)));
doc.Add(Field.Keyword("filename", dataDir));

I can do a search as this

+contents:SomeWord +filename:SomePath

Correct?
Index and Search question in Lucene.
Ok so when I index a file such as below

Document doc = new Document();
doc.Add(Field.Text("contents", new StreamReader(dataDir)));
doc.Add(Field.Keyword("filename", dataDir));

I can do a search as this

+contents:SomeWord +filename:SomePath

Correct?
index and search question
Let's say I index documents using this

Document doc = new Document();
doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));

And want to do a search like this

file1:Word file2:Word2

Basically doing a search using multiple segments, file1 and file2, in the same query. How would this be possible?
Re: index and search question
On Sun, Jun 20, 2004 at 09:46:42AM +, Dmitrii PapaGeorgio wrote:
> Let's say I index documents using this
>
> Document doc = new Document();
> doc.add(Field.Text("file1", (Reader) new InputStreamReader(is)));
> doc.add(Field.Text("file2", (Reader) new InputStreamReader(is2)));
>
> And want to do a search like this
>
> file1:Word file2:Word2
>
> Basically doing a search using multiple segments, file1 and file2, in the same query. How would this be possible?

Just as you wrote. If you use the QueryParser, you can search with

file1:Word file2:Word2

or e.g.

+file1:Word +file2:Word2

etc. Or you can build a boolean query programmatically (if I understood your question).

incze
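The difference between the two query forms above (plain clauses versus clauses prefixed with +) can be sketched as a small boolean matcher. This is a plain-Java toy under stated assumptions, not Lucene's BooleanQuery: the `Clause` class and both methods are hypothetical, and a "document" is just a map from field name to text.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BooleanQuerySketch {
    // One clause: a field, a term, and whether the clause is required ('+').
    static final class Clause {
        final String field;
        final String term;
        final boolean required;

        Clause(String field, String term, boolean required) {
            this.field = field;
            this.term = term;
            this.required = required;
        }
    }

    // A clause hits if the field's whitespace-separated words contain the term exactly.
    static boolean clauseMatches(Map<String, String> doc, Clause c) {
        String text = doc.get(c.field);
        if (text == null) return false;
        for (String word : text.split("\\s+")) {
            if (word.equals(c.term)) return true;
        }
        return false;
    }

    // Required clauses must all hit; with no required clauses, at least one optional one must.
    static boolean matches(Map<String, String> doc, List<Clause> clauses) {
        boolean anyRequired = false;
        boolean anyHit = false;
        for (Clause c : clauses) {
            boolean hit = clauseMatches(doc, c);
            if (c.required) {
                anyRequired = true;
                if (!hit) return false;
            }
            anyHit |= hit;
        }
        return anyRequired || anyHit;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("file1", "Word and more");
        doc.put("file2", "nothing relevant");

        // file1:Word file2:Word2 -> one optional clause hits, so the document matches.
        System.out.println(matches(doc, Arrays.asList(
            new Clause("file1", "Word", false), new Clause("file2", "Word2", false))));  // true

        // +file1:Word +file2:Word2 -> the second required clause misses.
        System.out.println(matches(doc, Arrays.asList(
            new Clause("file1", "Word", true), new Clause("file2", "Word2", true))));    // false
    }
}
```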
RE: Search Question - not returning desired results
Thanks, this helps a lot :)

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 26, 2003 4:58 AM
To: Lucene Users List
Subject: Re: Search Question - not returning desired results

On Tuesday, November 25, 2003, at 12:11 PM, Pleasant, Tracy wrote:
> The documents I have indexed contain information regarding file names also. For instance, 'return_results.pl' or something like that may be in the document fields. I am not understanding Lucene's way of searching:
>
> 1. If I search for 'return_results', the search does not return anything
> 2. If I search for 'results' or 'return', the search does not return anything
> 3. If I search for 'results.pl', the search does return the document containing 'return_results.pl'
> 4. If I search for 'results~', the search does return the document containing 'return_results.pl'
> 5. If I search for 'return_results~', the search does not return anything
>
> What is going on? I want it to return the document in all of the situations. I also don't want to have to use '~' all the time.

We sure do have a recurring theme lately :) Analysis!

Please refer to my article at java.net: http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html

Look at the AnalysisDemo code. Copy it over and try it out on the text you're using and the Analyzer you're using. The bracketed text that comes out are the tokens that you can search on. It is very, very important to understand this process and to really know what terms come out of the text you hand it - otherwise it is a mystery why some things can be found and some things cannot, despite your expectations to the contrary.

A follow-up to the Analysis is querying - and QueryParser has its own set of quirks and caveats related to how things are tokenized/analyzed. And I've got just the follow-up article for you handy...

http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

If you digest both of these articles (analysis one first, please), then I think a lot of questions that get asked on this list will be implicitly answered. Understanding analysis is key.

Erik
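To illustrate the kind of bracketed token listing Erik refers to, here is a minimal stand-alone sketch. It is not the AnalysisDemo from the article and does not use Lucene: the two tokenizers are hypothetical toys (one splits on whitespace only, one lower-cases and splits on every non-alphanumeric character), and real analyzers will produce different tokens. The point is only that the same text can yield very different searchable terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class AnalysisDemoSketch {
    // Toy tokenizer 1: split on whitespace only, keep case and punctuation.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Toy tokenizer 2: lower-case, then split on every character that is not a-z or 0-9.
    static List<String> letterDigitTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Render tokens in the bracketed style used in this thread: [a] [b] [c]
    static String bracketed(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('[').append(t).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "see return_results.pl";
        System.out.println(bracketed(whitespaceTokens(text)));   // [see] [return_results.pl]
        System.out.println(bracketed(letterDigitTokens(text)));  // [see] [return] [results] [pl]
    }
}
```

With the first tokenizer only the whole file name is a searchable term; with the second, only its pieces are.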
RE: Search Question - not returning desired results
Erik,

I think there may be a typo on the website. When I run the AnalyzerDemo:

Analzying xyz corporation - [EMAIL PROTECTED]
org.apache.lucene.analysis.standard.StandardAnalyzer:
[xyz] [corporation] [EMAIL PROTECTED]

Your website says:

org.apache.lucene.analysis.standard.StandardAnalyzer:
[xyz] [corporation] [EMAIL PROTECTED] [com]

When I run it, it keeps the entire email '[EMAIL PROTECTED]', but according to your website it separates the '[EMAIL PROTECTED]' from the 'com'. Is there a difference between the versions of Lucene? I'm using 1.3rc2.

Plus, I think what I want is a StandardAnalyzer with a little tweaking. The simple one was fine until I realized that it doesn't do numbers, which I need as part of my search since numbers are important for what I'm doing. The Standard does numbers, but I need it to be a little different, of course.

Thanks for the site.
Re: Search Question - not returning desired results
On Wednesday, November 26, 2003, at 11:33 AM, Pleasant, Tracy wrote:
> Your website says:
>
> org.apache.lucene.analysis.standard.StandardAnalyzer:
> [xyz] [corporation] [EMAIL PROTECTED] [com]
>
> When I run it, it keeps the entire email '[EMAIL PROTECTED]', but according to your website it separates the '[EMAIL PROTECTED]' from the 'com'. Is there a difference between the versions of Lucene? I'm using 1.3rc2.

Yes, I fixed the bug in the StandardTokenizer that caused e-mail addresses to get split, but fixed it after the article was written. Good eye!
Search Question - not returning desired results
The documents I have indexed contain information regarding file names also. For instance, 'return_results.pl' or something like that may be in the document fields. I am not understanding Lucene's way of searching:

1. If I search for 'return_results', the search does not return anything
2. If I search for 'results' or 'return', the search does not return anything
3. If I search for 'results.pl', the search does return the document containing 'return_results.pl'
4. If I search for 'results~', the search does return the document containing 'return_results.pl'
5. If I search for 'return_results~', the search does not return anything

What is going on? I want it to return the document in all of the situations. I also don't want to have to use '~' all the time.
Re: Search Question
No, but if you use the standard analyzer, searching red* will return documents with read_car

On Tue, Nov 25, 2003 at 12:00:01PM -0500, Pleasant, Tracy wrote:
> If I have words within a document like red_car
> If I search for 'red' would it return documents containing 'red_car'?

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
RE: Search Question
How come if I search for 'red_car*' it returns nothing? I am using standard analyzer, too.

-----Original Message-----
From: Dror Matalon [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 25, 2003 12:22 PM
To: Lucene Users List
Subject: Re: Search Question

No, but if you use the standard analyzer searching red* will return documents with read_car

On Tue, Nov 25, 2003 at 12:00:01PM -0500, Pleasant, Tracy wrote:
> If I have words within a document like red_car
> If I search for 'red' would it return documents containing 'red_car'?
RE: Search Question
Also, searching 'red_*' returns nothing.

-----Original Message-----
From: Dror Matalon [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 25, 2003 12:22 PM
To: Lucene Users List
Subject: Re: Search Question

No, but if you use the standard analyzer searching red* will return documents with read_car

On Tue, Nov 25, 2003 at 12:00:01PM -0500, Pleasant, Tracy wrote:
> If I have words within a document like red_car
> If I search for 'red' would it return documents containing 'red_car'?
Re: Search Question - not returning desired results
You have to look at Analyzers. Figure out which one you are using and why, and see if you should be using a different one or even write your own. Some of the Analyzers break input on certain characters (e.g. . or _ or ...), which sounds like the problem here. I think Erik's java.net article about Lucene may explain some of these things. You could also look at Lucene's unit tests to understand Analyzers better.

Otis

--- Pleasant, Tracy [EMAIL PROTECTED] wrote:
> The documents I have indexed contain information regarding file names also. For instance, 'return_results.pl' or something like that may be in the document fields. I am not understanding Lucene's way of searching:
>
> 1. If I search for 'return_results', the search does not return anything
> 2. If I search for 'results' or 'return', the search does not return anything
> 3. If I search for 'results.pl', the search does return the document containing 'return_results.pl'
> 4. If I search for 'results~', the search does return the document containing 'return_results.pl'
> 5. If I search for 'return_results~', the search does not return anything
>
> What is going on? I want it to return the document in all of the situations. I also don't want to have to use '~' all the time.
RE: Search Question
Because '_' was probably removed from your input before it was indexed. I suggest reading up on Analyzers and Tokenizers.

Otis

--- Pleasant, Tracy [EMAIL PROTECTED] wrote:
> Also searching 'red_*' returns nothing, also.
>
> -----Original Message-----
> From: Dror Matalon [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, November 25, 2003 12:22 PM
> To: Lucene Users List
> Subject: Re: Search Question
>
> No, but if you use the standard analyzer searching red* will return documents with read_car
>
> On Tue, Nov 25, 2003 at 12:00:01PM -0500, Pleasant, Tracy wrote:
> > If I have words within a document like red_car
> > If I search for 'red' would it return documents containing 'red_car'?
Re: Search Question
On Tue, Nov 25, 2003 at 12:30:50PM -0500, Pleasant, Tracy wrote:
> How come if I search for 'red_car*' it returns nothing.

Looks like Lucene interprets '_' as a stop word. So 'red_car' is actually "red car" with the double quotes. I'm not sure why it does that.

> I am using standard analyzer, too.
>
> -----Original Message-----
> From: Dror Matalon [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, November 25, 2003 12:22 PM
> To: Lucene Users List
> Subject: Re: Search Question
>
> No, but if you use the standard analyzer searching red* will return documents with read_car

I meant red_car in here.

> On Tue, Nov 25, 2003 at 12:00:01PM -0500, Pleasant, Tracy wrote:
> > If I have words within a document like red_car
> > If I search for 'red' would it return documents containing 'red_car'?

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
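The thread does not settle how the analyzer in use actually treats '_', so the sketch below simply compares both possibilities. It is a plain-Java toy, not Lucene: `termMatch` and `prefixMatch` are hypothetical helpers that look up a query against a list of already-indexed terms, which is the level at which a prefix query such as red* operates.

```java
import java.util.Arrays;
import java.util.List;

class PrefixQuerySketch {
    // Exact term lookup: the query term must equal an indexed term.
    static boolean termMatch(List<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    // Prefix lookup (like red*): some indexed term must start with the prefix.
    static boolean prefixMatch(List<String> indexedTerms, String prefix) {
        for (String t : indexedTerms) {
            if (t.startsWith(prefix)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("red", "car"); // if the analyzer split at '_'
        List<String> whole = Arrays.asList("red_car");    // if it kept one token

        System.out.println(termMatch(whole, "red"));        // false
        System.out.println(prefixMatch(whole, "red"));      // true
        System.out.println(prefixMatch(split, "red"));      // true
        System.out.println(prefixMatch(split, "red_car"));  // false
        System.out.println(prefixMatch(split, "red_"));     // false
    }
}
```

In this toy model, red* finding the document while red_car* and red_* find nothing is what one would see if the text had been indexed as two separate terms.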
keyword search question
Hi,

is it possible to use meta tags in HTML pages for keyword search using Lucene? That is, I would like to search documents not with full-text search but according to the keywords specified in the pages.

Example of a meta tag:

<meta name="keywords" content="create WWW, WWW for you, web design studio, WWW presentation">

Thanks,
R.

--
Sun Microsystems Czech s.r.o.
Evropská 33E
160 00 Praha 6 - Dejvice
Tel.: +420-2-3300-9246
Fax.: +420-2-3300-9299
mail.: [EMAIL PROTECTED]
Search question
Hi,

I am looking for ways to cancel a search in response to a cancel from a user interface. I don't see anything like a timeout on the Searcher.search() method. Is there a way to terminate a search request?

Aruna Raghavan
Senior Software Engineer
OPIN Systems SPC
Re: Search question
Aruna,

> Hi,
> I am looking for ways to cancel a search in response to a cancel from a user interface. I don't see anything like a timeout on the Searcher.search() method. Is there a way to terminate a search request?

You can use the low-level search API with a collector that checks for cancellation and throws an appropriate error when it occurs. If the cancel is detected by another thread, you could make it interrupt the thread running the collector.

However, since searching is quite fast, I found no need to interrupt search(). I check for user cancel during retrieval of search results and also just before starting the query in the next database.

Regards,
Ype
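The collector idea Ype describes might look roughly like the sketch below. It is plain Java with no Lucene dependency, and every name in it is an assumption: `HitSink` is a hypothetical per-hit callback standing in for whatever the low-level search API invokes, `SearchCancelledException` is a made-up unchecked exception, and the loop in `search` only simulates a search that is cancelled part-way through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class CancellableSearchSketch {
    // Hypothetical per-hit callback (an assumption, not a Lucene interface).
    interface HitSink {
        void collect(int doc, float score);
    }

    // Unchecked, so it can propagate out of the callback and abort the search loop.
    static final class SearchCancelledException extends RuntimeException {
        SearchCancelledException() {
            super("search cancelled");
        }
    }

    // Collects document ids until the shared flag is set, then throws.
    static final class CancellableSink implements HitSink {
        private final AtomicBoolean cancelled;
        final List<Integer> docs = new ArrayList<>();

        CancellableSink(AtomicBoolean cancelled) {
            this.cancelled = cancelled;
        }

        @Override
        public void collect(int doc, float score) {
            if (cancelled.get()) throw new SearchCancelledException();
            docs.add(doc);
        }
    }

    // Toy "search": reports docs 0..hitCount-1. The flag is flipped just before doc
    // number cancelAfter is reported; in a real UI another thread would set it.
    static List<Integer> search(int hitCount, int cancelAfter) {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        CancellableSink sink = new CancellableSink(cancelled);
        try {
            for (int doc = 0; doc < hitCount; doc++) {
                if (doc == cancelAfter) cancelled.set(true);
                sink.collect(doc, 1.0f);
            }
        } catch (SearchCancelledException e) {
            // Stop quietly; hits collected before the cancel remain available.
        }
        return sink.docs;
    }

    public static void main(String[] args) {
        System.out.println(search(10, 3));   // [0, 1, 2]
        System.out.println(search(5, -1));   // [0, 1, 2, 3, 4]
    }
}
```

An AtomicBoolean is used here so that a cancel set by a user-interface thread would be visible to the searching thread without extra locking.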