Re: Let me get started
Clemens,

Thanks for the messages. Yes, I wanted to index .jsp files also. Is that possible? I thought we needed a database to store some values and then retrieve them. Don't we need a database for it?

Thanks,
Uma

----- Original Message -----
From: Clemens Marschner [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, November 13, 2002 4:07 PM
Subject: Re: Let me get started

> Now where should I place these jar files? In the /lib dir?

Yes.

> I am having MS SQL Server, Will this help me get my work easy?.

MS SQL Server and Lucene are two completely different things. It's like talking about apples and pears.

> I need to search for .jsp files and .html files. Is this possible?

You want to index .jsp files? Or do you mean the data that comes from your database? Regarding HTML files: yes, you can index anything you want with Lucene, but you have to do some manual steps. See http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexi ngtoc=faq#q11 for a pointer.

> Is there any website running with Lucene? Please guide me.

See http://jakarta.apache.org/lucene/docs/powered.html which by the way is a page you should have read before posting ;-)

--Clemens

--
To unsubscribe, e-mail: mailto:lucene-user-unsubscribe;jakarta.apache.org
For additional commands, e-mail: mailto:lucene-user-help;jakarta.apache.org
Re: Multiple field searches using AND and OR's
Rob,

I believe MultiFieldQueryParser will do the job for you...

Regards,
Kelvin

On Wed, 13 Nov 2002 08:58:36 -0500, Rob Outar said:

> Hello all,
>
> I am wondering how I would do multiple field searches of the form:
>
>   field1 = value and field2 = value2 or field2 = value3
>
> I am thinking that each one of the above would be a term query, but how would I string them together with AND's and OR's? Any help would be appreciated.
>
> Thanks,
> Rob
>
> PS I found this in the FAQ, but I was wondering if there was any other way to do it:
>
>   My documents have multiple fields, do I have to replicate a query for each of them?
>
>   Not necessarily. A simple solution is to index the documents using a general field that contains a concatenation of the content of all the searchable fields ('author', 'title', 'body' etc). This way, a simple query will search in entire document content. The disadvantage of this method is that you cannot boost certain fields relative to others. Note also the matches in longer documents results in lower ranking.
Indexing files
Hello,

This is what I see in the docs for indexing the files:

> Once you've gotten this far you're probably itching to go. Let's build an index! Assuming you've set your classpath correctly, just type java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src. This will produce a subdirectory called index which will contain an index of all of the Lucene sourcecode.

How do I type this command (java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src) if the files are on the server? I have copied lucene-1.2.jar and lucene-demos-1.2.jar to the web-inf/lib directory. Please advise me on the next step.

Thanks,
Uma
http://www.javagalaxy.com
the order of fields in Document.fields()
Quick question about Document.fields(). Lucene provides you with a method to retrieve the value of a field or grab all fields as an Enumeration. It does not, however, allow you to grab all values of one field for a document, it will only return the last value added for that field. For example, I am indexing email messages that might have multiple To/CC/BCC fields in the message header. Currently to grab all the values when I display an email that has been indexed, I must use the fields() method to grab an Enumeration of all fields in a document. I then separate them into different arrays based on the field names. However I am concerned about the order of the fields since I consider the first To or CC or BCC to be the main value for each field. Is the order of the fields returned in the order that they are added? Or is there no order? If there is no order, can someone suggest a solution? Thanks! Roy. This email and any attachments are confidential and may be legally privileged. No confidentiality or privilege is waived or lost by any transmission in error. If you are not the intended recipient you are hereby notified that any use, printing, copying or disclosure is strictly prohibited. Please delete this email and any attachments, without printing, copying, forwarding or saving them and notify the sender immediately by reply e-mail. Zurich Capital Markets and its affiliates reserve the right to monitor all e-mail communications through its networks. Unless otherwise stated, any pricing information in this e-mail is indicative only, is subject to change and does not constitute an offer to enter into any transaction at such price and any terms in relation to any proposed transaction are indicative only and subject to express final confirmation.
RE: Multiple field searches using AND and OR's
Looked at that already. The format is this:

  public static Query parse(String query, String[] fields, Analyzer analyzer) throws ParseException

  Parses a query which searches on the fields specified. If x fields are specified, this effectively constructs: (field1:query) (field2:query) (field3:query)...(fieldx:query)

This lets you query multiple fields with the same query, but my query value will not be the same for each field. My query string will be different, something like:

  f_name = rob and l_name = outar or address = some value

Plus there is no way of specifying OR's and AND's.

Thanks,
Rob O
Re: the order of fields in Document.fields()
The order is preserved (or reversed, actually), so it's not random. It's the reverse of the order in which the fields were added to the document. This would be easy to test...

Otis
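Taking Otis's description at face value, the values for each header could be put back into insertion order by walking the enumeration once and prepending. The following is only a sketch in plain Java, under the assumption that the fields really do arrive in reverse insertion order; `FieldOrder`, `groupInInsertionOrder`, and the name/value `String[]` pairs are hypothetical stand-ins for whatever is read from `Document.fields()`, not Lucene API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldOrder {
    /**
     * Groups (name, value) pairs by field name. The input is ASSUMED to be in
     * reverse insertion order, so each value is inserted at the front of its
     * list, which restores the order in which the fields were added.
     */
    static Map<String, List<String>> groupInInsertionOrder(List<String[]> reversedFields) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (String[] field : reversedFields) {
            String name = field[0];   // hypothetical: the field name
            String value = field[1];  // hypothetical: the stored string value
            byName.computeIfAbsent(name, k -> new ArrayList<>()).add(0, value);
        }
        return byName;
    }
}
```

With this grouping, element 0 of each list would be the first To, CC, or BCC that was added, which is the "main" value Roy is after. If the enumeration order turns out to differ, the `add(0, value)` call is the only line that would need to change.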
RE: Multiple field searches using AND and OR's
The QueryParser will handle input such as field1:value1 AND field2:value2 OR field3:value3, and will construct the appropriate term and boolean queries. See the Query Syntax page.

Otis
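For Rob's case, where each field has its own value, the query string Otis describes can be assembled before it is handed to the QueryParser. A minimal sketch in plain Java follows; `QueryStrings`, `clause`, and `group` are hypothetical helper names, and the explicit parentheses are an assumption about how Rob wants the AND and OR grouped, since his example does not say.

```java
import java.util.StringJoiner;

final class QueryStrings {
    /** Formats one field:value clause, quoting values that contain whitespace. */
    static String clause(String field, String value) {
        boolean needsQuotes = value.chars().anyMatch(Character::isWhitespace);
        return field + ":" + (needsQuotes ? "\"" + value + "\"" : value);
    }

    /** Joins clauses with the given operator and wraps the result in parentheses. */
    static String group(String operator, String... clauses) {
        StringJoiner joiner = new StringJoiner(" " + operator + " ", "(", ")");
        for (String c : clauses) {
            joiner.add(c);
        }
        return joiner.toString();
    }
}
```

Calling `QueryStrings.group("AND", QueryStrings.clause("f_name", "rob"), QueryStrings.group("OR", QueryStrings.clause("l_name", "outar"), QueryStrings.clause("address", "some value")))` yields `(f_name:rob AND (l_name:outar OR address:"some value"))`. Spelling out the grouping avoids relying on how the parser combines AND and OR when no parentheses are given.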
Re: Let me get started
Hello,

I have copied the jar files to the /lib directory of my web server. Can you tell me what I should do next? Or, in short, can you guide me through it point by point?

Thanks,
Uma
Re: Let me get started
On Wed, Nov 13, 2002 at 05:14:25PM +0530, Uma Maheswar wrote:

> Thanks for the messages. Yes I wanted to index .jsp files also. Is it possible?

It's possible, but you'll need to write code to select and parse the jsp files. There may be code in the sandbox area at jakarta.apache.org/lucene for doing this, though I don't see it.

> I thought we need a database to store some values and then retrieve them back. Don't we need database for it?

Nope, Lucene stores search data in its own files. You can easily use Lucene to build a search engine for data that's stored in a database, but you don't need a database.

Steven J. Owens [EMAIL PROTECTED]

"I'm going to make broad, sweeping generalizations and strong, declarative statements, because otherwise I'll be here all night and this document will be four times longer and much less fun to read. Take it all with a grain of salt." - Me at http://darksleep.com
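The "select and parse" step Steven mentions might look roughly like the sketch below. It uses only the Java standard library and stops short of any Lucene call; `PageText`, `isIndexable`, and `stripMarkup` are hypothetical names, and the regular expressions are a deliberately crude assumption about what counts as markup (a real page may need a proper HTML or JSP parser).

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class PageText {
    // JSP blocks such as <% ... %> or <%= ... %>, possibly spanning several lines.
    private static final Pattern JSP_BLOCK = Pattern.compile("<%.*?%>", Pattern.DOTALL);
    // Any remaining HTML-style tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Selects the files worth indexing, by extension only. */
    static boolean isIndexable(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".jsp") || lower.endsWith(".html");
    }

    /** Removes JSP blocks first, then tags, then collapses runs of whitespace. */
    static String stripMarkup(String source) {
        String noJsp = JSP_BLOCK.matcher(source).replaceAll(" ");
        String noTags = TAG.matcher(noJsp).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }
}
```

The text returned by `stripMarkup` is what would then be added to a document's body field. JSP blocks are removed before tags so that a `>` inside a scriptlet is not mistaken for the end of a tag.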
Re: Let me get started
Thanks Steve,

Can you help me get started? I have downloaded lucene-1.2.jar and lucene-demos-1.2.jar and placed them in the web-inf/lib directory of my web server (http://www.javagalaxy.com). I do not know what to do next. I read the documents that came with the .zip file, but they are all for localhost:8080, and I need Lucene for my website. Could anyone guide me through the steps to make Lucene work on my website?

Thanks,
Uma
Re: Problems with exact matces on non-tokenized fields...
I came across the same problem, and I think the FAQ entry you (Otis) propose should get a better title so that users can more easily find an answer to this problem. Correct me if I'm wrong (and please forgive any wrong assumptions I may have made), but the problem is how to query on a non-tokenized field.

Problem explanation:

If a field is not tokenized, then it is not passed through the analyzer, regardless of which analyzer is used (that's what I understand from looking into DocumentWriter.invertDocument()). If you construct a query on this field with a given analyzer (for example with QueryParser.parse(query, field, analyzer)), the QueryParser does not know that the field is not tokenized and passes the query text through the analyzer. The analyzer may alter the query (for example, if the analyzer has a stemming algorithm), and the document then does not match the query.

The solution:

The solution is to make sure that fields that aren't tokenized during indexing are not passed through the analyzer during searching. This can be done in two ways: either by writing an analyzer that takes care of this according to the field, or by constructing a TermQuery for this field and adding it to the rest of the query.

Example: put here the 2 examples from Doug

Stefanos

Otis Gospodnetic wrote:

> Thanks, it's a FAQ entry now:
> How do I write my own Analyzer?
> http://www.jguru.com/faq/view.jsp?EID=1006122
>
> Otis
>
> --- Doug Cutting [EMAIL PROTECTED] wrote:
>
> > karl øie wrote:
> >
> > > I have a Lucene Document with a field named element which is stored and indexed but not tokenized. The value of the field is POST (uppercase). But the only way i can match the field is by entering element:POST? or element:POST* in the QueryParser class.
> >
> > There are two ways to do this.
> >
> > If this must be entered by users in the query string, then you need to use a non-lowercasing analyzer for this field. The way to do this if you're currently using StandardAnalyzer, is to do something like:
> >
> >   public class MyAnalyzer extends Analyzer {
> >     private Analyzer standard = new StandardAnalyzer();
> >     public TokenStream tokenStream(String field, final Reader reader) {
> >       if ("element".equals(field)) {        // don't tokenize
> >         return new CharTokenizer(reader) {
> >           protected boolean isTokenChar(char c) { return true; }
> >         };
> >       } else {                              // use standard analyzer
> >         return standard.tokenStream(field, reader);
> >       }
> >     }
> >   }
> >
> >   Analyzer analyzer = new MyAnalyzer();
> >   Query query = queryParser.parse("... +element:POST", analyzer);
> >
> > Alternately, if this query field is added by a program, then this can be done by bypassing the analyzer for this class, building this clause directly instead:
> >
> >   Analyzer analyzer = new StandardAnalyzer();
> >   BooleanQuery query = (BooleanQuery)queryParser.parse("...", analyzer);
> >   // now add the element clause
> >   query.add(new TermQuery(new Term("element", "POST")), true, false);
> >
> > Perhaps this should become an FAQ...
> >
> > Doug
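The mismatch Stefanos describes can be reproduced without Lucene at all. The toy model below is a sketch in plain Java: `FieldAwareNormalizer` and `queryTerm` are hypothetical names, and lowercasing stands in for whatever the real analyzer does to query text. It only illustrates the rule "treat the query term the same way the field was treated at index time".

```java
import java.util.Locale;
import java.util.Set;

final class FieldAwareNormalizer {
    private final Set<String> untokenizedFields;

    FieldAwareNormalizer(Set<String> untokenizedFields) {
        this.untokenizedFields = untokenizedFields;
    }

    /** Returns the term text to look up in the index for the given field. */
    String queryTerm(String field, String text) {
        if (untokenizedFields.contains(field)) {
            return text;                       // indexed verbatim, so query verbatim
        }
        return text.toLowerCase(Locale.ROOT);  // stand-in for a lowercasing analyzer
    }
}
```

With `element` registered as untokenized, `queryTerm("element", "POST")` keeps `POST` and so equals the indexed value, while the same text on an analyzed field becomes `post`. Both of Doug's approaches amount to applying this rule, one inside the analyzer and one by building the TermQuery directly.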
RE: the order of fields in Document.fields()
Shouldn't there be at least one method that returns an array of fields in the correct order?

Roy.
BooleanQuery question
Hi,

Suppose I want to match documents where fieldX is equal to A OR B. Is the following correct?

  BooleanQuery bq = new BooleanQuery();
  Term a = new Term("fieldX", "A");
  Term b = new Term("fieldX", "B");
  TermQuery tqA = new TermQuery(a);
  TermQuery tqB = new TermQuery(b);
  bq.add(tqA, false, false);
  bq.add(tqB, false, false);

Then the code searches on bq. Does this do what I want? I can't get it to work.
Re: BooleanQuery question
Maybe A and B are getting eliminated by your Analyzer? "a" and "b" are in the list of stop words, no? And A and B are lowercased. Or is this just an example?

Try bq.toString(). Try adding just one Query to it, etc.

Otis
Re: BooleanQuery question
This is just an example, but I figured it out: it was a stemming/lowercasing problem. Duh!
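The query in the original question is a valid OR: two clauses added with required=false and prohibited=false match a document when at least one of them matches. What tripped it up is that a TermQuery compares term text exactly as given. The following toy model in plain Java shows both points; `BooleanClauseModel` is a hypothetical class that mimics the required/prohibited flags and is not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class BooleanClauseModel {
    private static final class Clause {
        final String term;
        final boolean required;
        final boolean prohibited;

        Clause(String term, boolean required, boolean prohibited) {
            this.term = term;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    private final List<Clause> clauses = new ArrayList<>();

    void add(String term, boolean required, boolean prohibited) {
        clauses.add(new Clause(term, required, prohibited));
    }

    /** True if no prohibited term hits, every required term hits, and at least one clause hits. */
    boolean matches(Set<String> indexedTerms) {
        boolean anyHit = false;
        for (Clause c : clauses) {
            boolean hit = indexedTerms.contains(c.term);  // exact, case-sensitive comparison
            if (c.prohibited && hit) return false;
            if (c.required && !hit) return false;
            if (hit && !c.prohibited) anyHit = true;
        }
        return anyHit;
    }
}
```

If the analyzer lowercased the field at index time, the index holds `a` and `b`, so clauses built from `A` and `B` find nothing even though the boolean logic is right. Building the terms from the already-lowercased (and, where applicable, stemmed) text fixes it.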
Re: Change in Range Query Syntax?
I was surprised by this change too. I think the syntax changed from [from - to] to [from to].

----- Original Message -----
From: Terry Steichen [EMAIL PROTECTED]
To: Lucene Users Group [EMAIL PROTECTED]
Sent: Thursday, November 14, 2002 12:18 AM
Subject: Change in Range Query Syntax?

> I recently upgraded (from 1.2) to the latest build (1.3.1) and found that my range queries no longer work.
>
> Here's what a simple query against my index yields:
>
>   pub_date:20021109 yields 133 hits
>   pub_date:20021110 yields 225 hits
>   pub_date:2002 yields 144 hits
>
> With 1.2RC5 and 1.2, here's how the range query works:
>
>   pub_date:[20021109 - 2002] yields 502 hits (note space on both sides of dash)
>
> With 1.3 (nightly build as of 11/11/02), here's how the range query now works:
>
>   pub_date:[20021109 - 2002] yields 0 hits (note space on both sides of dash)
>   pub_date:[20021109- 2002] yields 369 hits (note space only following the dash)
>   pub_date:[20021109-2002] yields 0 hits (note no spaces on either side of dash)
>
> Also, note that pub_date:]20021109- 20021110] does *not* include the hits for 20021109 as it did previously.
>
> The errors (ParseExceptions) generated were these:
>
>   Was expecting one of: TO ... RANGEIN_QUOTED ... RANGEIN_GOOP ... , Encountered ] at line 1, column 27. Was expecting one of: TO ... RANGEIN_QUOTED ... RANGEIN_GOOP ...
>
> Has the syntax changed, or is this a bug?
>
> Regards,
> Terry
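The ParseException quoted above lists TO among the expected tokens, which suggests the newer parser wants ranges written as [from TO to]. Assuming that reading is right, stored or user-entered queries in the old dash form could be rewritten mechanically. This is only a sketch in plain Java; `RangeSyntax` and `toKeywordForm` are hypothetical names, the dates in the usage note are made up, and the pattern covers only square-bracket ranges whose endpoints contain no dash or whitespace.

```java
import java.util.regex.Pattern;

final class RangeSyntax {
    // "[" from "-" to "]" with optional spaces around the dash; endpoints may not
    // contain whitespace, "-" or "]".
    private static final Pattern OLD_RANGE =
        Pattern.compile("\\[\\s*([^\\s\\]-]+)\\s*-\\s*([^\\s\\]-]+)\\s*\\]");

    /** Rewrites every old-style range in the query to the TO keyword form. */
    static String toKeywordForm(String query) {
        return OLD_RANGE.matcher(query).replaceAll("[$1 TO $2]");
    }
}
```

For example, `RangeSyntax.toKeywordForm("pub_date:[20021109 - 20021110]")` gives `pub_date:[20021109 TO 20021110]`, and a query that already uses TO passes through unchanged.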
Re: Can any one help me?
Uma,

I think the problem is that in order for one to help you get started with Lucene, one would have to also help you with servlet containers, etc. If you learn how to write a simple servlet, and how to deploy it into your servlet container, then you will know how to deploy something that uses Lucene, too. Do you know how to write a simple servlet and how to deploy it?

Otis

--- Uma Maheswar [EMAIL PROTECTED] wrote:

> Hello,
>
> I am disappointed at not getting any reply even after 4 posts. Is there anyone who can help a beginner in Lucene?
>
> Thanks
> Uma
> http://www.javagalaxy.com
Re: Can any one help me?
Otis,

Yes, I know Servlets and JSP. I am the only developer working on http://www.javagalaxy.com. All the contents of the site are developed by me. But I am not sure how to work with Lucene. Can you help me?

Uma