field boost factor
Hi all, Is it possible to set a different boost factor for different fields when you search, rather than when you index? Thanks, Anson - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: field boost factor
I think I found it in Query API... Thanks, Anson -Original Message- From: Anson Lau [mailto:[EMAIL PROTECTED] Sent: Friday, May 14, 2004 4:27 PM To: [EMAIL PROTECTED] Subject: field boost factor
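For readers landing on this thread: besides setting a boost on a Query object (presumably what Anson found, via Query.setBoost(float)), a search-time boost can also be written in the query string itself with the query parser's ^ operator, for example title:lucene^2.0 body:lucene^0.5. The sketch below only assembles such a string in plain Java. The field names and boost values are made up for illustration, the term is assumed to contain no characters that need escaping, and the result would still have to be passed to QueryParser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoostedQueryString {
    // Builds one "field:term^boost" clause per field, separated by spaces.
    // Assumption: the term contains no query-syntax characters that need escaping.
    static String build(String term, Map<String, Float> fieldBoosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(term).append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical fields: matches in "title" count more than matches in "body".
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 2.0f);
        boosts.put("body", 0.5f);
        System.out.println(build("lucene", boosts));
    }
}
```

Because the boosts live in the query rather than in the index, they can be changed per search without reindexing.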
Re: Are lucene have a configuration feature for storage compression option?
Moving to lucene-user list.

Hello, Didn't I already answer these questions?
1. No :(
2. Use the POI API (jakarta.apache.org/poi).
3. IndexReader can provide at least some of your numbers. I suggest you look at the Javadocs for IndexReader, which are available on Lucene's site.
Otis

--- Alex Aw Seat Kiong [EMAIL PROTECTED] wrote:
Hi! Some questions about Lucene:
1. Does Lucene have a configuration option for storage compression?
2. Is there any pure Java parser for Excel and PowerPoint files that can be used so that Lucene can index Excel and PowerPoint documents?
3. How can I get the information below? Is there an API for it?
- total number of documents indexed
- total index size per storage
- date the index was last updated
- total number of documents deleted
Thanks. Regards, AlexAw
NewBie with Lucene 1.4RC : issue on Demo3 HTML with result.jsp.java import error ???
Hello, Could somebody help? I am trying to discover and use Lucene to search within HTML pages on an Apache/Tomcat server. I have found Lucene and tried the demos, and I have an issue running Demo3 and indexHTML, at search.JSP. I have modified the CLASSPATH, created the index directory and the index, changed indexLocation in Configuration.jsp, and started search.jsp successfully... But it always ends in an HTTP 500 with this context:

HTTP Status 500 -
type Exception report
message
description The server encountered an internal error () that prevented it from fulfilling the request.
exception

org.apache.jasper.JasperException: Unable to compile class for JSP

An error occurred at line: 18 in the jsp file: /results.jsp

Generated servlet error:
[javac] Compiling 1 source file

C:\www\tomcat4.1\work\Standalone\localhost\luceneweb\results_jsp.java:10: package org.apache.lucene.analysis does not exist
import org.apache.lucene.analysis.*;
^
C:\www\tomcat4.1\work\Standalone\localhost\luceneweb\results_jsp.java:11: package org.apache.lucene.document does not exist
import org.apache.lucene.document.*;
^
C:\www\tomcat4.1\work\Standalone\localhost\luceneweb\results_jsp.java:12: package org.apache.lucene.index does not exist
import org.apache.lucene.index.*;
^
C:\www\tomcat4.1\work\Standalone\localhost\luceneweb\results_jsp.java:13: package org.apache.lucene.search does not exist
import org.apache.lucene.search.*;
^
C:\www\tomcat4.1\work\Standalone\localhost\luceneweb\results_jsp.java:14: package org.apache.lucene.queryParser does not exist
import org.apache.lucene.queryParser.*;
^
C:\www\tomcat4.1\work\Standalone\localhost\luceneweb\results_jsp.java:15: package org.apache.lucene.demo does not exist
import org.apache.lucene.demo.*;
^
C:\www\tomcat4.1\work\Standalone\localhost\luceneweb\results_jsp.java:16: package org.apache.lucene.demo.html does not exist
import org.apache.lucene.demo.html.Entities;
^
C:\www\tomcat4.1\work\Standalone\localhost\luceneweb\results_jsp.java:101: cannot resolve symbol
symbol : class IndexSearcher
location: class org.apache.jsp.results_jsp
IndexSearcher searcher = null; //the searcher used to open/search the index
^

An error occurred at line: 18 in the jsp file: /results.jsp

Generated servlet error:
C:\www\tomcat4.1\work\Standalone\localhost\luceneweb\results_jsp.java:102: cannot resolve symbol
symbol : class Query
location: class org.apache.jsp.results_jsp
Query query = null; //the Query created by the QueryParser
^
^
17 errors

  at org.apache.jasper.compiler.DefaultErrorHandler.javacError(DefaultErrorHandler.java:130)
  at org.apache.jasper.compiler.ErrorDispatcher.javacError(ErrorDispatcher.java:293)
  at org.apache.jasper.compiler.Compiler.generateClass(Compiler.java:353)
  at org.apache.jasper.compiler.Compiler.compile(Compiler.java:370)
  at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:473)
  at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:190)
  at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:295)
  at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:241)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:256)
  at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
  at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
  at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
  at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
  at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
  at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2417)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
  at
Re: NewBie with Lucene 1.4RC : issue on Demo3 HTML with result.jsp.java import error ???
I have not used the Lucene demo in a looong time, and have not used Tomcat in a few years, but it looks like you need to add the Lucene jar either to your WAR (the Lucene demo WAR, I guess), or you can just put the Lucene jar in some lib directory from which Tomcat loads its jars. Otis

--- Bruno Tirel [EMAIL PROTECTED] wrote: Hello, Could somebody help? I am trying to discover and use Lucene to search within HTML pages within Apache/Tomcat server. [...] But it always come to an HTTP 500 [...]
Re: BooleanQuery.add()
Thanks, I added your sentence. Otis

--- Leonid Portnoy [EMAIL PROTECTED] wrote:
Doug Cutting wrote: The documentation is unclear. Can you propose an improvement?
Yes - I think the following sentence should be appended after "or neither, in which case matched documents are neither prohibited from nor required to match the sub-query.": "However, a document must match at least one sub-query to match the boolean query." Thanks, Leonid
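The rule Leonid proposes can be illustrated without Lucene at all. The following is a minimal sketch in plain Java, not the BooleanQuery implementation: each clause is modelled as a predicate on the document text plus the required and prohibited flags, and all class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class BooleanMatchSketch {
    // One clause: a sub-query (here just a predicate on the document text) and its two flags.
    static final class Clause {
        final Predicate<String> subQuery;
        final boolean required;
        final boolean prohibited;

        Clause(Predicate<String> subQuery, boolean required, boolean prohibited) {
            this.subQuery = subQuery;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    // A document matches if no prohibited clause hits, every required clause hits,
    // and at least one non-prohibited clause hits.
    static boolean matches(String doc, List<Clause> clauses) {
        boolean matchedAny = false;
        for (Clause c : clauses) {
            boolean hit = c.subQuery.test(doc);
            if (c.prohibited) {
                if (hit) {
                    return false;
                }
                continue;
            }
            if (c.required && !hit) {
                return false;
            }
            if (hit) {
                matchedAny = true;
            }
        }
        return matchedAny;
    }

    public static void main(String[] args) {
        // Two clauses that are neither required nor prohibited.
        List<Clause> optional = Arrays.asList(
                new Clause(d -> d.contains("apple"), false, false),
                new Clause(d -> d.contains("pear"), false, false));
        System.out.println(matches("apple pie", optional));    // one sub-query matches
        System.out.println(matches("banana bread", optional)); // no sub-query matches
    }
}
```

The second call shows the case the added sentence covers: with only "neither" clauses, a document that matches none of them does not match the boolean query.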
Re: Are lucene have a configuration feature for storage compression option?
Alex, Otis,

On Friday 14 May 2004 13:58, Otis Gospodnetic wrote: Moving to lucene-user list. Hello, Didn't I already answer these questions? 1. No :(

There is a bit more to say, see below.

...

--- Alex Aw Seat Kiong [EMAIL PROTECTED] wrote: Hi! Some questions about Lucene: 1. Does Lucene have a configuration option for storage compression?

Lucene indexes are quite compact already. Text (western languages) is normally indexed to about 1/3 of its original size; I don't know about CJK. You can have a look at the file formats on the Lucene web site to see how the compression is done. Among other things, it uses common prefixes for the sorted terms, variable-length integers, and differences between integers instead of the complete numbers when possible. One place where compression might be useful is in the stored fields, but there is no API for it in Lucene.

Kind regards, Ype
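Two of the techniques Ype lists, variable-length integers and storing differences between sorted numbers, are easy to show in isolation. This is a minimal sketch in plain Java, assuming the usual layout of 7 data bits per byte with the high bit meaning "more bytes follow"; the class and method names are hypothetical and this is not Lucene's own code.

```java
import java.io.ByteArrayOutputStream;

class PostingsCompressionSketch {
    // Writes a non-negative int using 7 bits per byte, low-order bits first.
    // The high bit of each byte is set when more bytes follow.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes an ascending list of numbers as the gaps between neighbours,
    // each gap written as a variable-length integer.
    static byte[] encodeSorted(int[] sorted) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int n : sorted) {
            writeVInt(out, n - previous);
            previous = n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up document numbers; small gaps fit in a single byte each.
        int[] docIds = {1000, 1003, 1010, 1500};
        System.out.println(encodeSorted(docIds).length + " bytes instead of " + docIds.length * 4);
    }
}
```

Small gaps take one byte each, which is why sorted lists of document numbers compress well compared with fixed four-byte integers.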
(Distributed) Search system designs
Hi, I currently have a working search system based on Lucene 1.2, as follows: 14 indexes, average size just over 1G, min size 36M, max size 3.3G, total size 15G. Search times are currently between 20s and 4 minutes depending on the query; the system uses a MultiSearcher to search all indexes. The indexes are currently all stored on an internal RAID. There are lots of things wrong with the index, including many words which should be in stop lists but aren't, etc. The search is run on a Linux system with 8G of RAM and 2G of swap.

I am looking at writing a replacement system, and this time trying to do everything properly, writing document parsers etc. Any pointers would be well received!

The questions:
1) The documentation about how to get a basic Lucene search going is great. Is there any similar documentation or a HOWTO on how to design and implement distributed searches?
2) For distributed searches, what are the best options for building in redundancy? Is a large shared storage solution such as a SAN required, or will duplicating indexes on several machines suffice?
3) I had been told that using RAMDirectory on a Linux system was pointless because the kernel caches files in spare RAM anyway. Is this true?

Thanks! jt
Getting a field value from a large indexed document is slow.
Hi, I hope someone can help! I am using Lucene to build a searchable repository of electronic documents (MS Office, PDFs etc.). Some of these documents can contain a large amount of text (about 500K of text in some cases), which is indexed to make it searchable. Doing the search and getting the hits is not affected by the size of the documents found. But when I try to access a field (my document ID) in the document, i.e.

// Create Lucene Doc with value
Document doc = hits.doc(i);
String number = doc.get("Field10");

the creation of the Lucene document can take up to a second per hit. I don't actually use any of the other fields apart from getting my ID value from Field10. So my question is: is there a smarter way of getting the 'Field10' value without populating all the other fields in the Lucene document, and so reduce the time taken for this action? Paul

DISCLAIMER: The information in this message is confidential and may be legally privileged. It is intended solely for the addressee. Access to this message by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, or distribution of the message, or any action or omission taken by you in reliance on it, is prohibited and may be unlawful. Please immediately contact the sender if you have received this message in error. Thank you. Valid Information Systems Limited. Address: Morline House, 160 London Road, Barking, Essex, IG11 8BB. http://www.valinf.com Tel: +44 (0) 20 8215 1414 Fax: +44 (0) 20 8215 2040
Re: Getting a field value from a large indexed document is slow.
Paul, It might be worth your while to store the file itself outside lucene, and only store the filename in the stored data. This is generally how relational databases deal with LOBs, and will work with Lucene, too. You will also save yourself hours when it comes time to merge indices or optimize, since those operations are, in effect, large copy operations. Regards, Pete - Original Message - From: Paul Williams [EMAIL PROTECTED] To: 'Lucene Users List' [EMAIL PROTECTED] Sent: Friday, May 14, 2004 11:22 AM Subject: Getting a field value from a large indexed document is slow.
RE: Getting a field value from a large indexed document is slow.
You say the content is indexed; is it also stored? If it is, index the content of the document but don't store it, e.g.

doc.add(Field.UnStored("content", content));

-Original Message- From: Paul Williams [mailto:[EMAIL PROTECTED] Sent: 14 May 2004 16:22 To: 'Lucene Users List' Subject: Getting a field value from a large indexed document is slow.
exact the same score from different documents
Hi, I am getting exactly the same score, such as 0.04809519, for documents of different sizes on some queries, and this happens quite frequently. Based on the score formula, it seems this should rarely happen. Or do I misunderstand the formula? Regards, Hui
Re: exact the same score from different documents
Normalization factors (and document boosts) are represented in the index using a one-byte float format with a 3-bit mantissa, which means that differences of plus-or-minus 1/8 are rounded to a single value. For example, a field with 256 tokens by default has a lengthNorm() of 16.0. With a three-bit mantissa, values 16.0 to 18.0 are rounded to the same value, which means that fields with between 256 and 324 tokens will have the same effective length normalization. Doug
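The effect Doug describes can be reproduced with ordinary float arithmetic. The sketch below is a simplification and not Lucene's actual norm encoding: it assumes the extra mantissa bits are simply truncated, it ignores the limited exponent range of a one-byte format, and it follows the numbers in the message above by using the square root of the token count as the value being stored. The class and method names are hypothetical.

```java
class NormPrecisionSketch {
    // Keeps the sign bit, the 8 exponent bits and only the top 3 of the 23 mantissa bits.
    // Assumption: lower bits are truncated rather than rounded.
    static float keepThreeMantissaBits(float f) {
        return Float.intBitsToFloat(Float.floatToIntBits(f) & 0xFFF00000);
    }

    public static void main(String[] args) {
        // Token counts around the 256..324 range mentioned in the message.
        for (int tokens : new int[] {256, 300, 323, 324}) {
            float value = (float) Math.sqrt(tokens);
            System.out.println(tokens + " tokens: " + value + " -> " + keepThreeMantissaBits(value));
        }
    }
}
```

Under these assumptions every value from 16.0 up to, but not including, 18.0 collapses to 16.0, so fields of 256 to 323 tokens become indistinguishable by length, and a token count of 324 is the first to land on the next representable step.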
Question on QueryParser.parse()
I am trying to create a query object using the QueryParser for the search word A+. However, it always returns null. My code is below:

Query q = QueryParser.parse("A\\+", "myIndex", new StandardAnalyzer());

I've also tried the following query strings, but none of them returns a query object:

myIndex: A\+
myIndex: A\+
myIndex: A+

Does anyone know the solution? By the way, I am using Lucene 1.4 RC3. Thanks, Terence
Re: Question on QueryParser.parse()
- Original Message - From: Terence Lai [EMAIL PROTECTED] To: 'Lucene Users List' [EMAIL PROTECTED] Sent: Friday, May 14, 2004 2:12 PM Subject: Question on QueryParser.parse()

StandardAnalyzer is what's stopping you. It will discard the letter A because of its StopFilter, and it will discard the + because it considers that character to be noise. You have a few choices:

1. Build the query manually.

    Query q = new TermQuery(new Term("myIndex", "A+"));

This will only work if you actually indexed the term A+ (i.e. if you did not use StandardAnalyzer when you indexed the docs).

2. Use a different analyzer. In this case, the only stock analyzer that will work is the WhitespaceAnalyzer. Bear in mind that if you switch analyzers you will have to reindex your content with your new choice of analyzer.

If you need to search unusual terms like this, you might want to make your own Analyzer out of a WhitespaceTokenizer and a LowerCaseFilter, as follows:

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;

    class LcWsAnalyzer extends Analyzer {
        // Split on whitespace only, then lower-case each token.
        public TokenStream tokenStream(String fieldName, Reader reader) {
            return new LowerCaseFilter(new WhitespaceTokenizer(reader));
        }
    }

When the above Analyzer is used, QueryParser.parse() returns myIndex:a+ as the term it will search for.
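To see why the choice of analyzer decides whether A+ survives, here is a minimal stand-alone sketch in plain Java. It only imitates the two behaviours described in the reply above and is not Lucene code: the stop-word list is a made-up three-word sample, and the letter-and-digit splitting is an assumption standing in for a tokenizer that treats + as noise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TokenizingSketch {
    // Hypothetical stop list; a real analyzer's list is longer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the"));

    // Splits on whitespace only and lower-cases, so "A+" survives as "a+".
    static List<String> whitespaceLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Keeps only runs of letters and digits, lower-cases them, then drops stop words.
    static List<String> lettersWithStopWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceLowercase("A+"));   // the term is kept
        System.out.println(lettersWithStopWords("A+"));  // nothing is left to search for
    }
}
```

With the second approach nothing remains of the input, which is consistent with the parser handing back no usable query for A+.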