Thread safety
Hi,

I have an urgent question about thread safety in Lucene; I could not get a clear answer from the Lucene documentation or code.

1. Is Searcher (IndexSearcher, MultiSearcher, ...) thread safe? Can multiple users call search(..) on the same object at the same time?
2. If one user calls close() and another calls search(..) on the same object, I assume we should get a meaningful error message?
3. What happens if one user calls Searcher.search(..) while another user tries to delete that document from the index files by calling IndexReader.delete(..) (either from two threads or from two separate processes)?

A brief answer would be good enough for me for now. Thanks very much in advance!

Lisheng

- To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
RE: Keyword query confusion
Hi,

Erik and others mentioned that is_pub:1 won't work because of the Analyzer, but I remember from my tests that StandardAnalyzer does not strip numbers, while SimpleAnalyzer does. According to the previous mail, StandardAnalyzer is being used here, so how could the number 1 be parsed away? I used Lucene 1.4 rc3.

Thanks very much for your help,
Lisheng

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Saturday, September 25, 2004 1:59 AM
To: Lucene Users List
Subject: Re: Keyword query confusion

On Sep 24, 2004, at 12:26 PM, Fred Toth wrote:
> I'm trying to understand what's going on with the query parser and keyword fields.

It's a confusing situation, for sure.

> I've got a large subset of my documents which are publications. So as to be able to query these, I've got this in the indexer:
>
>     doc.add(Field.Keyword("is_pub", "1"));
>
> However, if I run a query:
>
>     is_pub:1
>
> I get no hits. If I find a document by other means and dump the fields, the is_pub keyword is there, with value of 1.

As already stated, it is the analyzer eating the 1. Every field is analyzed by QueryParser, but during indexing Field.Keyword fields are not analyzed. Search the archives for discussion on a KeywordAnalyzer and how to use it with PerFieldAnalyzerWrapper. Also, the info here is valuable: http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

Visualizing what an analyzer does and using Query.toString are both techniques to clearly point out what is happening.

> Now, I've learned that if I change the field to contain the value "true" instead of the string "1", this query:
>
>     is_pub:true
>
> works just fine. So, I'm pretty sure I'm running afoul of the analyzer, right? The doc says specifically that I should add keyword query clauses programmatically, and I'm guessing that's what's wrong.

It really depends on your needs. I personally wouldn't want end-users knowing to type is_pub:true into a search box. Designing the most appropriate search interface for your situation is highly recommended. In this case that could be a checkbox for "Is published?" that translates into a TermQuery behind the scenes (likely combined with other pieces, perhaps a QueryParser parsed piece, using BooleanQuery). TermQuery text is not analyzed, so you'd be safe there.

> But can someone explain this? It sure is useful to be able to test this sort of thing with the query parser. What is going on with the standard analyzer that makes "true" work and "1" not work?

Numbers get axed, that is what happens.

> Is there a way around this other than by writing code to create the query? This also applies to other types of query, like pub_date:2004.

A PerFieldAnalyzerWrapper using WhitespaceAnalyzer for the is_pub field would do the trick in this case. Again, users typing pub_date:2004 seems awkward to me. Make a year drop-down box if they need to select a year.

> Hoping for enlightenment...

Now that's a tall order... or is it?! It's surrounding us all; we simply have to breathe it in.

Erik
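The mechanism discussed in this thread can be illustrated with a toy model. The sketch below is not Lucene code: the class AnalyzerToy and its methods are hypothetical stand-ins that use only the Java standard library. It assumes a keyword field is stored as one exact, un-analyzed term, and compares letter-only tokenization (the behaviour Lisheng reports for SimpleAnalyzer) with whitespace tokenization (what Erik suggests for the is_pub field).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these tokenizers are hypothetical stand-ins, not Lucene classes.
class AnalyzerToy {

    // Splits on whitespace and keeps every token unchanged.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Keeps only runs of letters, lowercased; digits and punctuation are dropped.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // The keyword is one exact term, so the query matches only if
    // analyzing the query text produces that same term.
    static boolean matchesKeyword(String indexedTerm, List<String> queryTokens) {
        return queryTokens.contains(indexedTerm);
    }

    public static void main(String[] args) {
        System.out.println(matchesKeyword("1", letterTokens("1")));       // false
        System.out.println(matchesKeyword("true", letterTokens("true"))); // true
        System.out.println(matchesKeyword("1", whitespaceTokens("1")));   // true
    }
}
```

In this model the query text "1" produces no tokens at all under letter-only tokenization, so nothing is left to match the stored term, while "true" survives unchanged. Which real analyzer was in effect for Fred's query is exactly what the thread is trying to establish; printing the parsed query, as Erik suggests, is the way to check.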
Free software to crawl internet site?
Hi,

Does anyone know of free software for crawling an internet site (a web crawler)? According to the official Lucene FAQ, Lucene currently does not have this feature.

Thanks very much for your help,
Lisheng
RE: Power Point Processing
Hi,

Thanks very much for your help, I will try that.

Best regards,
Lisheng

-----Original Message-----
From: Magnus Johansson [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 23, 2004 11:15 PM
To: Lucene Users List
Subject: Re: Power Point Processing

I've had some success with the code found at http://www.mail-archive.com/[EMAIL PROTECTED]/msg04809.html together with POI.

Then there's OpenOffice, but I don't really think it is usable in a production environment.

/Magnus Johansson

> Hi,
>
> Does anyone know a good tool for converting MS PowerPoint files (*.ppt) into plain text so we can use Lucene to index them? I looked at Jakarta POI and only see that Word and Excel documents can be processed. Some JavaDoc pages mention ppt, but the status is not clear to me.
>
> Thanks very much for your help,
> Lisheng
Power Point Processing
Hi,

Does anyone know a good tool for converting MS PowerPoint files (*.ppt) into plain text so we can use Lucene to index them? I looked at Jakarta POI and only see that Word and Excel documents can be processed. Some JavaDoc pages mention ppt, but the status is not clear to me.

Thanks very much for your help,
Lisheng
RE: worddoucments search
Hi Otis,

I looked at the textmining site. It seems to me that textmining is a wrapper on top of POI, so the basic features should be the same as POI's. Is this true?

I have tested POI with Lucene and in general it works fine, but I found that it sometimes cannot process MS Word files created by an old version of Word. If I just re-save the old DOC file with a newer Word on XP, everything is fine.

Thanks very much for your help,
Lisheng

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 24, 2004 10:24 AM
To: Lucene Users List
Subject: Re: worddoucments search

As I just answered in a separate email to Ryan, we used the textmining.org library, too, as an example of something that is easier to use than POI. It's been a while since I wrote that chapter, so it slipped my mind when I replied.

Yes, use textmining.org first, you'll be able to include it in your code in 2 minutes. Good stuff.

Otis

--- Ryan Ackley [EMAIL PROTECTED] wrote:
> Otis,
>
> Why didn't you use the textmining.org library? You even asked me to fix a bug for the book, which I did. Also, the code would have been about three lines.
>
> -Ryan
>
> ----- Original Message -----
> From: Otis Gospodnetic [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Sent: Tuesday, August 24, 2004 7:41 AM
> Subject: Re: worddoucments search
>
> > For Lucene in Action Erik and I wrote a little extensible framework for indexing various documents, including MS Word. We used POI, so the solution works on Winblows, UNIX/Linux, OSX. I think the code is a bit too big for the list, but the book will be out soon. Erik and I are going through copy and tech editing right now.
> >
> > POI: http://jakarta.apache.org/poi
> >
> > Otis
> >
> > --- Don Vaillancourt [EMAIL PROTECTED] wrote:
> > > I could be wrong, but I don't think that there is an indexer for Word documents.
> > >
> > > There's a Python version of Lucene called Lupy with a Python indexer for all sorts of document types (http://www.methods.co.nz/docindexer/). Would anyone be willing to port those over?
> > >
> > > However, the MSWord indexer only works on MS Windows and you may need MS Word for it to work. Man, that's no good. I think that we'd need to ask the OpenOffice people for help on this.
> > >
> > > Santosh wrote:
> > > > Can Lucene search Word documents? If so, please give me information about it.
> > > >
> > > > regards
> > > > Santosh kumar
> > >
> > > --
> > > Don Vaillancourt
> > > Director of Software Development
> > > WEB IMPACT INC.
RE: lucene 1.4 final src build error
Hi,

Which JVM version do you have? I guess yours is possibly JVM 1.3.x. From my experience, I think Lucene 1.4 can only be compiled under JVM 1.4.x.

Regards,
Lisheng

-----Original Message-----
From: juan dix [mailto:[EMAIL PROTECTED]
Sent: Friday, July 16, 2004 10:58 AM
To: [EMAIL PROTECTED]
Subject: lucene 1.4 final src build error

Just trying to do a src build using ant on lucene 1.4 final, and getting a compile error for SortComparator.java. Any ideas?

    D:\lucene-1.4-final>ant
    Buildfile: build.xml

    init:
        [mkdir] Created dir: D:\lucene-1.4-final\build
        [mkdir] Created dir: D:\lucene-1.4-final\dist

    compile-core:
        [mkdir] Created dir: D:\lucene-1.4-final\build\classes\java
        [javac] Compiling 160 source files to D:\lucene-1.4-final\build\classes\java
        [javac] D:\lucene-1.4-final\src\java\org\apache\lucene\search\SortComparator.java:37: unreported exception java.io.IOException; must be caught or declared to be thrown
        [javac] protected Comparable[] cachedValues = FieldCache.DEFAULT.getCustom(reader, field, SortComparator.this);
        [javac] ^
        [javac] 1 error

    BUILD FAILED
    D:\lucene-1.4-final\build.xml:140: Compile failed; see the compiler error output for details.

    Total time: 25 seconds

Should I just add my own try and catch to the original src?

thanks.
-juan
RE: lucene 1.4 final src build error
Hi,

Did you do a complete cleanup before compiling under JVM 1.4.x?

Regards,
Lisheng

-----Original Message-----
From: juan dix [mailto:[EMAIL PROTECTED]
Sent: Friday, July 16, 2004 12:52 PM
To: [EMAIL PROTECTED]
Subject: RE: lucene 1.4 final src build error

Thanks, but now that I have installed Java 1.4 I am getting these errors:

    D:\lucene-1.4-final>java -version
    java version "1.4.2_05"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_05-b04)
    Java HotSpot(TM) Client VM (build 1.4.2_05-b04, mixed mode)

    D:\lucene-1.4-final>ant
    Buildfile: build.xml

    init:
        [mkdir] Created dir: D:\lucene-1.4-final\build

    compile-core:
        [mkdir] Created dir: D:\lucene-1.4-final\build\classes\java
        [javac] Compiling 160 source files to D:\lucene-1.4-final\build\classes\java
        [rmic] RMI Compiling 1 class to D:\lucene-1.4-final\build\classes\java
        [rmic] error: Invalid class file format: D:\lucene-1.4-final\build\classes\java\org\apache\lucene\search\RemoteSearchable.class, wrong version: 46, expected 45
        [rmic] error: Class org.apache.lucene.search.RemoteSearchable not found.
        [rmic] 2 errors

    BUILD FAILED
    D:\lucene-1.4-final\build.xml:145: Rmic failed; see the compiler error output for details.

    Total time: 19 seconds

Strange, I never had a problem building lucene 1.3. Please advise.

Thanks.
-juan

> From: Zhang, Lisheng [EMAIL PROTECTED]
> Reply-To: Lucene Users List [EMAIL PROTECTED]
> To: 'Lucene Users List' [EMAIL PROTECTED]
> Subject: RE: lucene 1.4 final src build error
> Date: Fri, 16 Jul 2004 11:13:31 -0700
>
> Hi,
>
> Which JVM version do you have? I guess yours is possibly JVM 1.3.x. From my experience, I think Lucene 1.4 can only be compiled under JVM 1.4.x.
>
> Regards,
> Lisheng
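The rmic error above reports a class-file version number (46 found, 45 expected). As a diagnostic aid, here is a minimal sketch that prints the major version recorded in a compiled .class file, which helps confirm which target the compiler actually emitted. The class name ClassVersion is hypothetical. The header layout (4-byte magic 0xCAFEBABE, 2-byte minor version, 2-byte major version) comes from the JVM class-file format; major 45 corresponds to JDK 1.1, 46 to 1.2, 47 to 1.3 and 48 to 1.4. The sketch uses java.nio.file, so it needs a much newer JDK than the ones in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene or Ant.
class ClassVersion {

    // Returns the major version stored in bytes 6-7 of a class file.
    static int majorVersion(byte[] classBytes) {
        boolean hasMagic = classBytes.length >= 8
                && (classBytes[0] & 0xFF) == 0xCA
                && (classBytes[1] & 0xFF) == 0xFE
                && (classBytes[2] & 0xFF) == 0xBA
                && (classBytes[3] & 0xFF) == 0xBE;
        if (!hasMagic) {
            throw new IllegalArgumentException("not a class file");
        }
        // Bytes 4-5 hold the minor version; bytes 6-7 hold the major version.
        return ((classBytes[6] & 0xFF) << 8) | (classBytes[7] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassVersion <path to .class file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("major version: " + majorVersion(bytes));
    }
}
```

Running it against build\classes\java\org\apache\lucene\search\RemoteSearchable.class would show whether the class was compiled to version 46, as the rmic message says. It does not explain why rmic expects 45; that depends on which rmic and which class path Ant picks up.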
Build lucene1.4-rc3
Hi,

I tried to build Lucene 1.4-rc3 with Ant 1.5.3 and Java 1.4.1_02. When I type "ant clean", I get an error message:

    build.xml:11: Unexpected element tstamp.

It looks like an Ant version problem, but BUILD.txt says Ant 1.5 should be good enough?

BUILD.txt also mentions that the root directory should contain default.properties, but I did not find this file (possibly OK, since I did not see this file referenced inside build.xml).

Thanks very much for your help,
Lisheng
RE: Build lucene1.4-rc3
Thanks very much for this quick help. I had actually looked at the Lucene 1.3 BUILD.txt.

Best regards,
Lisheng

-----Original Message-----
From: Terence Lai [mailto:[EMAIL PROTECTED]
Sent: Saturday, May 15, 2004 8:55 PM
To: Lucene Users List
Cc: '[EMAIL PROTECTED]'; Venkatraman, Shiv
Subject: RE: Build lucene1.4-rc3

You need to use Ant 1.6 to build lucene. The BUILD.txt does mention that.

Basic steps:
  0) Install JDK 1.2 (or greater), Ant 1.6 (or greater), and the Ant optional.jar
  1) Download Lucene from Apache and unpack it
  2) Connect to the top-level of your Lucene installation
  3) Install JavaCC (optional)
  4) Run ant

Hope this helps.

> Hi,
>
> I tried to build Lucene 1.4-rc3 with Ant 1.5.3 and Java 1.4.1_02. When I type "ant clean", I get an error message:
>
>     build.xml:11: Unexpected element tstamp.
>
> It looks like an Ant version problem, but BUILD.txt says Ant 1.5 should be good enough?
>
> BUILD.txt also mentions that the root directory should contain default.properties, but I did not find this file (possibly OK, since I did not see this file referenced inside build.xml).
>
> Thanks very much for your help,
> Lisheng
lucene-1.4-rc2 and JVM version
Hi,

We started to learn and use Lucene about 3 weeks ago. It is really a great product!

We have some problems with certain JVM versions (Sun JDK). We are using lucene-1.4-rc2 on Solaris 2.8:

(1) We have a program that indexes about 230 documents. With jdk1.4.1_02, our program often hung at

    IndexWriter.addDocument(doc);

The document at which it hangs is essentially random. My question is: are there any known issues with jdk1.4.1_02 and lucene-1.4-rc2 (BUILD.txt says any JDK later than 1.2 is OK)?

(2) We also found that a trivial search program crashes under jdk1.3.0 but runs fine under jdk1.3.1_03 (my search code is attached below). Running on jdk1.3.0, I got the following message at the line calling IndexSearcher.search(...):

    #
    # HotSpot Virtual Machine Error, Unexpected Signal 11
    # Please report this error at
    # http://java.sun.com/cgi-bin/bugreport.cgi
    #
    # Error ID: 4F533F534F4C415249530E435050079A 01
    #
    # Problematic Thread: prio=5 tid=0x29800 nid=0x1 runnable
    #

Is this a known problem with jdk1.3.0? The same program runs through fine with jdk1.3.1_03.

I would really appreciate any help and guidance on these two issues.

Best regards,
Lisheng

##
import java.io.*;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Sort;
import org.apache.lucene.queryParser.QueryParser;

class UrSearch {

    private static void log(String msg) {
        System.out.println(msg);
    }

    public static void main(String[] argv) {
        try {
            Searcher searcher = new IndexSearcher("./myindex");
            Searcher[] searches = new Searcher[1];
            searches[0] = searcher;

            Analyzer analyzer = new StandardAnalyzer();
            Query query0 = simpleQuery(analyzer);
            log("Q=" + query0.toString());
            log("QueryClass=" + query0.getClass().toString());

            Sort sort = new Sort();
            // Crash on this line if jdk1.3.0 !!!
            Hits hits = searcher.search(query0, sort);
            log(hits.length() + " total matching documents");

            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                log("docid=" + doc.get("docid"));
                log("score=" + hits.score(i));
            }
            searcher.close();
        } catch (Exception ex) {
            log("EXTYPE: " + ex.getClass().getName());
            log("EXMSG: " + ex.getMessage());
            try {
                // Write the full stack trace to err.dat
                PrintWriter mout = new PrintWriter(new FileOutputStream("err.dat"), true);
                ex.printStackTrace(mout);
            } catch (FileNotFoundException newex) {
                log("TERRIBLE: " + newex.getMessage());
                System.exit(0);
            }
        }
    }

    // Parses the query text "iepeditorial" against the default field "all".
    static Query simpleQuery(Analyzer analyzer) throws Exception {
        Query q1 = QueryParser.parse("iepeditorial", "all", analyzer);
        return q1;
    }
}
##