Thread safety

2004-12-02 Thread Zhang, Lisheng
Hi,

I have an urgent question about thread safety in Lucene; I could
not get a clear answer from the Lucene docs and code.

1. Is Searcher (IndexSearcher, MultiSearcher, ...) thread-safe?
Can multiple users call the search(..) method on the same object
at the same time?

2. If, on the same object, one user calls close() and another
calls search(..), I assume we should get a meaningful error
message?

3. What would happen if one user calls Searcher.search(..) while,
at the same time, another user tries to delete that document from
the index files by calling IndexReader.delete(..) (either through
two threads or two separate processes)?

A brief answer would be good enough for me now. Thanks very much
in advance!

Lisheng

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Keyword query confusion

2004-09-29 Thread Zhang, Lisheng
Hi,

Erik and others mentioned that is_pub:1 won't work because of
the Analyzer, but I remember that in my test StandardAnalyzer
does not strip numbers, while SimpleAnalyzer does.

According to the previous mail, StandardAnalyzer is being used
here, so how could the number 1 be parsed away?

I used lucene 1.4, rc3.

Thanks very much for helps, 

Lisheng

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Saturday, September 25, 2004 1:59 AM
To: Lucene Users List
Subject: Re: Keyword query confusion


On Sep 24, 2004, at 12:26 PM, Fred Toth wrote:
 I'm trying to understand what's going on with the query parser
 and keyword fields.

It's a confusing situation, for sure.

 I've got a large subset of my documents which are publications.
 So as to be able to query these, I've got this in the indexer:

 doc.add(Field.Keyword("is_pub", "1"));

 However, if I run a query:

   is_pub:1

 I get no hits. If I find a document by other means and dump the
 fields, the is_pub keyword is there, with value of 1.

As already stated - it is the analyzer eating the 1.  Every field is 
analyzed by QueryParser, but during indexing Field.Keyword fields are 
not analyzed.

Search the archives for discussion on a KeywordAnalyzer and how to use 
it with PerFieldAnalyzerWrapper.  Also, the info here is valuable:

http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

Visualizing what an analyzer does and using Query.toString are both 
techniques to clearly point out what is happening.

 Now, I've learned that if I change the field to contain the value
 "true" instead of the string "1", this query:

   is_pub:true

 works just fine.

 So, I'm pretty sure I'm running afoul of the analyzer, right? The doc
 says specifically that I should add keyword query clauses
 programmatically, and I'm guessing that's what's wrong.

It really depends on your needs.  I personally wouldn't want end-users 
to have to know to type is_pub:true into a search box.  Designing the 
most appropriate search interface for your situation is highly 
recommended.  In this case, that could be a checkbox for "Is published?" 
that translates into a TermQuery behind the scenes (likely combined with 
other pieces, perhaps a QueryParser-parsed piece, using BooleanQuery).  
TermQuery text is not analyzed, so you'd be safe there.

 But can someone explain this? It sure is useful to be able to test this
 sort of thing with the query parser. What is going on with the standard
 analyzer that makes true work and 1 not work?

Numbers get axed, that is what happens.

 Is there a way around this other than by writing code to create the
 query? This also applies to other types of query, like pub_date:2004.

A PerFieldAnalyzerWrapper using WhitespaceAnalyzer for the is_pub 
field would do the trick in this case.
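
Why a whitespace-based analyzer keeps the 1 while a letter-based one 
loses it can be seen in a small Lucene-free sketch.  The class 
TokenizerSketch and both of its methods are invented for this 
illustration; they only approximate the idea of letter-only versus 
whitespace tokenization and are not Lucene's analyzers.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizerSketch {
    /** Keeps only runs of letters, lowercased; digits and punctuation act as separators. */
    public static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Splits on whitespace only, so digits survive as tokens. */
    public static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("1"));     // prints []
        System.out.println(letterTokens("true"));  // prints [true]
        System.out.println(whitespaceTokens("1")); // prints [1]
    }
}
```

With letter-only tokenization the query text 1 produces no term at all, 
so nothing can match; with whitespace tokenization the term 1 survives 
and can match the keyword stored at index time.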

Again, users typing pub_date:2004 seems awkward to me - make a year 
drop-down box if they need to select a year.

 Hoping for enlightenment...

Now that's a tall order... or is it?!  It's surrounding us all - we 
simply have to breathe it in.

Erik





Free software to crawl internet site?

2004-09-29 Thread Zhang, Lisheng
Hi,

Does anyone know of free software to crawl an internet site (a web
crawler)? According to the official Lucene FAQ, Lucene currently
does not have this feature.

Thanks very much for helps, 

Lisheng




RE: Power Point Processing

2004-09-24 Thread Zhang, Lisheng
Hi,

Thanks very much for helps, I will try that.

Best regards, Lisheng

-Original Message-
From: Magnus Johansson [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 23, 2004 11:15 PM
To: Lucene Users List
Subject: Re: Power Point Processing


I've had some success with the code found at

http://www.mail-archive.com/[EMAIL PROTECTED]/msg04809.html

together with POI.

Then there's OpenOffice, but I don't really think it is usable
in a production environment.

/Magnus Johansson


 Hi,

 Does anyone know a good tool for converting MS PowerPoint
 files (*.ppt) into plain text so we can use Lucene to index them?

 I looked at jakarta/POI and only see that Word and Excel documents
 can be processed. Some JavaDoc pages mention ppt, but the
 status is not clear to me.

 Thanks very much for helps, Lisheng




Power Point Processing

2004-09-23 Thread Zhang, Lisheng
Hi,

Does anyone know a good tool for converting MS PowerPoint
files (*.ppt) into plain text so we can use Lucene to index them?

I looked at jakarta/POI and only see that Word and Excel documents
can be processed. Some JavaDoc pages mention ppt, but the
status is not clear to me.

Thanks very much for helps, Lisheng




RE: worddoucments search

2004-08-30 Thread Zhang, Lisheng
Hi Otis,

I looked at the textmining site. It seems to me that textmining
is a wrapper on top of POI, so the basic features
should be the same as POI's. Is this true?

I have tested POI with Lucene and in general it works fine,
but I found that it sometimes cannot process MS Word DOC files
created by an old version of Word. If I just re-save the old
DOC file with a newer Word on XP, everything is fine.

Thanks very much for helps, 

Lisheng

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 24, 2004 10:24 AM
To: Lucene Users List
Subject: Re: worddoucments search


As I just answered in a separate email to Ryan - we used textmining.org
library, too, as an example of something that is easier to use than
POI.  It's been a while since I wrote that chapter, so it slipped my
mind when I replied.  Yes, use textmining.org first, you'll be able to
include it in your code in 2 minutes.  Good stuff.

Otis



--- Ryan Ackley [EMAIL PROTECTED] wrote:

 Otis,
 
 Why didn't you use the textmining.org library? You even asked me to
 fix a bug for the book, which I did. Also, the code would have been
 about three lines.
 
 -Ryan
 
 - Original Message - 
 From: Otis Gospodnetic [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]
 Sent: Tuesday, August 24, 2004 7:41 AM
 Subject: Re: worddoucments search
 
 
  For Lucene in Action Erik and I wrote a little extensible framework
  for indexing various documents, including MS Word.  We used POI, so
  the solution works on Winblows, UNIX/Linux, OSX.  I think the code
  is a bit too big for the list, but the book will be out soon.  Erik
  and I are going through copy and tech editing right now.  POI:
  http://jakarta.apache.org/poi .
 
  Otis
 
 
  --- Don Vaillancourt [EMAIL PROTECTED] wrote:
 
   I could be wrong, but I don't think that there is an indexer for
   Word documents.

   There's a Python version of Lucene called Lupy with a Python
   indexer for all sorts of document types
   (http://www.methods.co.nz/docindexer/).  Would anyone be willing
   to port those over?  Although the MSWord indexer only works on
   MSWindows and you may need MSWord for it to work.  Man, that's no
   good.

   I think that we'd need to ask the OpenOffice people for help on
   this.
  
  
   Santosh wrote:
  
   Can Lucene search Word documents? If so, please give me
   information about it.
   
   regards
   Santosh kumar
   
   
   -- 
   *Don Vaillancourt
   Director of Software Development
   *
   *WEB IMPACT INC.*
   phone: 416-815-2000 ext. 245
   fax: 416-815-2001
   email: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
   web: http://www.web-impact.com
  
  
  

RE: lucene 1.4 final src build error

2004-07-16 Thread Zhang, Lisheng
Hi,

Which JVM version do you have? I guess yours may be
JVM 1.3.x. From my experience, I think Lucene 1.4
can only be compiled under JVM 1.4.x.

Regards, Lisheng

-Original Message-
From: juan dix [mailto:[EMAIL PROTECTED]
Sent: Friday, July 16, 2004 10:58 AM
To: [EMAIL PROTECTED]
Subject: lucene 1.4 final src build error 


Just trying to do a src build using ant on lucene 1.4 final.  and getting 
compile error for SortComparator.java.  Any ideas?

#
D:\lucene-1.4-final>ant
Buildfile: build.xml

init:
   [mkdir] Created dir: D:\lucene-1.4-final\build
   [mkdir] Created dir: D:\lucene-1.4-final\dist

compile-core:
   [mkdir] Created dir: D:\lucene-1.4-final\build\classes\java
   [javac] Compiling 160 source files to D:\lucene-1.4-final\build\classes\java
   [javac] D:\lucene-1.4-final\src\java\org\apache\lucene\search\SortComparator.java:37: unreported exception java.io.IOException; must be caught or declared to be thrown
   [javac]   protected Comparable[] cachedValues = FieldCache.DEFAULT.getCustom (reader, field, SortComparator.this);
   [javac]   ^
   [javac] 1 error

BUILD FAILED
D:\lucene-1.4-final\build.xml:140: Compile failed; see the compiler error output for details.

Total time: 25 seconds

###

Should I just add my own try and catch to the original src?  Thanks.

-juan




RE: lucene 1.4 final src build error

2004-07-16 Thread Zhang, Lisheng
Hi,

Did you do a complete cleanup before compiling
under JVM 1.4.x?
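
One way to spot stale classes is to check which class-file version a
.class file under build\classes carries. A minimal sketch using only
the JDK follows; the class name ClassVersion and its method are made
up for illustration and are not part of Lucene or Ant. It reads the
file header (magic number, minor version, major version) and returns
the major version: 45 corresponds to JDK 1.1, 46 to 1.2, 47 to 1.3
and 48 to 1.4.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassVersion {
    /**
     * Returns the major version stored in a class file header.
     * Layout: 4-byte magic 0xCAFEBABE, 2-byte minor, 2-byte major (big-endian).
     */
    public static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("missing 0xCAFEBABE magic number");
        }
        buf.getShort();                 // skip the minor version
        return buf.getShort() & 0xFFFF; // major version as an unsigned value
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a .class file chosen by the caller
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("major version: " + majorVersion(bytes));
    }
}
```

If classes left over from an earlier build report a different major
version than freshly compiled ones, a clean rebuild is the first thing
to try.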

Regards, Lisheng

-Original Message-
From: juan dix [mailto:[EMAIL PROTECTED]
Sent: Friday, July 16, 2004 12:52 PM
To: [EMAIL PROTECTED]
Subject: RE: lucene 1.4 final src build error


Thx, but after installing Java 1.4 I am getting these errors now:

#

D:\lucene-1.4-final>java -version
java version "1.4.2_05"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_05-b04)
Java HotSpot(TM) Client VM (build 1.4.2_05-b04, mixed mode)

D:\lucene-1.4-final>ant
Buildfile: build.xml

init:
[mkdir] Created dir: D:\lucene-1.4-final\build

compile-core:
[mkdir] Created dir: D:\lucene-1.4-final\build\classes\java
[javac] Compiling 160 source files to D:\lucene-1.4-final\build\classes\java
 [rmic] RMI Compiling 1 class to D:\lucene-1.4-final\build\classes\java
 [rmic] error: Invalid class file format: D:\lucene-1.4-final\build\classes\java\org\apache\lucene\search\RemoteSearchable.class, wrong version: 46, expected 45
 [rmic] error: Class org.apache.lucene.search.RemoteSearchable not found.
 [rmic] 2 errors

BUILD FAILED
D:\lucene-1.4-final\build.xml:145: Rmic failed; see the compiler error output for details.

Total time: 19 seconds
D:\lucene-1.4-final>
##

Strange I never had a problem with building lucene1.3.  Please advise.  
Thanks.

-juan


From: Zhang, Lisheng [EMAIL PROTECTED]
Reply-To: Lucene Users List [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Subject: RE: lucene 1.4 final src build error
Date: Fri, 16 Jul 2004 11:13:31 -0700

Hi,

Which JVM version do you have? I guess yours may be
JVM 1.3.x. From my experience, I think Lucene 1.4
can only be compiled under JVM 1.4.x.

Regards, Lisheng




Build lucene1.4-rc3

2004-05-15 Thread Zhang, Lisheng
Hi,

I tried to build lucene 1.4 -rc3 with ant 1.5.3 and java 1.4.1_02.

When I typed ant clean, I got an error message:

build.xml:11: Unexpected element tstamp.

It seems like an Ant version problem, but BUILD.txt said Ant 1.5
should be good enough?

BUILD.txt also mentions that the root directory should contain
default.properties, but I did not find this file (possibly OK, since
I did not see it referenced inside build.xml).

Thanks very much for helps, Lisheng




RE: Build lucene1.4-rc3

2004-05-15 Thread Zhang, Lisheng
Thanks very much for this quick help; actually
I had looked at the Lucene 1.3 BUILD.txt.

Best regards, Lisheng

-Original Message-
From: Terence Lai [mailto:[EMAIL PROTECTED]
Sent: Saturday, May 15, 2004 8:55 PM
To: Lucene Users List
Cc: '[EMAIL PROTECTED]'; Venkatraman, Shiv
Subject: RE: Build lucene1.4-rc3


You need to use Ant 1.6 to build lucene. The BUILD.txt does mention that.

Basic steps:
  0) Install JDK 1.2 (or greater), Ant 1.6 (or greater), and the Ant
 optional.jar
  1) Download Lucene from Apache and unpack it
  2) Connect to the top-level of your Lucene installation
  3) Install JavaCC (optional)
  4) Run ant

Hope this helps.

 Hi,
 
 I tried to build lucene 1.4 -rc3 with ant 1.5.3 and java 1.4.1_02.
 
 When I typed ant clean, I got an error message:
 
 build.xml:11: Unexpected element tstamp.
 
 It seems like an Ant version problem, but BUILD.txt said Ant 1.5
 should be good enough?
 
 BUILD.txt also mentions that the root directory should contain
 default.properties, but I did not find this file (possibly OK, since
 I did not see it referenced inside build.xml).
 
 Thanks very much for helps, Lisheng
 



lucene-1.4-rc2 and JVM version

2004-05-12 Thread Zhang, Lisheng
Hi,

We started learning and using Lucene about 3 weeks ago; it is
really a great product! We have some problems with certain
JVM versions (Sun JDK). We are using lucene-1.4-rc2
on the Solaris 2.8 platform:

(1) We have a program to index about 230 documents. With
jdk1.4.1_02, our program often hangs at

IndexWriter.addDocument(doc);

The document at which it hangs is essentially random.

My question is: are there any known issues with jdk1.4.1_02 and
lucene-1.4-rc2 (BUILD.txt says any JDK later than 1.2 is OK)?

(2) We also found that a trivial search program crashes under jdk1.3.0
but runs fine under jdk1.3.1_03 (my search code is attached below).

If running on jdk1.3.0, I got the following message (at the line
calling IndexSearcher.search(...)):

#
# HotSpot Virtual Machine Error, Unexpected Signal 11
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Error ID: 4F533F534F4C415249530E435050079A 01
#
# Problematic Thread: prio=5 tid=0x29800 nid=0x1 runnable 
#

Is this a known problem with jdk1.3.0? The same program runs
fine with jdk1.3.1_03.

I would really appreciate any help and guidance on these two
issues.

Best regards, Lisheng

##
import java.io.*;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Sort;
import org.apache.lucene.queryParser.QueryParser;

class UrSearch {
  private static void log(String msg)
  {
    System.out.println(msg);
  }

  public static void main(String[] argv)
  {
    try {
      Searcher searcher = new IndexSearcher("./myindex");
      Searcher[] searches = new Searcher[1];
      searches[0] = searcher;

      Analyzer analyzer = new StandardAnalyzer();

      Query query0 = simpleQuery(analyzer);

      log("Q=" + query0.toString());
      log("QueryClass=" + query0.getClass().toString());

      Sort sort = new Sort();

      // Crash on this line if jdk1.3.0 !!!
      Hits hits = searcher.search(query0, sort);

      log(hits.length() + " total matching documents");

      for (int i = 0; i < hits.length(); i++) {
        Document doc = hits.doc(i);

        log("docid=" + doc.get("docid"));
        log("score=" + hits.score(i));
      }
      searcher.close();

    } catch (Exception ex) {
      log("EXTYPE: " + ex.getClass().getName());
      log("EXMSG: " + ex.getMessage());

      // Write the full stack trace to err.dat for later inspection
      try {
        PrintWriter mout = new PrintWriter(new FileOutputStream("err.dat"),
                                           true);
        ex.printStackTrace(mout);
      } catch (FileNotFoundException newex) {
        log("TERRIBLE: " + newex.getMessage());
        System.exit(0);
      }
    }
  }

  static Query simpleQuery(Analyzer analyzer)
    throws Exception
  {
    // Parse the query text "iepeditorial" against the default field "all"
    Query q1 = QueryParser.parse("iepeditorial", "all", analyzer);

    return q1;
  }
}
##


