Questions about GermanAnalyzer/Stemmer

2005-03-01 Thread Jon Humble
Hello, We’re using the GermanAnalyzer/Stemmer to index/search our (German) Website. I have a few questions: (1) Why is the GermanAnalyzer case-sensitive? None of the other language indexers seem to be. What does this feature add? (2) With the German Analyzer, wildcard searches

Is IndexSearcher thread safe?

2005-03-01 Thread Volodymyr Bychkoviak
Is it thread-safe to share one instance of IndexSearcher between multiple threads? - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Questions about GermanAnalyzer/Stemmer [auf Viren geprueft]

2005-03-01 Thread Jonathan O'Connor
Jon, I too found some problems with the German analyzer recently. Here's what may help: 1. You can try reading Joerg Caumanns' paper "A Fast and Simple Stemming Algorithm for German Words". This paper describes the algorithm implemented by GermanAnalyzer. 2. I guess German nouns are all capitalized, so

Re: Questions about GermanAnalyzer/Stemmer [auf Viren geprueft]

2005-03-01 Thread Erik Hatcher
I had to moderate both Jonathan's and Jon's messages in to the list. Please subscribe to the list and post to it from the address you subscribed with. I cannot always guarantee I'll catch moderation messages and send them through in a timely fashion. Erik On Mar 1, 2005, at 6:18 AM,

Re: Custom filters document numbers

2005-03-01 Thread tomsdepot-lucene
I'm also interested in knowing what can change the doc numbers. Does this happen frequently? Like Stanislav has been asking... what sort of operations on the index cause the document number to change for any given document? If the document numbers change frequently, is there a straightforward

Re[2]: Is IndexSearcher thread safe?

2005-03-01 Thread Yura Smolsky
Hello, Volodymyr. VB wrote: "Additional question. If I'm sharing one instance of IndexSearcher between different threads, is it good to just drop this instance to GC? Because I don't know if some thread is still using this searcher or is done with it." It is safe to share one instance between

Re: Questions about GermanAnalyzer/Stemmer [auf Viren geprueft]

2005-03-01 Thread Jonathan O'Connor
Apologies Erik, This must be one of those apostrophe in email address problems I always get. Recently I removed the apostrophe from the email address I give out. Our server recognizes both email addresses, but some of these mail lists don't like the O'Connor clann! Ciao, Jonathan O'Connor XCOM

RE: help with boolean expression

2005-03-01 Thread Omar Didi
I found something kind of weird about the way Lucene interprets boolean expressions without parentheses. When I run the query A AND B OR C, it returns only the documents that have A (in other words, as if the query was just the term A). When I run the query A OR B AND C, it returns only the

Remove document fails

2005-03-01 Thread Alex Kiselevski
Hi, I have a problem calling IndexReader.delete(int doc): it fails with a lock error.

RE: Is IndexSearcher thread safe?

2005-03-01 Thread Cocula Remi
"Additional question. If I'm sharing one instance of IndexSearcher between different threads, is it good to just drop this instance to GC? Because I don't know if some thread is still using this searcher or is done with it." Note that as long as one of the threads keeps a reference on the

RE: Re[2]: Is IndexSearcher thread safe?

2005-03-01 Thread Cocula Remi
I probably had the same trouble (but I'm not sure). I ran a test program that created a lot of IndexSearchers (but also closed and freed them). It ended in an OutOfMemoryError. I'm not finished with that problem yet (I need to use a profiler). But I have discovered one strange

Re: Remove document fails

2005-03-01 Thread Volodymyr Bychkoviak
Maybe you have an IndexWriter open at the same time you are trying to delete the document. Alex Kiselevski wrote: Hi, I have a problem doing IndexReader.delete(int doc) and it fails on lock error.

Zip Files

2005-03-01 Thread Luke Shannon
Hello, does anyone have any ideas on how to index the contents within zip files? Thanks, Luke

Re: Zip Files

2005-03-01 Thread Ernesto De Santis
Hello. First, you need a parser for each file type (PDF, TXT, Word, etc.). Then use the Java API to iterate over the zip content; see: http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html Use the getNextEntry() method. A little example: ZipInputStream zis = new ZipInputStream(fileInputStream);
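
The getNextEntry() loop mentioned above could look roughly like the sketch below. It is not taken from the original message: the class name ZipEntryLister and the helper sampleZip() are hypothetical, and the archive is built in memory only so the example runs without any files on disk. Choosing a parser per file type is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
class ZipEntryLister {

    /** Returns the names of all non-directory entries, in archive order. */
    static List<String> listEntryNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() returns null once the archive is exhausted.
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
                zis.closeEntry();
            }
        }
        return names;
    }

    /** Builds a tiny zip archive in memory so the sketch needs no files. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("notes.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("docs/report.txt"));
            zos.write("world".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listEntryNames(new ByteArrayInputStream(sampleZip())));
    }
}
```

In a real indexer the entry name (for example its extension) would presumably decide which parser receives the entry's bytes.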

Large Index managing

2005-03-01 Thread Volodymyr Bychkoviak
Hi, just an idea on how to manage a large index that is updated very often. Very often there is a need to update a document in the index. To update a document in the index, you should delete the old document from the index and then add the new one. In most cases it requires you to open an IndexReader, delete the document, close

Re: Zip Files

2005-03-01 Thread Luke Shannon
Thanks Ernesto. The issue I'm working with now (this is more lack of experience than anything) is getting an input I can index. All my indexing classes (doc, pdf, xml, ppt) take a File object as a parameter and return a Lucene Document containing all the fields I need. I'm struggling with how I

Re: Fast access to a random page of the search results.

2005-03-01 Thread Doug Cutting
Stanislav Jordanov wrote: startTs = System.currentTimeMillis(); dummyMethod(hits.doc(nHits - nHits)); stopTs = System.currentTimeMillis(); System.out.println("Last doc accessed in " + (stopTs - startTs)

Re: Zip Files

2005-03-01 Thread Chris Lamprecht
Luke, Look at the javadocs for java.io.ByteArrayInputStream - it wraps a byte array and makes it accessible as an InputStream. Also see java.util.zip.ZipFile. You should be able to read and parse all contents of the zip file in memory.
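
As a rough illustration of the in-memory approach suggested above, the sketch below buffers each zip entry into a byte array and exposes it as an InputStream through java.io.ByteArrayInputStream. The names ZipMemoryReader, readAll, open, and sampleZip are hypothetical, and the sketch assumes the parsers can be adapted to accept an InputStream instead of a File. It uses ZipInputStream rather than ZipFile so that it runs without a file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
class ZipMemoryReader {

    /** Reads every file entry fully into memory, keyed by entry name. */
    static Map<String, byte[]> readAll(InputStream in) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry.
                while ((n = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                contents.put(entry.getName(), out.toByteArray());
            }
        }
        return contents;
    }

    /** Wraps buffered bytes so a stream-based parser can consume them. */
    static InputStream open(byte[] data) {
        return new ByteArrayInputStream(data);
    }

    /** Builds a tiny zip archive in memory so the sketch needs no files. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("alpha".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("b.txt"));
            zos.write("beta".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = readAll(new ByteArrayInputStream(sampleZip()));
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().length + " bytes");
        }
    }
}
```

Buffering whole entries is only reasonable when the archives are small enough to fit in memory; for large archives, writing each entry to a temporary file may be the safer choice.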

Investingating Lucene For Project

2005-03-01 Thread Scott Purcell
I am looking for a solution to a problem I am having. We have a web-based asset management solution where we manage customers' assets. We have had requests from some clients who would like the ability to index PDF files now, and possibly other text files in the future. The PDF files live on a

Re: Investingating Lucene For Project

2005-03-01 Thread Ben Litchfield
See inlined comments below. We have had requests from some clients who would like the ability to index PDF files, now and possibly other text files in the future. The PDF files live on a server and are in a structured environment. I would like to somehow index the content inside the PDF and

Best Practices for Distributing Lucene Indexing and Searching

2005-03-01 Thread Luke Francl
Lucene Users, We have a requirement for a new version of our software that it run in a clustered environment. Any node should be able to go down but the application must keep functioning. Currently, we use Lucene on a single node but this won't meet our fail over requirements. If we can't find a

Re: Fast access to a random page of the search results.

2005-03-01 Thread Doug Cutting
Daniel Naber wrote: After fixing this I can reproduce the problem with a local index that contains about 220.000 documents (700MB). Fetching the first document takes for example 30ms, fetching the last one takes 100ms. Of course I tested this with a query that returns many results (about

Multiple indexes

2005-03-01 Thread Ben
Hi, my site has two types of documents with different structures. I would like to create an index for each type of document. What is the best way to implement this? I have been trying to implement this but found out that 90% of the code is the same. In the Lucene in Action book, there is a case study

How to manipulate the lucene index table

2005-03-01 Thread Srimant Mishra
Hi all, I have a web-based application that we use to index text documents as well as images; the indexed fields are either Field.UnStored or Field.Keyword. Currently, we plan to modify some of the index field names. For example, if the index field name was

Re: Multiple indexes

2005-03-01 Thread Erik Hatcher
It's hard to answer such a general question with anything very precise, so sorry if this doesn't hit the mark. Come back with more details and we'll gladly assist though. First, certainly do not copy/paste code. Use standard reuse practices; perhaps the same program can build the two

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-01 Thread Yonik Seeley
6. Index locally and synchronize changes periodically. This is an interesting idea and bears looking into. Lucene can combine multiple indexes into a single one, which can be written out somewhere else, and then distributed back to the search nodes to replace their existing index. This is a

Re: Multiple indexes

2005-03-01 Thread Ben
Is it true that for each index I have to create a separate instance of FSDirectory, IndexWriter and IndexReader? Do I need to create a separate locking mechanism as well? I have already implemented a program using just one index. Thanks, Ben On Tue, 1 Mar 2005 22:09:05 -0500, Erik Hatcher

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-01 Thread Doug Cutting
Yonik Seeley wrote: 6. Index locally and synchronize changes periodically. This is an interesting idea and bears looking into. Lucene can combine multiple indexes into a single one, which can be written out somewhere else, and then distributed back to the search nodes to replace their existing

Re: Multiple indexes

2005-03-01 Thread Otis Gospodnetic
Ben, You do need to use a separate instance of those 3 classes for each index, yes. But this is really something like: IndexWriter writer = new IndexWriter(); So it's a normal code-writing process; you don't really have to create anything new, just use the existing Lucene API. As for locking,

list moving to lucene.apache.org

2005-03-01 Thread Roy T . Fielding
This list is about to be moved to java-user at lucene.apache.org. Please excuse the temporary inconvenience. Cheers, Roy T. Fielding, co-founder, The Apache Software Foundation ([EMAIL PROTECTED]) http://roy.gbiv.com/