Hello,
We're using the GermanAnalyzer/stemmer to index and search our (German)
website.
I have a few questions:
(1) Why is the GermanAnalyzer case-sensitive? None of the other
language analyzers seem to be. What does this feature add?
(2) With the German Analyzer, wildcard searches
Is it thread-safe to share one
instance of IndexSearcher between multiple threads?
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Jon,
I too found some problems with the German analyser recently. Here's what
may help:
1. You can try reading Joerg Caumanns' paper "A Fast and Simple Stemming
Algorithm for German Words". This paper describes the algorithm
implemented by GermanAnalyser.
2. I guess German nouns are all capitalized, so
I had to moderate both Jonathan and Jon's messages in to the list.
Please subscribe to the list and post to it from the address you've
subscribed with. I cannot always guarantee I'll catch moderation messages
and send them through in a timely fashion.
Erik
On Mar 1, 2005, at 6:18 AM,
I'm also interested in knowing what can change the doc numbers.
Does this happen frequently? Like Stanislav has been asking... what sort of
operations on the index cause the document number to change for any given
document? If the document numbers change frequently, is there a
straightforward
Hello, Volodymyr.
VB Additional question.
VB If I'm sharing one instance of IndexSearcher between different threads,
VB is it OK to just drop this instance and let the GC collect it?
VB Because I don't know whether some thread is still using this searcher
VB or is done with it.
It is safe to share one instance of IndexSearcher between multiple threads.
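A minimal sketch of this pattern, assuming the Lucene 1.4-era API (the index path, field name, and query term are made-up placeholders):

```java
// One IndexSearcher, opened once at startup and shared by all request
// threads; searching is thread-safe, so no external locking is needed.
IndexSearcher searcher = new IndexSearcher("/path/to/index");

// Any number of threads may call search() on it concurrently:
Hits hits = searcher.search(new TermQuery(new Term("contents", "lucene")));

// Close it only once you know no thread can still be using it.
searcher.close();
```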
Apologies Erik,
This must be one of those apostrophe in email address problems I always
get. Recently I removed the apostrophe from the email address I give out.
Our server recognizes both email addresses, but some of these mail lists
don't like the O'Connor clann!
Ciao,
Jonathan O'Connor
XCOM
I found something kind of weird about the way Lucene interprets boolean
expressions without parentheses.
When I run the query A AND B OR C, it returns only the documents that have A (in
other words, as if the query were just the term A).
When I run the query A OR B AND C, it returns only the
Hi,
I have a problem with IndexReader.delete(int doc):
it fails with a lock error.
Alex Kiselevski
+9.729.776.4346 (desk)
+9.729.776.1504 (fax)
AMDOCS INTEGRATED CUSTOMER MANAGEMENT
Additional question:
If I'm sharing one instance of IndexSearcher between different threads,
is it OK to just drop this instance and let the GC collect it?
Because I don't know whether some thread is still using this searcher
or is done with it.
Note that as long as one of the threads keeps a reference to the
I probably had the same trouble (but I'm not sure).
I ran a test program that created a lot of IndexSearchers (but also
closed and freed them).
It ended with an OutOfMemoryError.
But I'm not finished with that problem (I need to use a profiler).
But I have discovered one strange
Maybe you have an IndexWriter open at the same time you are trying to
delete the document.
Alex Kiselevski wrote:
Hi,
I have a problem with IndexReader.delete(int doc):
it fails with a lock error.
Alex Kiselevski
+9.729.776.4346 (desk)
+9.729.776.1504 (fax)
AMDOCS INTEGRATED CUSTOMER MANAGEMENT
Hello;
Anyone have an ideas on how to index the contents within zip files?
Thanks,
Luke
Hello
First, you need a parser for each file type: PDF, txt, Word, etc.
Then use the Java API to iterate over the zip contents; see:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html
Use the getNextEntry() method.
A little example:
ZipInputStream zis = new ZipInputStream(fileInputStream);
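The snippet above can be fleshed out into a complete, runnable listing using only java.util.zip (the class name is made up; the main method builds a small archive in memory just so the example is self-contained):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryLister {

    // Iterate a zip stream with getNextEntry() and collect the entry names.
    // Each entry's bytes could be handed to the matching parser here.
    public static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<String>();
        ZipInputStream zis = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            names.add(entry.getName());
            zis.closeEntry();
        }
        zis.close();
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny two-entry archive in memory.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(bos);
        zos.putNextEntry(new ZipEntry("a.txt"));
        zos.write("hello".getBytes("UTF-8"));
        zos.closeEntry();
        zos.putNextEntry(new ZipEntry("b.txt"));
        zos.write("world".getBytes("UTF-8"));
        zos.closeEntry();
        zos.close();

        List<String> names = listEntries(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(names); // [a.txt, b.txt]
    }
}
```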
Hi,
just an idea on how to manage a large index that is updated very often.
There is often a need to update a document in the index. To update a
document you delete the old document from the index and then add the
new one. In most cases this requires you to open an IndexReader, delete the
document, close
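The delete-then-re-add cycle described here, sketched against the Lucene 1.4-era API (the path, field name, analyzer, and document variable are placeholders):

```java
// 1. Delete the old version via a unique key field. No IndexWriter may be
//    open on the same index at this point, or you'll hit a lock error.
IndexReader reader = IndexReader.open("/path/to/index");
reader.delete(new Term("id", "42"));   // deletes every doc with id:42
reader.close();                        // releases the lock

// 2. Re-add the updated document with an IndexWriter.
IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
writer.addDocument(updatedDoc);
writer.close();
```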
Thanks Ernesto.
The issue I'm working with now (this is more lack of experience than
anything) is getting an input I can index. All my indexing classes (doc,
pdf, xml, ppt) take a File object as a parameter and return a Lucene
Document containing all the fields I need.
I'm struggling with how I
Stanislav Jordanov wrote:
startTs = System.currentTimeMillis();
dummyMethod(hits.doc(nHits - 1));
stopTs = System.currentTimeMillis();
System.out.println("Last doc accessed in " + (stopTs - startTs));
Luke,
Look at the javadocs for java.io.ByteArrayInputStream - it wraps a
byte array and makes it accessible as an InputStream. Also see
java.util.zip.ZipFile. You should be able to read and parse all
contents of the zip file in memory.
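Following that advice, here is a self-contained sketch (class name made up) that treats a zip file already loaded into a byte array as a stream and pulls each entry's uncompressed bytes into memory, ready to wrap in a ByteArrayInputStream for a parser:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class InMemoryZipReader {

    // Read every entry of an in-memory zip archive, returning a map from
    // entry name to that entry's uncompressed bytes. Each value can then be
    // wrapped in a ByteArrayInputStream and handed to a PDF/Word/etc. parser.
    public static Map<String, byte[]> readAll(byte[] zipBytes) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<String, byte[]>();
        ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes));
        byte[] buf = new byte[4096];
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int n;
            while ((n = zis.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            contents.put(entry.getName(), out.toByteArray());
        }
        zis.close();
        return contents;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny archive in memory so the example is self-contained.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(bos);
        zos.putNextEntry(new ZipEntry("doc.txt"));
        zos.write("some text to index".getBytes("UTF-8"));
        zos.closeEntry();
        zos.close();

        Map<String, byte[]> all = readAll(bos.toByteArray());
        System.out.println(new String(all.get("doc.txt"), "UTF-8")); // some text to index
    }
}
```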
I am looking for a solution to a problem I am having. We have a web-based asset
management solution where we manage customers' assets.
We have had requests from some clients who would like the ability to index
PDF files, now and possibly other text files in the future. The PDF files live
on a
See inlined comments below.
We have had requests from some clients who would like the ability to
index PDF files, now and possibly other text files in the future. The
PDF files live on a server and are in a structured environment. I would
like to somehow index the content inside the PDF and
Lucene Users,
We have a requirement for a new version of our software that it run in a
clustered environment. Any node should be able to go down but the
application must keep functioning.
Currently, we use Lucene on a single node, but this won't meet our
failover requirements. If we can't find a
Daniel Naber wrote:
After fixing this I can reproduce the problem with a local index that
contains about 220,000 documents (700 MB). Fetching the first document
takes, for example, 30 ms; fetching the last one takes 100 ms. Of course I
tested this with a query that returns many results (about
Hi
My site has two types of documents with different structures. I would
like to create an index for each type of document. What is the best
way to implement this?
I have been trying to implement this but found out that 90% of the
code is the same.
In the Lucene in Action book, there is a case study
Hi all,
I have a web-based application that we use to index text documents
as well as images; the indexed fields are either Field.UnStored or
Field.Keyword.
Currently, we plan to modify some of the index field names. For
example, if the index field name was
It's hard to answer such a general question with anything very precise,
so sorry if this doesn't hit the mark. Come back with more details and
we'll gladly assist though.
First, certainly do not copy/paste code. Use standard reuse practices,
perhaps the same program can build the two
6. Index locally and synchronize changes periodically. This is an
interesting idea and bears looking into. Lucene can combine multiple
indexes into a single one, which can be written out somewhere else, and
then distributed back to the search nodes to replace their existing
index.
This is a
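The combine-and-redistribute step mentioned above can be sketched against the Lucene 1.4-era API (the Directory variables and analyzer are placeholders):

```java
// Merge several locally built indexes into one combined index, which can
// then be written out and shipped back to the search nodes.
IndexWriter writer = new IndexWriter(mergedDir, new StandardAnalyzer(), true);
writer.addIndexes(new Directory[] { node1Dir, node2Dir });
writer.optimize();  // collapse the merged segments
writer.close();
```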
Is it true that for each index I have to create a separate instance
of FSDirectory, IndexWriter and IndexReader? Do I need to create a
separate locking mechanism as well?
I have already implemented a program using just one index.
Thanks,
Ben
On Tue, 1 Mar 2005 22:09:05 -0500, Erik Hatcher
Yonik Seeley wrote:
6. Index locally and synchronize changes periodically. This is an
interesting idea and bears looking into. Lucene can combine multiple
indexes into a single one, which can be written out somewhere else, and
then distributed back to the search nodes to replace their existing
Ben,
You do need to use a separate instance of those three classes for each
index, yes. But this is really just something like:
IndexWriter writer = new IndexWriter(directory, analyzer, false);
So it's the normal code-writing process; you don't really have to create
anything new, just use the existing Lucene API. As for locking,
This list is about to be moved to java-user at lucene.apache.org.
Please excuse the temporary inconvenience.
Cheers,
Roy T. Fielding, co-founder, The Apache Software Foundation
([EMAIL PROTECTED]) http://roy.gbiv.com/