Re: Atomicity in Lucene operations

2004-10-17 Thread Roy Shan
Hello, Nader:

I am very interested in how you implement the atomicity. Could you
send me a copy of your code?

Thanks in advance.

Roy



On Sat, 16 Oct 2004 01:20:09 +0400, Nader Henein [EMAIL PROTECTED] wrote:
 We use Lucene over 4 replicated indices and we have to maintain
 atomicity on deletions and updates with multiple fallback points. I'll
 send you the write-up; it's too big to CC the entire board.
 
 nader henein
 
 
 
 Christian Rodriguez wrote:
 
 Hello guys,
 
 I need additions and deletions of documents to the index to be ATOMIC
 (they either happen to completion or not at all).
 
 On top of this, I need updates (which I currently implement with a
 deletion of the document followed by an addition) to be ATOMIC and
 DURABLE (once I return from the update function, it's because the
 operation happened to completion and stays in the index).
 
 Notice that I don't really need all the ACID properties for all the operations.
 
 I have tried to solve the problem by using the Lucene + BDB package
 written by Andi Vajda and using transactions, but the BDB database
 gets corrupted if I insert random System.exit() calls to simulate a crash of
 the application before aborting or committing transactions.
 
 So I have two questions:
 1. Has anyone been able to use Lucene + BDB WITH transactions,
 simulate random crashes at different points in the process of adding
 items, and found it to be robust (especially, have you been able to
 always recover after a crash, with uncommitted txns rolled back and
 committed ones present in the DB)?
 2. Can anyone suggest other solutions (besides using BDB) that may
 work? For example: are any of these operations already atomic in
 Lucene (using an FSDirectory)?
 
 Thanks for any help you can give me!
 Xtian
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 
 
 
 
 
 
 
 


-- 
Roy

**May I open-source your mind?**




Re: Atomicity in Lucene operations

2004-10-17 Thread Nader Henein
It's pretty integrated into our system at this point. I'm working on
packaging it and cleaning up my documentation, and then I'll make it
available. I can give you the documents, and if you still want the code
I'll slap together a rough copy for you and ship it across.
Nader Henein


Re: StopWord elimination pls. HELP

2004-10-17 Thread Daniel Naber
On Sunday 17 October 2004 05:23, Miro Max wrote:

 d.add(Field.Text("cont", cont));
 writer.addDocument(d);

 to get results from a database into a Lucene index. But
 when I check println(d) I can see the German stopwords
 too. How can I eliminate this?

Field.Text(field, cont), where cont is a String, stores the original
text in addition to indexing it. toString() then shows the stored
text. In the index itself you won't have any stopwords.

Regards
 Daniel

-- 
http://www.danielnaber.de
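
To make the distinction above concrete, here is a minimal sketch in plain Java. It is a toy model and does not use Lucene's API: storedValue, indexedTerms, and the five-word STOP_WORDS list are all hypothetical names chosen for illustration. It only shows that the stored copy of a field keeps every word while the indexed terms have the stop words filtered out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: not Lucene's API. It mimics a field that is both stored and indexed.
public class StoredVsIndexedDemo {

    // A tiny, hypothetical German stop word list for the demo.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("der", "die", "das", "und", "ist"));

    /** The stored value: the original text, kept verbatim for display. */
    static String storedValue(String text) {
        return text;
    }

    /** The indexed terms: lowercased whitespace-separated tokens with stop words dropped. */
    static List<String> indexedTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String cont = "Der Hund und die Katze";
        System.out.println("stored:  " + storedValue(cont));
        System.out.println("indexed: " + indexedTerms(cont));
    }
}
```

Running main prints the full sentence on the "stored" line and [hund, katze] on the "indexed" line. If the stored copy is not needed at all, Lucene 1.4 also offers Field.UnStored(name, value), which indexes the text without storing it.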




index, reindexing problem

2004-10-17 Thread MATL (Mats Lindberg)
Hello.
 
I have a problem when reindexing some documents after an index has been
created. I get the following error:

caught a class java.io.IOException
with message: Lock obtain timed out:
[EMAIL PROTECTED]:\DOCUME~1\..lucene-0b877c2d5472a608d6ec3ee6174018de-write.lock

 
This is how I do it.

1. First, make the index (_indexDir is the location of the index):

    writer = new IndexWriter(_indexDir, new StandardAnalyzer(), true);
    // ... do the indexing here ...
    writer.optimize();
    writer.close();

This works fine.

 
2. This is where I get the error (reindexing an existing document):

    writer = new IndexWriter(_indexDir, new StandardAnalyzer(), false);
    Directory directory;
    IndexReader reader;

    // if the file is in the index already, remove it
    directory = FSDirectory.getDirectory(_indexDir, false);
    reader = IndexReader.open(directory);
    try {
        Term term = new Term("deleteid", deleteID.toLowerCase());
        if (reader.docFreq(term) >= 1) {
            // <- this is where the error occurs, I get the locking error
            deletedItems = reader.delete(term);
        }
    } catch (Exception e) {
        System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
    } finally {
        reader.close();
        directory.close();
    }

    // ... continue with reindexing the new document ...

 

I hope someone can help me with this problem.

 

Best regards,

Mats Lindberg

 



RE: index, reindexing problem

2004-10-17 Thread Chuck Williams
I had this same problem a while back.  It should be resolved if you move
the writer = new IndexWriter(...) call to after the reader.close().  I.e.,
complete all the deletions and close the reader before creating the
writer.

Chuck
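
The reason the order matters can be sketched with a toy model in plain Java. This is not Lucene code: WriteLock, ToyWriter, and ToyReader are hypothetical stand-ins. The model assumes what the error message suggests, namely that the index has a single write lock, that a writer holds it from construction until close(), and that a reader needs the same lock from its first delete until close().

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model only: these classes are hypothetical stand-ins, not Lucene's API.
public class WriteLockOrderDemo {

    /** Stand-in for the single write lock guarding an index directory. */
    static final class WriteLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        void obtain() {
            // A real implementation would retry until a timeout; this model fails at once.
            if (!held.compareAndSet(false, true)) {
                throw new IllegalStateException("Lock obtain timed out: write.lock");
            }
        }

        void release() {
            held.set(false);
        }
    }

    /** Stand-in for a writer: takes the lock when created, frees it on close(). */
    static final class ToyWriter {
        private final WriteLock lock;
        ToyWriter(WriteLock lock) { this.lock = lock; lock.obtain(); }
        void close() { lock.release(); }
    }

    /** Stand-in for a reader: takes the lock on its first delete, frees it on close(). */
    static final class ToyReader {
        private final WriteLock lock;
        private boolean holdsLock = false;
        ToyReader(WriteLock lock) { this.lock = lock; }
        void delete() {
            if (!holdsLock) { lock.obtain(); holdsLock = true; }
        }
        void close() {
            if (holdsLock) { lock.release(); holdsLock = false; }
        }
    }

    /** Order from the question: writer first, then delete. Returns true if the delete succeeded. */
    static boolean writerThenDelete() {
        WriteLock lock = new WriteLock();
        ToyWriter writer = new ToyWriter(lock);
        ToyReader reader = new ToyReader(lock);
        try {
            reader.delete();          // fails: the writer already holds the lock
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            reader.close();
            writer.close();
        }
    }

    /** Suggested order: delete, close the reader, then open the writer. */
    static boolean deleteThenWriter() {
        WriteLock lock = new WriteLock();
        ToyReader reader = new ToyReader(lock);
        reader.delete();
        reader.close();               // releases the lock before the writer asks for it
        try {
            ToyWriter writer = new ToyWriter(lock);
            writer.close();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("writer first, then delete: " + writerThenDelete());
        System.out.println("delete first, then writer: " + deleteThenWriter());
    }
}
```

main prints false for the writer-first order and true for the delete-first order. In real code the equivalent change is to finish reader.delete(term) and reader.close() before calling new IndexWriter(_indexDir, new StandardAnalyzer(), false).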

 -Original Message-
 From: MATL (Mats Lindberg) [mailto:[EMAIL PROTECTED]
 Sent: Sunday, October 17, 2004 5:36 AM
 To: [EMAIL PROTECTED]
 Subject: index, reindexing problem
 



Re: Google Desktop Could be Better

2004-10-17 Thread Bill Janssen
Bill Tschumy writes:
 I've looked at pdfBox, but the jar file is so big that I 
 hate to burden my users by incorporating it.

Bill,

My system (see http://www.parc.com/janssen/pubs/TR-03-16.pdf) uses
pdftotext underneath.  I've been very satisfied with that.  Another
Java solution would be to use Multivalent
(multivalent.sourceforge.net).  Multivalent, by the way, advertises
the following:

Extract text from all formats. Full-text search with Lucene.

Bill




simultaneous search and indexing

2004-10-17 Thread Miro Max
hi,

I'm using a servlet to search my index and I'd like to be
able to create an index at the same time.

Do I have to use threads? I'm a beginner.

thx









Re: simultaneous search and indexing

2004-10-17 Thread Nader Henein
You can do both at the same time; it's thread safe. You will face 
different issues depending on the frequency of your indexing and the 
load on the search, but that shouldn't come into play until your index 
gets nice and heavy. So basically, code on.

Nader Henein
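
As a rough illustration of searching while indexing runs in another thread, here is a sketch in plain Java. It is a toy model, not Lucene: a CopyOnWriteArrayList stands in for the index, and search and searchWhileIndexing are hypothetical names. In a servlet container each request already runs on its own thread, so under that assumption the only thread you would start yourself is the one doing the indexing.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy model only: a thread-safe list stands in for the index; none of this is Lucene's API.
public class SearchWhileIndexingDemo {

    /** What a servlet request thread would do: count the documents containing a word. */
    static int search(List<String> index, String word) {
        int hits = 0;
        for (String doc : index) {    // safe to iterate while another thread is adding
            if (doc.contains(word)) {
                hits++;
            }
        }
        return hits;
    }

    /** Starts an indexer thread, searches concurrently, and returns {hits during, hits after}. */
    static int[] searchWhileIndexing(int docCount) {
        List<String> index = new CopyOnWriteArrayList<>();
        Thread indexer = new Thread(() -> {
            for (int i = 0; i < docCount; i++) {
                index.add("document " + i);
            }
        }, "indexer");
        indexer.start();

        // Runs while the indexer may still be adding: anywhere from 0 to docCount hits.
        int during = search(index, "document");

        try {
            indexer.join();           // wait for indexing to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the indexer", e);
        }
        return new int[] { during, search(index, "document") };
    }

    public static void main(String[] args) {
        int[] hits = searchWhileIndexing(1000);
        System.out.println("hits while indexing: " + hits[0]);
        System.out.println("hits after indexing: " + hits[1]);
    }
}
```

The first number varies from run to run because the search overlaps with the indexing; the second is always the full document count once the indexer thread has finished.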