Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Karl Øie

There are some strange problems with FSDirectory. I have found that building
chunks in a RAMDirectory and then merging them into an FSDirectory is more
stable than indexing directly into the FSDirectory. I ran into your problem,
and the dreaded "too many open files" problem, when indexing large documents
with many fields.

Using a RAMDir as a middleman solved my problems...

Regards, Karl Øie

On Friday 26 April 2002 13:54, petite_abeille wrote:
> Hello,
>
> I'm starting to wonder how bulletproof Lucene indexes are. Do they
> get corrupted easily? If so, is there a way to rebuild them?
>
> I'm starting to get the following exception left and right...
>
> 04/25 18:34:39 (Warning) Indexer.indexObjectWithValues:
> java.io.IOException: _91.fnm already exists
>
> I built a little app (http://homepage.mac.com/zoe_info/) that uses
> Lucene quite extensively, and I would like to keep it that way. However,
> I'm starting to have second thoughts about Lucene's reliability... :-(
>
> I'm sure I'm doing something wrong somewhere, but I really cannot see
> what...
>
> Any help or insight greatly appreciated.
>
> Thanks.
>
> PA.


--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille

> Using a RAMDir as a middleman solved my problems...

Thanks. What is your heuristic for flushing the RAMDirectory? Also, how do
you deal with System.exit() or application death? E.g., you are indexing
something and the application dies or is killed.

Thanks for any input.

R.






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille

> > Thanks. What is your heuristic for flushing the RAMDirectory?
>
> Please explain this, because I don't understand English that well :-(

That's ok, I don't really understand English either :-)

Simply put, when do you flush the RAMDirectory into the FSDirectory? 
Every five documents? Ten? A thousand? What is a good balance between 
RAM and FS?

Thanks.

PA.






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Karl Øie

Ah, now I see. What I have is a server with 512 MB of RAM, so I have used two
different approaches, and both work OK:

1 - I index a fixed number of documents into a RAMDir, say 10 (each of the
docs is an XML doc of about 1.5-2 MB), then I optimize the RAMDir, merge it
into the FSDir, and then optimize the FSDir...

2 - I use Runtime.freeMemory() and Runtime.totalMemory() to see whether I have
used more than 80% of the available memory. If so, I optimize the RAMDir,
merge it, and optimize the FSDir; if not, I just add more documents to the
RAMDir.

As far as I have tested, I have never experienced a failure while merging a
RAMDir into an FSDir, regardless of size, so the limiting factor is my
system's memory.
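
A minimal sketch of the first approach (buffer a fixed number of documents,
then merge), for illustration only. The Stager class, its mergeToDisk callback
and the batch size are hypothetical stand-ins and not Lucene API; in a real
indexer the callback is where the RAMDir would be optimized and merged into
the FSDir.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers items in memory and hands them to a merge step in fixed-size batches. */
class Stager<T> {
    private final int batchSize;
    // Hypothetical stand-in for "optimize the RAMDir, merge it into the FSDir".
    private final Consumer<List<T>> mergeToDisk;
    private List<T> buffer = new ArrayList<>();

    Stager(int batchSize, Consumer<List<T>> mergeToDisk) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.mergeToDisk = mergeToDisk;
    }

    /** Adds one item; triggers a merge once the batch is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Merges whatever is buffered; call once more after the last add(). */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>(); // start a fresh buffer before handing off
        mergeToDisk.accept(batch);
    }

    public static void main(String[] args) {
        Stager<String> stager = new Stager<>(10,
                batch -> System.out.println("merging " + batch.size() + " docs"));
        for (int i = 0; i < 25; i++) {
            stager.add("doc-" + i);
        }
        stager.flush(); // pick up the final partial batch
    }
}
```

With 25 documents and a batch size of 10, the demo merges batches of 10, 10
and 5. Anything still in the buffer is lost if the process dies before the
next flush, so a smaller batch trades speed for less re-indexing after a crash.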

Regards, Karl Øie








Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Karl Øie

I forgot this:

It is a bit hard to determine a good balance when indexing XML
documents, because the internal relations of a DOM can make an XML document
nearly 21 times as big in memory as it is on disk (I am not lying, I
have seen it myself)...

Also, the RAMDir must be kept in memory while indexing and merging, so checking
the system's free memory is easier than trying to calculate memory usage.
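
A minimal sketch of such a free-memory check, using only the standard
Runtime.freeMemory() and Runtime.totalMemory() calls. The MemoryGauge class,
its method names and the 0.80 threshold are illustrative assumptions, not part
of Lucene. Note that totalMemory() reports the heap the JVM has claimed so far,
which may still grow, so the result is only a rough signal.

```java
/** Rough heap-usage check for deciding when to merge an in-memory index. */
class MemoryGauge {
    /** Fraction of the given heap size that is in use; 0.0 if the total is not positive. */
    static double usedFraction(long totalBytes, long freeBytes) {
        if (totalBytes <= 0) {
            return 0.0;
        }
        return (double) (totalBytes - freeBytes) / totalBytes;
    }

    /** Fraction of the heap currently claimed by the JVM that is in use. */
    static double usedFraction() {
        Runtime rt = Runtime.getRuntime();
        return usedFraction(rt.totalMemory(), rt.freeMemory());
    }

    /** True when usage exceeds the threshold, e.g. 0.80 for "more than 80%". */
    static boolean shouldFlush(double threshold) {
        return usedFraction() > threshold;
    }

    public static void main(String[] args) {
        System.out.println("used fraction: " + usedFraction());
        System.out.println("flush now? " + shouldFlush(0.80));
    }
}
```

The check would be called after each document is added; when it returns true,
the in-memory index is merged to disk and a fresh one is started.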

Regards, Karl Øie







Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Otis Gospodnetic

Morning,

> I'm starting to wonder how bulletproof Lucene indexes are. Do they
> get corrupted easily? If so, is there a way to rebuild them?

There is no tool for detecting index corruption, for repairing an index, or
for rebuilding one.
The last of these anyone can, and has to, do on their own.

> I'm starting to get the following exception left and right...
>
> 04/25 18:34:39 (Warning) Indexer.indexObjectWithValues:
> java.io.IOException: _91.fnm already exists

I've seen people asking about this on the list, but I never encountered
this particular exception.

> I built a little app (http://homepage.mac.com/zoe_info/) that uses
> Lucene quite extensively, and I would like to keep it that way. However,
> I'm starting to have second thoughts about Lucene's reliability... :-(
>
> I'm sure I'm doing something wrong somewhere, but I really cannot see
> what...

Maybe it's not a Lucene issue then. Still, I've seen this mentioned so often
that the documentation could evidently be improved to prevent
people from making the same mistakes that others have already made.

Otis






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille

Hello again,

> There is no tool for detecting index corruption, for repairing an index, or
> for rebuilding one.
> The last of these anyone can, and has to, do on their own.

:-( Well, that's *very* sad, to say the least... How do I know that my
indexes are not corrupted, even if everything seems to be working fine?
Don't tell me I'm the first one to run into this kind of issue?!? How
can I trust an index if there is *no* way of checking its integrity?
And even if you happen to notice that something is fishy, there is no
way to rebuild the index, short of re-indexing everything from scratch?
That does not sound like a very healthy situation to me. "Fragile"
would be a kind way of describing it...

> I've seen people asking about this on the list, but I never encountered
> this particular exception.

Lucky you...

> Maybe it's not a Lucene issue then. Still, I've seen this mentioned so often
> that the documentation could evidently be improved to prevent
> people from making the same mistakes that others have already made.

Maybe, maybe not. And most likely I'm doing something odd. In any case, 
could you point me to the mistakes that others have already made? Or 
did I miss something obvious here?

Thanks.

PA






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Otis Gospodnetic

Hello,

> > There is no tool for detecting index corruption, for repairing an index,
> > or for rebuilding one.
> > The last of these anyone can, and has to, do on their own.
>
> :-( Well, that's *very* sad, to say the least... How do I know that my
> indexes are not corrupted, even if everything seems to be working fine?
> Don't tell me I'm the first one to run into this kind of issue?!? How
> can I trust an index if there is *no* way of checking its integrity?
> And even if you happen to notice that something is fishy, there is no
> way to rebuild the index, short of re-indexing everything from scratch?
> That does not sound like a very healthy situation to me. "Fragile"
> would be a kind way of describing it...

Yes, that's all unfortunate.  If you come up with anything, please
share it.  Or, you can use Lucene Sandbox and develop stuff there.

> > I've seen people asking about this on the list, but I never encountered
> > this particular exception.
>
> Lucky you...

:)

> > Maybe it's not a Lucene issue then. Still, I've seen this mentioned so
> > often that the documentation could evidently be improved to prevent
> > people from making the same mistakes that others have already made.
>
> Maybe, maybe not. And most likely I'm doing something odd. In any case,
> could you point me to the mistakes that others have already made? Or
> did I miss something obvious here?

Nah, the only thing I can suggest is to check the list archives; that is
where the mistakes of others would be recorded.

Otis

