Re: index size doubled?

2004-12-21 Thread Paul Elschot
On Tuesday 21 December 2004 05:49, aurora wrote:
 I'm testing the rebuilding of the index. I add several hundred documents,
 optimize, add another few hundred, and so on. Right now I have around
 7000 files indexed. I observed that after the index gets to a certain size,
 every time after optimize there are two files of roughly the same size,
 like below:
 
 12/20/2004  01:57p  13 deletable
 12/20/2004  01:57p  29 segments
 12/20/2004  01:53p  14,460,367 _5qf.cfs
 12/20/2004  01:57p  15,069,013 _5zr.cfs
 
 The total index size is double what I expect. This is not always
 reproducible (I'm constantly tuning my program and the set of documents).
 Sometimes I do get a single file after optimize. What is happening?

Lucene tried to delete the older version (_5qf.cfs above), but got an error
back from the file system. It then put the name of that segment in the
deletable file, so it can try to delete that segment again later.

This is known behaviour on FAT file systems, which sometimes take a while
to finish closing a file even after a program has closed it correctly.
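If you want to watch this happen, listing the index directory is enough; the
leftover .cfs file should disappear once Lucene gets another chance to delete
it. A rough, untested sketch (the "index" path is an assumption):

import java.io.File;

public class ListIndexFiles {
    public static void main(String[] args) {
        // print size and name of every file in the index directory, so the
        // leftover .cfs (and the deletable file) can be watched over time
        File[] files = new File("index").listFiles();
        for (int i = 0; i < files.length; i++) {
            System.out.println(files[i].length() + "\t" + files[i].getName());
        }
    }
}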

Regards,
Paul Elschot





Re: index size doubled?

2004-12-21 Thread Otis Gospodnetic
Another possibility is that you are using an older version of Lucene,
which was known to have a bug with similar symptoms.  Get the latest
version of Lucene.

You shouldn't really have multiple .cfs files after optimizing your
index.  Also, optimize only at the end, if you care about indexing
speed.
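Roughly, an add loop with a single optimize at the very end looks like this
(untested sketch, 1.4-era API; the index path and field name are made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BatchIndex {
    public static void main(String[] args) throws Exception {
        // create=true starts a fresh index in the "index" directory
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
        String[] bodies = { "first document", "second document" }; // your documents here
        for (int i = 0; i < bodies.length; i++) {
            Document doc = new Document();
            doc.add(Field.Text("contents", bodies[i]));
            writer.addDocument(doc);
        }
        // one optimize at the very end, instead of once per batch
        writer.optimize();
        writer.close();
    }
}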

Otis






Re: index size doubled?

2004-12-21 Thread aurora
Thanks for the heads up. I'm using Lucene 1.4.2.
I tried to do optimize() again, but it had no effect. Adding just a tiny
dummy document gets rid of it.

I'm doing an optimize every few hundred documents because I'm trying to
simulate incremental updates. This leads to another question, which I'll
post separately.
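For the record, the workaround looks roughly like this (untested sketch,
1.4-era API; the index path and field name are made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ForceCleanup {
    public static void main(String[] args) throws Exception {
        // open the existing index (create=false) and add one throwaway document;
        // the following optimize writes a new merged segment, which seems to give
        // Lucene another chance to delete the leftover .cfs file. The dummy
        // document stays in the index unless it is deleted afterwards.
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false);
        Document dummy = new Document();
        dummy.add(Field.Text("contents", "dummy"));
        writer.addDocument(dummy);
        writer.optimize();
        writer.close();
    }
}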

Thanks.





index size doubled?

2004-12-20 Thread aurora
I'm testing the rebuilding of the index. I add several hundred documents,
optimize, add another few hundred, and so on. Right now I have around
7000 files indexed. I observed that after the index gets to a certain size,
every time after optimize there are two files of roughly the same size,
like below:

12/20/2004  01:57p  13 deletable
12/20/2004  01:57p  29 segments
12/20/2004  01:53p  14,460,367 _5qf.cfs
12/20/2004  01:57p  15,069,013 _5zr.cfs

The total index size is double what I expect. This is not always
reproducible (I'm constantly tuning my program and the set of documents).
Sometimes I do get a single file after optimize. What is happening?
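For reference, the rebuild loop looks roughly like this (untested sketch,
1.4-era API; the index path, field name, and batch size are made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class RebuildIndex {
    public static void main(String[] args) throws Exception {
        // create=true starts a fresh index in the "index" directory
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
        for (int i = 0; i < 7000; i++) {
            Document doc = new Document();
            doc.add(Field.Text("contents", "body of document " + i));
            writer.addDocument(doc);
            // optimize every few hundred documents, as described above
            if (i > 0 && i % 500 == 0) {
                writer.optimize();
            }
        }
        writer.optimize();
        writer.close();
    }
}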
