Re: Disk space used by optimize

2005-02-06 Thread Morus Walter
Bernhard Messer writes:
> >However, three times the space sounds a bit too much, or I make a
> >mistake in the book. :)
>
> There was already a discussion about disk usage during index optimization.
> Please have a look at the developers list:
> http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1797569
>
> There I made some measurements of disk usage within Lucene.
> At that time I proposed a patch that reduced the total disk space used
> from 3 times to a little more than 2 times the final index size.
> Together with Christoph, we implemented some improvements to the
> optimization patch and finally committed the changes.
>
Hmm. If the index is in use (a reader is open), I doubt your patch
makes a difference. In that case the disk space occupied by the non-optimized
index remains in use even after its files are deleted (on Unix/Linux),
until the reader is closed.
What happens if disk space runs out during creation of the compound index?
Will the non-compound files be a usable index?
Otherwise you risk losing the index.
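
One way to lower the risk of running out of space mid-optimize is to check free space before starting. The following is only a sketch under stated assumptions: OptimizeSpaceGuard and its methods are hypothetical helpers, not part of Lucene; the peak multiplier (3 for the compound format, about 2 for the non-compound format) comes from the observations reported in this thread; the current index size stands in for the final index size; and File.getUsableSpace() requires Java 6 or later. It does not account for space still held by deleted files that an open reader references.

```java
import java.io.File;

/** Hypothetical pre-flight check; not part of the Lucene API. */
final class OptimizeSpaceGuard {

    /** Sums the sizes of the regular files directly inside the index directory. */
    static long indexSizeBytes(File indexDir) {
        File[] files = indexDir.listFiles();
        if (files == null) {
            return 0L; // not a directory, or not readable
        }
        long total = 0L;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /**
     * Extra bytes needed on top of the existing index, assuming the peak is
     * peakMultiplier times the index size (an assumption based on this thread).
     */
    static long extraBytesNeeded(long indexBytes, int peakMultiplier) {
        return indexBytes * (peakMultiplier - 1);
    }

    /** True if the partition holding indexDir reports enough usable space. */
    static boolean hasRoomToOptimize(File indexDir, int peakMultiplier) {
        long needed = extraBytesNeeded(indexSizeBytes(indexDir), peakMultiplier);
        return indexDir.getUsableSpace() >= needed;
    }

    public static void main(String[] args) {
        // "index" is a placeholder path, not taken from the thread.
        File indexDir = new File(args.length > 0 ? args[0] : "index");
        System.out.println(hasRoomToOptimize(indexDir, 3)
                ? "enough free space reported"
                : "not enough free space; skip optimize");
    }
}
```

A caller would run the check first and only call optimize() when it returns true. The check is advisory, since other processes can consume space while the optimize is running.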

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Disk space used by optimize

2005-02-04 Thread Bernhard Messer

> However, three times the space sounds a bit too much, or I make a
> mistake in the book. :)

There was already a discussion about disk usage during index optimization.
Please have a look at the developers list:
http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1797569

There I made some measurements of disk usage within Lucene.
At that time I proposed a patch that reduced the total disk space used
from 3 times to a little more than 2 times the final index size.
Together with Christoph, we implemented some improvements to the
optimization patch and finally committed the changes.

Bernhard


Re: Disk space used by optimize - no space on disk corrupts index

2005-02-04 Thread Ernesto De Santis
Hi all,

We have a big index and little disk space.
When we optimize and all the space is consumed, our index is corrupted:
the segments file points to nonexistent files.

Environment:
java 1.4.2_04
W2000 SP4
Tomcat 5.5.4

Bye,
Ernesto.

Yura Smolsky wrote:
> Hello, Otis.
>
> There is a big difference when you use compound index format or
> multiple files. I have tested it on the big index (45 Gb). When I used
> compound file then optimize takes 3 times more space, b/c *.cfs needs
> to be unpacked.
>
> Now I do use non compound file format. It needs like twice as much
> disk space.
>
> OG> Have you tried using the multifile index format?  Now I wonder if there
> OG> is actually a difference in disk space consumed by optimize() when you
> OG> use multifile and compound index format...
> OG> Otis


Re[2]: Disk space used by optimize

2005-02-04 Thread Yura Smolsky
Hello, Doug.

>> There is a big difference when you use compound index format or
>> multiple files. I have tested it on the big index (45 Gb). When I used
>> compound file then optimize takes 3 times more space, b/c *.cfs needs
>> to be unpacked.
>> 
>> Now I do use non compound file format. It needs like twice as much
>> disk space.
DC> Perhaps we should add something to the javadocs noting this?

Sure. I was a bit confused about optimizing the compound file format because I
had no info about space usage during optimize.
More info in the javadocs will save somebody's time :)


Yura Smolsky







Re: Disk space used by optimize

2005-01-31 Thread Doug Cutting
Yura Smolsky wrote:
> There is a big difference when you use compound index format or
> multiple files. I have tested it on the big index (45 Gb). When I used
> compound file then optimize takes 3 times more space, b/c *.cfs needs
> to be unpacked.
>
> Now I do use non compound file format. It needs like twice as much
> disk space.

Perhaps we should add something to the javadocs noting this?

Doug


Re[2]: Disk space used by optimize

2005-01-30 Thread Yura Smolsky
Hello, Otis.

There is a big difference between using the compound index format and
multiple files. I have tested this on a big index (45 GB). With the
compound file format, optimize takes 3 times more space, because the *.cfs
file needs to be unpacked.

I now use the non-compound file format. It needs about twice as much
disk space.

OG> Have you tried using the multifile index format?  Now I wonder if there
OG> is actually a difference in disk space consumed by optimize() when you
OG> use multifile and compound index format...

OG> Otis



Yura Smolsky,







Re: Disk space used by optimize

2005-01-28 Thread Otis Gospodnetic
Morus,

that description of 3 sets of index files is what I was imagining, too.
 I'll have to test and add to the book errata, it seems.

Thanks for the info,
Otis

--- Morus Walter <[EMAIL PROTECTED]> wrote:

> Otis Gospodnetic writes:
> > Hello,
> >
> > Yes, that is how optimize works - copies all existing index segments
> > into one unified index segment, thus optimizing it.
> >
> > see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
> >
> > However, three times the space sounds a bit too much, or I make a
> > mistake in the book. :)
> >
> I cannot explain why, but ~ three times the size of the final index is
> what I observed when I logged disk usage during optimize of an index
> in compound index format.
> The test was on Linux; I simply ran 'du -s' every few seconds in parallel
> with the optimize.
> I didn't test the non-compound format. Probably optimizing a compound-format
> index requires storing the different parts of the compound file separately
> before joining them into the compound file (sounds reasonable; otherwise
> you would need to know the sizes before creating the parts). In that case
> the disk usage peak consists of the original index, the separate files, and
> the new compound file.
>
> So IMHO the book is wrong.
>
> Morus





Re: Disk space used by optimize

2005-01-28 Thread Morus Walter
Otis Gospodnetic writes:
> Hello,
> 
> Yes, that is how optimize works - copies all existing index segments
> into one unified index segment, thus optimizing it.
> 
> see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
> 
> However, three times the space sounds a bit too much, or I make a
> mistake in the book. :)
> 
I cannot explain why, but ~ three times the size of the final index is
what I observed when I logged disk usage during optimize of an index
in compound index format.
The test was on Linux; I simply ran 'du -s' every few seconds in parallel
with the optimize.
I didn't test the non-compound format. Probably optimizing a compound-format
index requires storing the different parts of the compound file separately
before joining them into the compound file (sounds reasonable; otherwise
you would need to know the sizes before creating the parts). In that case
the disk usage peak consists of the original index, the separate files, and
the new compound file.
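
The 'du -s' measurement described above can be approximated in Java. This is a minimal sketch, not the script actually used for the test: PeakDiskUsage, directoryBytes, samplePeak, and ratio are hypothetical names. It sums logical file lengths rather than allocated disk blocks, so the numbers can differ slightly from 'du', and polling can miss a short-lived peak between samples.

```java
import java.io.File;

/** Hypothetical helper that polls a directory's size, similar to repeated 'du -s'. */
final class PeakDiskUsage {

    /** Sums the lengths of the regular files directly inside dir. */
    static long directoryBytes(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0L; // not a directory, or not readable
        }
        long total = 0L;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /** Polls dir the given number of times and returns the largest size seen. */
    static long samplePeak(File dir, int samples, long intervalMillis) {
        long peak = 0L;
        for (int i = 0; i < samples; i++) {
            peak = Math.max(peak, directoryBytes(dir));
            if (i < samples - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    break;
                }
            }
        }
        return peak;
    }

    /** Ratio of observed peak size to final index size. */
    static double ratio(long peakBytes, long finalBytes) {
        return (double) peakBytes / finalBytes;
    }
}
```

samplePeak would run in a separate thread while optimize() is in progress. With the figures from the original question (about 145MB at peak, about 47MB final), ratio(145, 47) comes out at roughly 3.1.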

So IMHO the book is wrong.

Morus




RE: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Have you tried using the multifile index format?  Now I wonder if there
is actually a difference in disk space consumed by optimize() when you
use multifile and compound index format...

Otis

--- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:

> Our copy of LIA is "in the mail" ;)
> 
> Yes the final three files are: the .cfs (46.8MB), deletable (4
> bytes),
> and segments (29 bytes).
> 
> --Leto



RE: Disk space used by optimize

2005-01-27 Thread Kauler, Leto S
Our copy of LIA is "in the mail" ;)

Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes),
and segments (29 bytes).

--Leto



> -Original Message-
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
> 
> Hello,
> 
> Yes, that is how optimize works - copies all existing index 
> segments into one unified index segment, thus optimizing it.
> 
> see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
> 
> However, three times the space sounds a bit too much, or I 
> make a mistake in the book. :)
> 
> You said you end up with 3 files - .cfs is one of them, right?
> 
> Otis
> 
> 
> --- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Just a quick question:  after writing an index and then calling
> > optimize(), is it normal for the index to expand to about three times
> > the size before finally compressing?
> > 
> > In our case the optimise grinds the disk, expanding the index into 
> > many files of about 145MB total, before compressing down to three 
> > files of about 47MB total.  That must be a lot of disk activity for 
> > the people with multi-gigabyte indexes!
> > 
> > Regards,
> > Leto




Re: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Hello,

Yes, that is how optimize works: it copies all existing index segments
into one unified index segment, thus optimizing it.

See hit #1: http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds a bit too much, or I made a
mistake in the book. :)

You said you end up with 3 files - .cfs is one of them, right?

Otis


--- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:

> 
> Just a quick question:  after writing an index and then calling
> optimize(), is it normal for the index to expand to about three times
> the size before finally compressing?
> 
> In our case the optimise grinds the disk, expanding the index into
> many files of about 145MB total, before compressing down to three files
> of about 47MB total.  That must be a lot of disk activity for the people
> with multi-gigabyte indexes!
> 
> Regards,
> Leto



Disk space used by optimize

2005-01-27 Thread Kauler, Leto S

Just a quick question:  after writing an index and then calling
optimize(), is it normal for the index to expand to about three times
the size before finally compressing?

In our case the optimise grinds the disk, expanding the index into many
files of about 145MB total, before compressing down to three files of
about 47MB total.  That must be a lot of disk activity for the people
with multi-gigabyte indexes!

Regards,
Leto

CONFIDENTIALITY NOTICE AND DISCLAIMER

Information in this transmission is intended only for the person(s) to whom it 
is addressed and may contain privileged and/or confidential information. If you 
are not the intended recipient, any disclosure, copying or dissemination of the 
information is unauthorised and you should delete/destroy all copies and notify 
the sender. No liability is accepted for any unauthorised use of the 
information contained in this transmission.

This disclaimer has been automatically added.
