Re: 1.4.x TermInfosWriter.indexInterval not public static ?

2005-02-28 Thread Doug Cutting
Chris Hostetter wrote:
>  1) If making it mutable requires changes to other classes to propagate
> it, then why is it now an instance variable instead of a static?
> (Presumably making it an instance variable allows subclasses to
> override the value, but if other classes have internal expectations
> of the value, that doesn't seem safe)

It's an instance variable because it can vary from instance to instance.
 This value is specified when an index segment is written, and 
subsequently read from disk and used when reading that segment.  It's an 
instance variable in both the writing and reading code.  The thing 
that's lacking is a way to pass in alternate values to the writing code.
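
To illustrate the point that the value travels with each segment, here is a minimal sketch. The class names and the four-byte header layout are hypothetical stand-ins, not Lucene's classes or its file format. The writer holds the interval as an instance field and stores it with the segment, and the reader takes its value from what was stored rather than from a global constant.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in, not a Lucene class: the interval is an instance
// field of the writer and is stored in a toy segment header.
final class SketchSegmentWriter {
    private final int indexInterval;

    SketchSegmentWriter(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    // Writes a toy header that contains only the interval (4 bytes).
    byte[] writeHeader() {
        return ByteBuffer.allocate(Integer.BYTES).putInt(indexInterval).array();
    }
}

// Hypothetical stand-in: the reader's interval comes from the header,
// so it always matches the value the segment was written with.
final class SketchSegmentReader {
    private final int indexInterval;

    SketchSegmentReader(byte[] header) {
        this.indexInterval = ByteBuffer.wrap(header).getInt();
    }

    int getIndexInterval() {
        return indexInterval;
    }
}
```

Two segments written with different intervals can then be read side by side, each reader using its own segment's value.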

The reason that other classes are involved is that the reading and 
writing code are in non-public classes.  We don't want to expose the 
implementation too much by making these public, but would rather expose 
these as getter/setter methods on the relevant public API.

>  2) Should it be configurable through a get/set method, or through a
> system property?
> (which rehashes the instance/global question)

That's indeed the question.  My guess is that a system property would
probably be sufficient for most, but perhaps not for all.  Similarly
with a static setter/getter.  But a getter/setter on IndexWriter would 
make everyone happy.
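
The options above can be combined, as the following sketch suggests. Everything in it is an assumption made for illustration: the class, the property name `sketch.termIndexInterval`, and the fallback of 128 are not part of Lucene's API. A system property supplies a process-wide default, while an instance setter allows the per-writer override that a property alone cannot provide.

```java
// Hypothetical sketch: the property name, the class, and the fallback
// value are illustrative assumptions, not Lucene's API.
final class SketchWriterConfig {
    static final String INTERVAL_PROPERTY = "sketch.termIndexInterval";
    static final int FALLBACK_INTERVAL = 128;

    // Global default: read from the system property when the instance is created.
    private int termIndexInterval =
            Integer.getInteger(INTERVAL_PROPERTY, FALLBACK_INTERVAL);

    // Per-instance override; other instances are unaffected.
    void setTermIndexInterval(int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.termIndexInterval = interval;
    }

    int getTermIndexInterval() {
        return termIndexInterval;
    }
}
```

With this shape, users who only want a global knob can pass `-Dsketch.termIndexInterval=...` on the command line, and users who need different values for different writers can call the setter.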

>  3) Is it important that a writer updating an existing index use the same
> value as the writer that initially created the index?  If so, should
> there really be a preferedIndexInterval variable which is mutable,
> and a currentIndexInterval which is set to the value of the index
> currently being updated, such that preferedIndexInterval is used when
> making an index from scratch and currentIndexInterval is used when
> adding segments to an existing index?

It's used whenever an index segment is created.  Index segments are 
created when documents are added and when index segments are merged to 
form larger index segments.  Merging happens frequently while indexing. 
 Optimization merges all segments.

The value can differ from segment to segment.
The default value is probably good for all but folks with very large 
indexes, who may wish to increase it somewhat.  Also, folks with 
smaller indexes and very high query volumes may wish to decrease it.  
It's a classic time/memory tradeoff.  Higher values use less memory 
and make searches a bit slower; smaller values use more memory and 
make searches a bit faster.
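
The tradeoff can be put into rough numbers. The sketch below is a back-of-the-envelope model only, under the assumption that every interval-th term is held in memory and any other term is found by scanning forward from the nearest in-memory one. The class and method names are made up for illustration.

```java
// Back-of-the-envelope model only. It assumes every interval-th term is
// held in memory and the remaining terms are reached by a forward scan.
final class IntervalTradeoff {
    private IntervalTradeoff() {}

    // Number of terms kept in memory for a segment with termCount terms.
    static long inMemoryEntries(long termCount, int interval) {
        return (termCount + interval - 1) / interval; // ceiling division
    }

    // Worst-case number of terms scanned after the in-memory lookup.
    static int maxScan(int interval) {
        return interval - 1;
    }
}
```

For an illustrative segment of 10,000,000 terms, an interval of 128 gives 78,125 in-memory entries and a scan of at most 127 terms, while an interval of 1024 gives 9,766 entries and a scan of at most 1,023 terms.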

Unless there are objections I will add this as:
  IndexWriter.setTermIndexInterval()
  IndexWriter.getTermIndexInterval()
Both will be marked Expert.
Further discussion should move to the lucene-dev list.
Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: 1.4.x TermInfosWriter.indexInterval not public static ?

2005-02-25 Thread Doug Cutting
Kevin A. Burton wrote:
> What's the desired pattern for using TermInfosWriter.indexInterval?

There isn't one.  It is not a part of the public API.  It is an 
unsupported internal feature.

> Do I have to compile my own version of Lucene to change this?

Yes.

> The last API was public static final, but this is neither public nor
> static.

It was never public.  It used to be static and final, but is now an 
instance variable.

> I'm wondering if we should just make this a value that can be set at
> runtime.  Considering the memory savings for larger installs this
> can/will be important.

The place to put getter/setters would be IndexWriter, since that's the 
public home of all other index parameters.  Some changes to 
DocumentWriter and SegmentMerger would be required to pass this value 
through to TermInfosWriter from IndexWriter.
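
The pass-through described above might look roughly like the following sketch. The classes are hypothetical stand-ins that only mirror the roles named in this message. They are not the Lucene classes, and the constructors, the `addDocument()` return type, and the default of 128 are assumptions made so the example is self-contained.

```java
// Hypothetical stand-in for the internal term writer.
final class SketchTermInfosWriter {
    final int indexInterval; // fixed for the segment this writer produces

    SketchTermInfosWriter(int indexInterval) {
        this.indexInterval = indexInterval;
    }
}

// Hypothetical stand-in for an internal class that creates segments.
final class SketchDocumentWriter {
    private final int termIndexInterval;

    SketchDocumentWriter(int termIndexInterval) {
        this.termIndexInterval = termIndexInterval;
    }

    // Each new segment gets a term writer configured with the passed-in value.
    SketchTermInfosWriter newSegment() {
        return new SketchTermInfosWriter(termIndexInterval);
    }
}

// The only class users would touch; the internals stay non-public.
final class SketchIndexWriter {
    private int termIndexInterval = 128; // assumed default for this sketch

    void setTermIndexInterval(int interval) {
        this.termIndexInterval = interval;
    }

    int getTermIndexInterval() {
        return termIndexInterval;
    }

    // Returns the term writer only so the effect is observable in a test.
    SketchTermInfosWriter addDocument() {
        return new SketchDocumentWriter(termIndexInterval).newSegment();
    }
}
```

A merger class would receive the value the same way, so that segments produced by merging also pick up the writer's current setting.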

Doug


Re: 1.4.x TermInfosWriter.indexInterval not public static ?

2005-02-25 Thread Chris Hostetter
:  What's the desired pattern for using TermInfosWriter.indexInterval?
:
: There isn't one.  It is not a part of the public API.  It is an
: unsupported internal feature.

: It was never public.  It used to be static and final, but is now an
: instance variable.

: The place to put getter/setters would be IndexWriter, since that's the
: public home of all other index parameters.  Some changes to
: DocumentWriter and SegmentMerger would be required to pass this value
: through to TermInfosWriter from IndexWriter.

I don't really understand what this variable does, but from what I do
understand, changing its value can have significant performance impacts
depending on the nature of the data being indexed.  That leads me to
believe that making it configurable would be a good idea, but it raises
some questions:

 1) If making it mutable requires changes to other classes to propagate
it, then why is it now an instance variable instead of a static?
(Presumably making it an instance variable allows subclasses to
override the value, but if other classes have internal expectations
of the value, that doesn't seem safe)

 2) Should it be configurable through a get/set method, or through a
system property?
(which rehashes the instance/global question)

 3) Is it important that a writer updating an existing index use the same
value as the writer that initially created the index?  If so, should
there really be a preferedIndexInterval variable which is mutable,
and a currentIndexInterval which is set to the value of the index
currently being updated, such that preferedIndexInterval is used when
making an index from scratch and currentIndexInterval is used when
adding segments to an existing index?



-Hoss





1.4.x TermInfosWriter.indexInterval not public static ?

2005-02-24 Thread Kevin A. Burton
What's the desired pattern for using TermInfosWriter.indexInterval?
Do I have to compile my own version of Lucene to change this?  The last 
API was public static final, but this is neither public nor static.

I'm wondering if we should just make this a value that can be set at 
runtime.  Considering the memory savings for larger installs this 
can/will be important.

Kevin
--
Kevin A. Burton, Location - San Francisco, CA
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
