2,147,483,647 max documents?

2003-08-11 Thread Kevin A. Burton
Why was an int chosen to represent document handles?  Is there a reason 
for this?  Why wasn't a long chosen to represent document handles?  64 
bits seems like the obvious choice here except for a potentially bloated 
datastore (32 extra bits).

I guess one possible solution is to use multiple indexes.  This way you 
could run the search on each index and build a 64-bit handle, with the 
first 32 bits being the index handle and the second 32 bits being the 
local handle.
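
The packing idea above can be sketched as follows. This is only an
illustration, not anything Lucene provides: the class CompositeHandle and
its methods are hypothetical names, and it assumes "first 32 bits" means
the high-order half of the long.

```java
// Hypothetical helper, not part of Lucene: packs an index number and a
// per-index document id into a single 64-bit handle.
final class CompositeHandle {
    private CompositeHandle() {}

    /** Index id goes in the high 32 bits, local doc id in the low 32 bits. */
    static long pack(int indexId, int localDocId) {
        // Mask the low half so a negative localDocId cannot sign-extend
        // into the bits reserved for the index id.
        return ((long) indexId << 32) | (localDocId & 0xFFFFFFFFL);
    }

    /** Recovers the index id from the high 32 bits. */
    static int indexId(long handle) {
        return (int) (handle >>> 32);
    }

    /** Recovers the local doc id from the low 32 bits. */
    static int localDocId(long handle) {
        return (int) handle;
    }

    public static void main(String[] args) {
        long handle = pack(3, 42);
        System.out.println(indexId(handle) + " / " + localDocId(handle)); // prints "3 / 42"
    }
}
```

A caller would search each index separately, wrap every hit with
pack(indexNumber, hitDocId), and later use indexId() to decide which index
to ask for the document.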

Kevin


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: 2,147,483,647 max documents?

2003-08-11 Thread Tatu Saloranta
On Monday 11 August 2003 01:07, Kevin A. Burton wrote:
> Why was an int chosen to represent document handles?  Is there a reason
> for this?  Why wasn't a long chosen to represent document handles?  64
> bits seems like the obvious choice here except for a potentially bloated
> datastore (32 extra bits)

I can't speak for the actual reasons (not being a core Lucene developer), but
the general benefits of 32-bit ints vs. longs are:

- Better performance on pretty much any current architecture (even so-called
  64-bit CPUs often prefer 32-bit data access, and 64-bit representations
  matter more for addressing).
  A smaller data set is also usually good for performance (caching).
- Atomicity of access (read access can often be done without synchronizing);
  reads and writes of plain (non-volatile) longs are not guaranteed to be
  atomic in Java.
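
For the atomicity point, here is a minimal sketch of the two standard ways
to get atomic 64-bit access in Java when a long really is needed. The class
LongHandleHolder and its methods are made-up names for illustration only
and have nothing to do with Lucene's internals.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder showing safe shared access to 64-bit values.
final class LongHandleHolder {
    // Declaring the field volatile makes each read and write of the long
    // atomic (and visible to other threads); a plain long has no such
    // guarantee.
    private volatile long latest;

    // AtomicLong additionally gives atomic read-modify-write operations.
    private final AtomicLong next = new AtomicLong(0);

    void publish(long handle) {
        latest = handle;
    }

    long latest() {
        return latest;
    }

    /** Hands out consecutive handles, safe to call from several threads. */
    long allocate() {
        return next.getAndIncrement();
    }
}
```

With an int field neither step is required just to avoid torn reads, which
is the convenience referred to above.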

Another question is whether the limited address space presents a real problem. 
Since Lucene can reuse doc ids (or rather, there is no persistent id per se? 
The doc id is just an index, and holes left by removed docs can be reused?), 
perhaps this is usually not much of an issue?

-+ Tatu +-

