Hi All,
I too would like to have doc ids that are larger than int32. Not today,
but in 4 years that would be very nice ;) Already we are splitting some
indexes that would be nicer kept together (mostly to allow more Lucene
code to be used instead of our own).
On the other hand we are not the defaul
Maybe using TopDocs.merge you can run the same query on multiple indexes; with
MultiReader you can also perform join operations across different indexes.
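For what it's worth, the two approaches can be sketched roughly like this (Java against the Lucene API; exact method signatures vary across Lucene versions, so treat this as a sketch, not a definitive recipe):

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

import java.io.IOException;

// Rough sketch of the two ways to query several indexes together.
class TwoWaysToSearchManyIndexes {

    // Option 1: search each index separately, then merge the per-index
    // TopDocs into one ranked list with TopDocs.merge.
    static TopDocs searchAndMerge(Query query, int topN, IndexSearcher... searchers)
            throws IOException {
        TopDocs[] shardHits = new TopDocs[searchers.length];
        for (int i = 0; i < searchers.length; i++) {
            shardHits[i] = searchers[i].search(query, topN);
        }
        return TopDocs.merge(topN, shardHits); // ScoreDoc.shardIndex says which index
    }

    // Option 2: wrap the readers in a MultiReader and search once; docids
    // are rebased into one global space, so results come back pre-merged.
    static TopDocs searchAsOne(Query query, int topN, IndexReader... readers)
            throws IOException {
        IndexSearcher searcher = new IndexSearcher(new MultiReader(readers));
        return searcher.search(query, topN);
    }
}
```

The practical difference: with TopDocs.merge each index keeps its own docid space and ScoreDoc.shardIndex tells you which index a hit came from, while MultiReader rebases the sub-readers' docids into one global space, and the combined reader is still bound by the 2^31 total-document limit discussed in this thread.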
2016-08-21 19:31 GMT+02:00 Cristian Lorenzetto <
cristian.lorenze...@gmail.com>:
I'm reviewing TopDocs.merge.
What is the difference between using multiple SearchIndexers and then
TopDocs.merge, versus using a MultiReader?
2016-08-21 2:28 GMT+02:00 Cristian Lorenzetto :
In my opinion this study doesn't tell anything more than before. Obviously, if
you try to retrieve the whole data store in a single query, the performance will
not be good. Lucene is fantastic, but not magic. The laws of physics continue to
apply with Lucene too. The query is designed for retrieving a small part of the
data.
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Cristian Lorenzetto [mailto:cristian.lorenze...@gmail.com]
> Sent: Thursday, August 18, 2016 5:58 PM
> To: Lucene Users
> Subject: Re: docid is just a signed int32
I was referring to memory (RAM).
We have machines running right now with 1TB _RAM_ and will be getting
machines with 3TB RAM (Dell R830 with 48 64GB DIMMs). (Sorry, I was
incorrect when I said we were running the 3TB machines _now_.)
Glen
On Fri, Aug 19, 2016 at 9:56 AM, Cristian Lorenzetto <
c
ah :)
"with 3TB of ram (we have these running), int64 for >2^32 documents in a
single index should not be a problem"
Maybe I'm reasoning the wrong way, but normally the size of storage is not
the size of memory.
I don't know Lucene in depth, but I would expect the Lucene index is
scanned a block at a time
Making docid an int64 is a non-trivial undertaking, and this work needs to
be compared against the use cases and how compelling they are.
That said, in the lifetime of most software projects a decision is made to
break backward compatibility to move the project forward.
When/if moving to int64 hap
On Fri, Aug 19, 2016 at 03:32, Trejkaz wrote:
> But hang on:
> * TopDocs#merge still returns a TopDocs.
> * TopDocs still uses an array of ScoreDoc.
> * ScoreDoc still uses an int doc ID.
>
This is why ScoreDoc has a `shardIndex`, so that you can know which index a
document comes from.
I'm not sa
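A minimal pure-Java sketch of that idea (illustrative stand-in classes, not Lucene's actual ScoreDoc or merge code): each hit carries a shard index alongside its per-shard doc id, so a merged, globally ranked list can still be traced back to its source index:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal stand-in for ScoreDoc: a per-shard docid plus the shard it came from.
class Hit {
    final int doc;        // docid, local to one shard/index
    final int shardIndex; // which index produced this hit
    final float score;
    Hit(int doc, int shardIndex, float score) {
        this.doc = doc; this.shardIndex = shardIndex; this.score = score;
    }
}

class ShardMerge {
    // Merge per-shard hit lists by descending score, keeping shardIndex
    // so each merged hit can still be resolved to its source index.
    static List<Hit> merge(List<List<Hit>> perShard, int topN) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) all.addAll(hits);
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return all.subList(0, Math.min(topN, all.size()));
    }
}
```

The pair (shardIndex, doc) is effectively a wider-than-int32 identifier, which is how per-shard int32 docids can address more than 2^31 documents overall.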
OK, I'm a little out of my league here, but I'll plow on anyway
bq: There are use cases out there where >2^31 does make sense in a single index
Ok, let's put some definition to this and define the use-case
specifically rather than
be vague. I've just run an experiment for instance where I had
On Thu, Aug 18, 2016 at 11:55 PM, Adrien Grand wrote:
Normally databases support at least a long primary key.
Try asking the Twitter application, for example, which grows by more than
4 petabytes every year :) Maybe they use big storage devices, bigger than a PC's
storage :)
However, if you offer the possibility to use shards ... it is a possibility
anyway :)
For
What are you trying to index that has more than 3 billion documents per
shard / index and can not be split as Adrien suggests?
On Thu, Aug 18, 2016, at 07:35 AM, Cristian Lorenzetto wrote:
> Maybe lucene has maxsize 2^31 because result set are java array where
> length is a int type.
> A suggest
Maybe Lucene has a max size of 2^31 because result sets are Java arrays, where
length is an int type.
A suggestion for a possible future change is to not use a Java array but an
Iterator. An Iterator is a more scalable ADT, and doesn't eat memory when
returning documents.
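A generic sketch of what an iterator-style result API might look like (purely illustrative; the class and parameter names below are made up, not a proposed Lucene API): results are fetched a page at a time, so memory use is bounded by the page size rather than the total hit count:

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Illustrative sketch: stream results page by page instead of returning
// one giant array, so memory stays bounded regardless of hit count.
class PagedResults<T> implements Iterable<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchPage; // (offset, size) -> page
    private final int pageSize;

    PagedResults(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        this.fetchPage = fetchPage;
        this.pageSize = pageSize;
    }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private List<T> page = fetchPage.apply(0, pageSize);
            private int offset = 0, i = 0;

            public boolean hasNext() {
                if (i < page.size()) return true;
                if (page.size() < pageSize) return false; // last page was short
                offset += pageSize;                       // fetch the next page lazily
                page = fetchPage.apply(offset, pageSize);
                i = 0;
                return !page.isEmpty();
            }

            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return page.get(i++);
            }
        };
    }
}
```

In Lucene terms, something similar can already be done today by paging with IndexSearcher.searchAfter, which retrieves results in chunks instead of one array.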
2016-08-18 16:03 GMT+02:00 Glen Newton :
Or maybe it is time Lucene re-examined this limit.
There are use cases out there where >2^31 does make sense in a single index
(huge number of tiny docs).
Also, I think the underlying hardware and the JDK have advanced enough to make
this more defensible.
Constructively,
Glen
On Thu, Aug 18, 2016 at
No, IndexWriter enforces that the number of documents cannot go over
IndexWriter.MAX_DOCS (which is a bit less than 2^31) and
BaseCompositeReader computes the number of documents in a long variable and
ensures it is less than 2^31, so you cannot have indexes that contain more
than 2^31 documents.
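The check Adrien describes can be sketched in plain Java (a simplified analogue only; the real logic lives in IndexWriter and BaseCompositeReader, and the names below are illustrative):

```java
// Simplified analogue of the overflow check Adrien describes:
// sum the sub-reader doc counts in a long, then verify the total
// still fits in the signed-int32 docid space.
class CompositeDocCountCheck {
    // A bit less than 2^31, like IndexWriter.MAX_DOCS.
    static final int MAX_DOCS = Integer.MAX_VALUE - 128;

    static int totalDocs(int[] subReaderMaxDocs) {
        long total = 0;                 // long avoids int overflow while summing
        for (int maxDoc : subReaderMaxDocs) {
            total += maxDoc;
        }
        if (total > MAX_DOCS) {
            throw new IllegalArgumentException(
                "Too many documents: " + total + " > " + MAX_DOCS);
        }
        return (int) total;             // safe: checked above
    }
}
```

Summing in a long is what lets the composite reader detect, rather than silently wrap around on, a total above 2^31.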