Re: docid is just a signed int32

2017-04-06 Thread Jerven Tjalling Bolleman
Hi All, I too would like to have doc IDs that are larger than int32. Not today, but in 4 years that would be very nice ;) We are already splitting some indexes that would be nicer kept together (mostly to allow more Lucene code to be used instead of our own). On the other hand we are not the defaul

Re: docid is just a signed int32

2016-08-21 Thread Cristian Lorenzetto
Maybe with TopDocs.merge you can run the same query on multiple indexes; with MultiReader you can also perform join operations across different indexes. 2016-08-21 19:31 GMT+02:00 Cristian Lorenzetto < cristian.lorenze...@gmail.com>: > i m overviewing TopDocs.merge. > > What is the difference to use multi

Re: docid is just a signed int32

2016-08-21 Thread Cristian Lorenzetto
I am looking over TopDocs.merge. What is the difference between using multiple IndexSearcher instances and then TopDocs.merge, versus using MultiReader? 2016-08-21 2:28 GMT+02:00 Cristian Lorenzetto : > For my opinion this study dont tell any thing more than before. Obviously > if you try to retrieve all data store i

Re: docid is just a signed int32

2016-08-20 Thread Cristian Lorenzetto
In my opinion this study doesn't tell us anything more than before. Obviously, if you try to retrieve all stored data in a single query, performance will not be good. Lucene is fantastic, but it is not magic. The laws of physics still apply to Lucene. The query is designed for retrieving a small pa

RE: docid is just a signed int32

2016-08-19 Thread Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Cristian Lorenzetto [mailto:cristian.lorenze...@gmail.com] > Sent: Thursday, August 18, 2016 5:58 PM > To: Lucene Users > Subject: Re: docid is just a signed int32

Re: docid is just a signed int32

2016-08-19 Thread Glen Newton
I was referring to memory (RAM). We have machines running right now with 1TB _RAM_ and will be getting machines with 3TB RAM (Dell R830 with 48 64GB DIMMs). (Sorry, I was incorrect when I said we were running the 3TB machines _now_.) Glen On Fri, Aug 19, 2016 at 9:56 AM, Cristian Lorenzetto < c

Re: docid is just a signed int32

2016-08-19 Thread Cristian Lorenzetto
Ah :) "with 3TB of ram (we have these running), int64 for >2^32 documents in a single index should not be a problem" Maybe I am reasoning badly, but normally the size of storage is not the size of memory. I don't know Lucene in depth, but I would expect the Lucene index is scanning a block step

Re: docid is just a signed int32

2016-08-19 Thread Glen Newton
Making docid an int64 is a non-trivial undertaking, and this work needs to be compared against the use cases and how compelling they are. That said, in the lifetime of most software projects a decision is made to break backward compatibility to move the project forward. When/if moving to int64 hap

Re: docid is just a signed int32

2016-08-19 Thread Adrien Grand
On Fri, Aug 19, 2016 at 03:32, Trejkaz wrote: > But hang on: > * TopDocs#merge still returns a TopDocs. > * TopDocs still uses an array of ScoreDoc. > * ScoreDoc still uses an int doc ID. > This is why ScoreDoc has a `shardId` so that you can know which index a document comes from. I'm not sa
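The idea in this reply, that a shard number plus a per-shard int doc ID identifies a document across indexes, can be sketched as below. This is only an illustration under stated assumptions: `ShardDocAddress` and its methods are hypothetical names and not part of Lucene, and in the Lucene API the field on ScoreDoc is, to my knowledge, called `shardIndex` rather than `shardId`.

```java
// Hypothetical helper, not part of Lucene: packs a shard number and a
// per-shard int doc ID into a single 64-bit address.
final class ShardDocAddress {
    private ShardDocAddress() {}

    /** Combine a non-negative shard number and a non-negative doc ID into one long. */
    static long pack(int shard, int docId) {
        if (shard < 0 || docId < 0) {
            throw new IllegalArgumentException("shard and docId must be non-negative");
        }
        // High 32 bits hold the shard, low 32 bits hold the doc ID.
        return ((long) shard << 32) | (docId & 0xFFFFFFFFL);
    }

    /** Recover the shard number from a packed address. */
    static int shardOf(long address) {
        return (int) (address >>> 32);
    }

    /** Recover the per-shard doc ID from a packed address. */
    static int docIdOf(long address) {
        return (int) address;
    }
}
```

With this layout each shard still stays under the int32 limit, while the pair (shard, doc ID) addresses far more than 2^31 documents overall.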

Re: docid is just a signed int32

2016-08-18 Thread Erick Erickson
OK, I'm a little out of my league here, but I'll plow on anyway. bq: There are use cases out there where >2^31 does make sense in a single index. Ok, let's put some definition to this and define the use-case specifically rather than be vague. I've just run an experiment for instance where I had

Re: docid is just a signed int32

2016-08-18 Thread Trejkaz
On Thu, Aug 18, 2016 at 11:55 PM, Adrien Grand wrote: > No, IndexWriter enforces that the number of documents cannot go over > IndexWriter.MAX_DOCS (which is a bit less than 2^31) and > BaseCompositeReader computes the number of documents in a long variable and > ensures it is less than 2^31, so y

Re: docid is just a signed int32

2016-08-18 Thread Cristian Lorenzetto
Databases normally support at least a long primary key. Ask Twitter, for example, which grows by more than 4 petabytes every year :) Maybe they use storage devices bigger than a PC's storage :) However, if you offer the possibility of using shards ... that is an option anyway :) For

Re: docid is just a signed int32

2016-08-18 Thread Greg Bowyer
What are you trying to index that has more than 3 billion documents per shard / index and can not be split as Adrien suggests? On Thu, Aug 18, 2016, at 07:35 AM, Cristian Lorenzetto wrote: > Maybe lucene has maxsize 2^31 because result set are java array where > length is a int type. > A suggest

Re: docid is just a signed int32

2016-08-18 Thread Cristian Lorenzetto
Maybe Lucene has a maximum size of 2^31 because result sets are Java arrays, whose length is an int. A suggestion for possible future changes is to use an Iterator instead of a Java array. An Iterator is a more scalable ADT that does not tie up memory to return documents. 2016-08-18 16:03 GMT+02:00 Glen Newton : >

Re: docid is just a signed int32

2016-08-18 Thread Glen Newton
Or maybe it is time Lucene re-examined this limit. There are use cases out there where >2^31 does make sense in a single index (huge number of tiny docs). Also, I think the underlying hardware and the JDK have advanced to make this more defendable. Constructively, Glen On Thu, Aug 18, 2016 at

Re: docid is just a signed int32

2016-08-18 Thread Adrien Grand
No, IndexWriter enforces that the number of documents cannot go over IndexWriter.MAX_DOCS (which is a bit less than 2^31) and BaseCompositeReader computes the number of documents in a long variable and ensures it is less than 2^31, so you cannot have indexes that contain more than 2^31 documents.
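
The kind of check described in this message, summing per-reader document counts in a long and rejecting totals above the limit, might look roughly like the sketch below. It is not the Lucene source: `DocCountCheck` and `totalDocs` are hypothetical names, and the constant is copied on the assumption that IndexWriter.MAX_DOCS equals Integer.MAX_VALUE - 128.

```java
// Hypothetical sketch, not Lucene code: sums per-reader doc counts in a long
// so that the total cannot silently overflow an int.
final class DocCountCheck {
    private DocCountCheck() {}

    // Assumption: same value as Lucene's IndexWriter.MAX_DOCS, copied here
    // so the sketch has no Lucene dependency.
    static final int MAX_DOCS = Integer.MAX_VALUE - 128;

    /** Return the total doc count, or throw if it exceeds MAX_DOCS. */
    static int totalDocs(int... perReaderDocCounts) {
        long total = 0L; // long accumulator: int addition could wrap to a negative value
        for (int count : perReaderDocCounts) {
            if (count < 0) {
                throw new IllegalArgumentException("negative doc count: " + count);
            }
            total += count;
        }
        if (total > MAX_DOCS) {
            throw new IllegalArgumentException(
                "Too many documents: " + total + " exceeds " + MAX_DOCS);
        }
        return (int) total; // safe: total is known to fit in an int here
    }
}
```

Accumulating in a long matters because adding two large int counts directly can wrap around to a negative number and slip past a naive comparison.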