Re: Experience with indexing billions of documents?

2010-04-14 Thread Jason Rutherglen
Tom, yes, we've (Biz360) indexed 3 billion documents and upwards... If indexing is the issue (or rather re-indexing), we used SOLR-1301 with Hadoop to re-index efficiently (i.e., in a timely manner). For querying we're currently using the out-of-the-box Solr distributed shards query mechanism, which is hard (r

Re: Experience with indexing billions of documents?

2010-04-13 Thread Thomas Koch
Bradford Stephens:
> Hey there,
>
> We've actually been tackling this problem at Drawn to Scale. We'd really
> like to get our hands on LuceHBase to see how it scales. Our faceting still
> needs to be done in-memory, which is kinda tricky, but it's worth
> exploring.

Hi Bradford, thank you for yo

Re: Experience with indexing billions of documents?

2010-04-13 Thread Bradford Stephens
Hey there,

We've actually been tackling this problem at Drawn to Scale. We'd really like to get our hands on LuceHBase to see how it scales. Our faceting still needs to be done in-memory, which is kinda tricky, but it's worth exploring.

On Mon, Apr 12, 2010 at 7:27 AM, Thomas Koch wrote:
> Hi,

Re: Experience with indexing billions of documents?

2010-04-12 Thread Thomas Koch
Hi, could I interest you in this project? http://github.com/thkoch2001/lucehbase The aim is to store the index directly in HBase, a database system modelled after Google's Bigtable and designed to store data in the terabyte-to-petabyte range. Best regards, Thomas Koch

Lance Norskog:
> The 2B limitation i

Re: Experience with indexing billions of documents?

2010-04-05 Thread Lance Norskog
The 2B limitation is within one shard, due to Lucene's use of a signed 32-bit integer for internal document IDs. There is no such limit in sharding: Distributed Search uses the stored unique document ID rather than the internal docid.

On Fri, Apr 2, 2010 at 10:31 AM, Rich Cariens wrote:
> A colleague of mine is using nati
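The per-shard ceiling can be sketched numerically: Lucene's internal docid is a signed 32-bit integer, so one index (shard) tops out at 2^31 - 1 (about 2.1 billion) documents, and capacity grows linearly with the number of shards. A minimal back-of-envelope sketch in Python, using the corpus sizes mentioned in this thread as illustrative inputs:

```python
import math

# Lucene's internal docid is a signed 32-bit int (Java Integer.MAX_VALUE),
# so a single shard can address at most 2**31 - 1 documents.
MAX_DOCS_PER_SHARD = 2**31 - 1

def min_shards(total_docs: int) -> int:
    """Smallest shard count that keeps every shard under the docid ceiling."""
    return math.ceil(total_docs / MAX_DOCS_PER_SHARD)

# Corpus sizes mentioned in the thread: 3B (Biz360) and 13B documents.
print(min_shards(3_000_000_000))   # at least 2 shards
print(min_shards(13_000_000_000))  # at least 7 shards
```

Note this only bounds the shard count from below; in practice shards are kept far smaller (e.g. the 32-shard, ~406M-docs-per-shard setup described in this thread) for query latency and merge performance, not because of the docid limit.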

Re: Experience with indexing billions of documents?

2010-04-02 Thread Rich Cariens
A colleague of mine is using native Lucene + some home-grown patches/optimizations to index over 13B small documents in a 32-shard environment, which is around 406M docs per shard. If there's a 2B doc id limitation in Lucene then I assume he's patched it himself. On Fri, Apr 2, 2010 at 1:17 PM,

Re: Experience with indexing billions of documents?

2010-04-02 Thread Peter Sturge
You can do this today with multiple indexes, replication and distributed searching. SolrCloud/clustering will certainly make life easier when it comes to managing these, but with distributed searches over multiple indexes, you're limited only by how much hardware you can throw at it. On Fri, Apr
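In Solr 1.x the distributed searching Peter describes is driven by a `shards` request parameter listing every shard's host:port/core; the node receiving the query fans it out and merges the per-shard results by the unique key field. A hypothetical sketch of building such a request URL (the hostnames and query are made up for illustration):

```python
from urllib.parse import urlencode

# Hypothetical shard hosts; Solr 1.x distributed search expects every
# shard to be listed in the `shards` parameter of the request.
shards = [
    "solr1.example.com:8983/solr",
    "solr2.example.com:8983/solr",
    "solr3.example.com:8983/solr",
]

params = {
    "q": "title:lucene",
    "rows": 10,
    "shards": ",".join(shards),  # fan the query out to all listed shards
}
url = "http://solr1.example.com:8983/solr/select?" + urlencode(params)
print(url)
```

This matches Peter's point: capacity scales with however many shard hosts you list, at the cost of the client (or a front-end core) having to know the shard topology — the management burden that SolrCloud was later meant to remove.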

Re: Experience with indexing billions of documents?

2010-04-02 Thread darren
My guess is that you will need to take advantage of Solr 1.5's upcoming cloud/cluster renovations and use multiple indexes to comfortably achieve those numbers. Hypothetically, in that case, you won't be limited by the single-index docid limitations of Lucene.

> We are currently indexing 5 million book

Experience with indexing billions of documents?

2010-04-02 Thread Burton-West, Tom
We are currently indexing 5 million books in Solr, scaling up over the next few years to 20 million. However, we are using the entire book as a Solr document. We are evaluating the possibility of indexing individual pages, as there are some use cases where users want the most relevant pages rega
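Tom's page-level scenario is what pushes this thread into billions-of-documents territory. A rough estimate shows why (the average page count per book is an assumed illustrative figure, not a number from the thread):

```python
# Back-of-envelope for page-level indexing of the book corpus.
BOOKS = 20_000_000          # target corpus size from the thread
PAGES_PER_BOOK = 300        # assumed average, for illustration only
DOCID_CEILING = 2**31 - 1   # signed 32-bit docid limit per Lucene index

page_docs = BOOKS * PAGES_PER_BOOK
print(page_docs)                   # 6,000,000,000 page-level documents
print(page_docs > DOCID_CEILING)   # True: exceeds a single index's capacity
```

Under that assumption, page-level indexing turns a comfortably single-index 20M-document corpus into one that must be sharded, which is exactly the question the replies above address.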