Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Santi Saez
Hi, I'm doing some initial tests with CouchDB, trying to store 2^32 IP addresses (approximately 4.3 billion documents). Documents have only required fields: _id and _rev, but I've noticed that the minimum space occupied by each document is approximately 3.7KB, so I need +14TB disk

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Robert Newson
Try database compaction? B. On Mon, Feb 1, 2010 at 4:27 PM, Santi Saez santis...@woop.es wrote: Hi, I'm doing some initial tests with CouchDB, trying to store 2^32 IP addresses (approximately 4.3 billion documents). Documents have only required fields: _id and _rev, but I've noticed

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Elf
Do you plan to handle IPv6 in future versions of your program? :) 2010/2/1 Santi Saez santis...@woop.es: Hi, I'm doing some initial tests with CouchDB, trying to store 2^32 IP addresses (approximately 4.3 billion documents). Documents have only required fields: _id and _rev, but I've

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Santi Saez
On 01/02/10 17:31, Robert Newson wrote: Try database compaction? I have tried database compaction on another test server (a Debian Lenny box) running CouchDB 0.8.0-2, and after compaction the disk size is the same: # curl http://localhost:5984/test
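For readers who want to repeat the before/after check from a script, here is a minimal sketch. It assumes a CouchDB version that exposes `POST /<db>/_compact` and reports a `disk_size` field in the `GET /<db>` response; neither is confirmed for 0.8 in this thread. The host, port, and database name come from the curl command above; the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # host and port from the curl command in the message
DB_NAME = "test"                    # database name from the curl command in the message


def compact_request(base_url, db):
    """Build (but do not send) the POST that asks CouchDB to compact `db`."""
    return urllib.request.Request(
        f"{base_url}/{db}/_compact",
        data=b"",
        # Assumption: newer releases expect a JSON content type on this call.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def disk_size(info_body):
    """Extract disk_size from the JSON body returned by GET /<db>.

    Assumption: the server reports the file size under the key "disk_size".
    """
    return json.loads(info_body)["disk_size"]


def compact_and_report(base_url=BASE_URL, db=DB_NAME):
    """Needs a running server: print the current size, then request compaction."""
    with urllib.request.urlopen(f"{base_url}/{db}") as resp:
        print("disk_size before:", disk_size(resp.read()))
    with urllib.request.urlopen(compact_request(base_url, db)) as resp:
        print("compaction response:", resp.read().decode())
```

Compaction runs in the background, so the size should be read again only once it has finished; calling `compact_and_report()` twice in quick succession may show the same byte count simply because the job is still running.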

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Robert Newson
Compaction should reduce disk usage even without updates or deletes, but that is probably not true for 0.8. Odd that you get the exact same byte count after compaction... On Mon, Feb 1, 2010 at 4:52 PM, Santi Saez santis...@woop.es wrote: On 01/02/10 17:31, Robert Newson wrote: Try database

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Santi Saez
On 01/02/10 17:32, Elf wrote: Do you plan to handle IPv6 in future versions of your program? :) That would be another great test... but with CouchDB, perhaps I will not have enough disk space ;-P Now seriously: any idea how to reduce disk space in this test to store 2^32 documents? Thanks!

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Markus Jelsma
Not really, but you could omit about 300 million IP addresses: these are multicast and private network addresses, and leaving them out would save you about 1.2GiB already. Now seriously: any idea how to reduce disk space in this test to store 2^32 documents? Thanks! Markus Jelsma - Technical Architect - Buyways BV
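The size of that saving can be checked with Python's standard `ipaddress` module. This is only a sketch: it assumes "multicast and private" means 224.0.0.0/4 plus the three RFC 1918 blocks, which gives roughly 286 million addresses, a little under the 300 million quoted. Counting other reserved ranges as well would move the figure closer.

```python
import ipaddress

# Assumption: the excluded set is the IPv4 multicast block plus the
# three RFC 1918 private blocks. The thread does not list exact ranges.
EXCLUDED_BLOCKS = [
    "224.0.0.0/4",     # multicast
    "10.0.0.0/8",      # private
    "172.16.0.0/12",   # private
    "192.168.0.0/16",  # private
]


def excluded_count(blocks):
    """Total number of addresses covered by the given CIDR blocks."""
    return sum(ipaddress.ip_network(block).num_addresses for block in blocks)


skipped = excluded_count(EXCLUDED_BLOCKS)
remaining = 2**32 - skipped

print("addresses skipped:  ", skipped)
print("addresses remaining:", remaining)
# Raw saving at 4 bytes per address, ignoring any per-document overhead.
print("raw saving in GiB:  ", skipped * 4 / 2**30)
```

The saving printed here is for the bare 4-byte addresses only; with a per-document overhead in the kilobyte range, the same omitted documents would account for far more disk space.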

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Santi Saez
On 01/02/10 17:56, Robert Newson wrote: Compaction should reduce disk usage even without updates or deletes, but that is probably not true for 0.8. Odd that you get the exact same byte count after compaction... On another test server with CentOS-5 and couchdb-0.10.0-1.el5, we have: #

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Santi Saez
On 01/02/10 18:19, Markus Jelsma wrote: Not really, but you could omit about 300 million IP addresses: these are multicast and private network addresses, and leaving them out would save you about 1.2GiB already. Thanks for the tip ;-) Regards, -- Santi Saez http://woop.es

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Santi Saez
On 01/02/10 17:56, Paul Davis wrote: Dear Paul, Well, 2^32 of anything is 4GiB per byte stored. So, minimum of four bytes and you're at 16GiB. Even with just 1KiB overhead you're at 4TiB. I'm left wondering why you would want to store a list of numbers in the first place. Imagine a
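The figures Paul quotes follow directly from multiplying 2^32 by the per-document size. A small sketch of that arithmetic, which also applies it to the 3.7KB per document reported in the first message (read here as 3.7 KiB, which is an assumption); the helper names are hypothetical:

```python
DOCS = 2**32  # one document per IPv4 address


def total_bytes(per_doc_bytes, docs=DOCS):
    """Total storage for `docs` documents of `per_doc_bytes` each."""
    return docs * per_doc_bytes


def human(n):
    """Format a byte count using binary units, capped at TiB."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.1f} {unit}"
        n /= 1024


for label, size in [
    ("1 byte per document", 1),
    ("4 bytes per document", 4),
    ("1 KiB per document", 1024),
    ("3.7 KiB per document", 3.7 * 1024),  # assumption: "3.7KB" means 3.7 KiB
]:
    print(f"{label:>22}: {human(total_bytes(size))}")
```

Under that reading, the observed per-document size works out to a little under 15 TiB, in line with the "+14TB" estimate in the original post.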

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Paul Davis
On Mon, Feb 1, 2010 at 1:50 PM, Santi Saez santis...@woop.es wrote: On 01/02/10 17:56, Paul Davis wrote: Dear Paul, Well, 2^32 of anything is 4GiB per byte stored. So, minimum of four bytes and you're at 16GiB. Even with just 1KiB overhead you're at 4TiB. I'm left wondering why you

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Nicholas Orr
Also have a look at this thread: http://mail-archives.apache.org/mod_mbox/couchdb-dev/201001.mbox/%3chi57et$19...@ger.gmane.org%3e On Tue, Feb 2, 2010 at 6:07 AM, Paul Davis paul.joseph.da...@gmail.com wrote: On Mon, Feb 1, 2010 at 1:50 PM, Santi Saez santis...@woop.es wrote: On 01/02/10

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Brian Candler
On Mon, Feb 01, 2010 at 07:50:00PM +0100, Santi Saez wrote: On 01/02/10 17:56, Paul Davis wrote: Dear Paul, Well, 2^32 of anything is 4GiB per byte stored. So, minimum of four bytes and you're at 16GiB. Even with just 1KiB overhead you're at 4TiB. I'm left wondering why you would

Re: Best way to store 2^32 IPs in CouchDB

2010-02-01 Thread Stephen Day
I thought I'd weigh in on this to illustrate the differences in use cases between heterogeneous document-based data and homogeneous data, such as IP address adjacencies. I have a bit of a networking background, so if I am way off about your intent, this may at least be an interesting set of