Hi Jim,

You actually made me think about the schema Michel and I are using:
> And KVP is? ;)

CREATE TABLE mykvpstore (
  id bigint PRIMARY KEY
);

CREATE TABLE kvp (
  id bigint REFERENCES mykvpstore(id),
  key text NOT NULL,
  value text
);
-- with index on key

And the table with the associative array type (hstore) is:

CREATE TABLE myhstore (
  id bigint PRIMARY KEY,
  kvps hstore NOT NULL
);
-- with GiST index on kvps

It seems to me that the mykvpstore/kvp pair also carries some overhead.

And yes, we have no clue what keys to anticipate, except for some common
ones like 'name'. The use case comes from OpenStreetMap
(http://wiki.openstreetmap.org/wiki/Database_schema).

Yours, Stefan

2011/5/17 Jim Nasby <j...@nasby.net>:
> On May 16, 2011, at 8:47 AM, Merlin Moncure wrote:
>> On Sat, May 14, 2011 at 5:10 AM, Stefan Keller <sfkel...@gmail.com> wrote:
>>> Hi,
>>>
>>> I am conducting a benchmark to compare KVP table vs. hstore and got
>>> bad hstore performance results when the no. of records is greater than
>>> about 500'000.
>>>
>>> CREATE TABLE kvp ( id SERIAL PRIMARY KEY, key text NOT NULL, value text );
>>> -- with index on key
>>> CREATE TABLE myhstore ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );
>>> -- with GIST index on obj
>>>
>>> Does anyone have experience with that?
>>
>> hstore is not really designed for large-ish sets like that.
>
> And KVP is? ;)
>
> IIRC hstore ends up just storing everything as text, with pointers to know
> where things start and end. There's no real indexing inside hstore, so
> basically the only thing it can do is scan the entire hstore.
>
> That said, I would strongly reconsider using KVP for anything except the
> most trivial of data sets. It is *extremely* inefficient. Do you really
> have absolutely no idea what *any* of your keys will be? Even if you need
> to support a certain amount of non-deterministic stuff, I would put
> everything you possibly can into real fields and only use KVP or hstore
> for things that you really didn't anticipate.
>
> Keep in mind that for every *value*, your overhead is 24 bytes for the
> heap header, 2+ varlena bytes in the heap, plus the length of the key. In
> the index you're looking at 6+ bytes of overhead, 1+ byte for varlena,
> plus the length of the key. The PK will cost you an additional 16-24
> bytes, depending on alignment. So that's a *minimum* of ~50 bytes per
> value, and realistically the overhead will be closer to 65-70 bytes, *per
> value*. Unless your values are decent-sized strings, the overhead is going
> to be many times larger than the actual data!
> --
> Jim C. Nasby, Database Architect   j...@nasby.net
> 512.569.9461 (cell)                http://jim.nasby.net
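
P.S. For completeness, here is roughly what the two lookups being compared
could look like against the schemas above. This is only a sketch, not
necessarily the exact benchmark queries; the pair ('name', 'Berlin') is an
invented example.

-- KVP variant: the btree index on kvp(key) drives the lookup
SELECT id
FROM   kvp
WHERE  key = 'name' AND value = 'Berlin';

-- hstore variant: the GiST index on kvps supports the @> containment operator
SELECT id
FROM   myhstore
WHERE  kvps @> hstore('name', 'Berlin');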
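
P.P.S. Jim's per-value overhead estimate can be sanity-checked empirically
with something like the query below, once the tables are loaded. Note that
'kvp_key_idx' is just a placeholder for whatever the index on kvp(key) is
actually named in your database.

-- rough average on-disk bytes per row, for the heap and for the key index
SELECT pg_relation_size('kvp')         / count(*) AS heap_bytes_per_row,
       pg_relation_size('kvp_key_idx') / count(*) AS index_bytes_per_row
FROM   kvp;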