Hi Jim

You actually made me think about the schema Michel and I are using:

> And KVP is? ;)

CREATE TABLE mykvpstore( id bigint PRIMARY KEY )
CREATE TABLE kvp ( id bigint REFERENCES mykvpstore(id), key text NOT
NULL, value text, );
-- with index on key

And the table with the associative array type (hstore) is:
CREATE TABLE myhstore ( id bigint PRIMARY KEY, kvps hstore NOT NULL );
-- with GIST index on obj

It seems to me that in the mykvpstore-kvp there is also some overhead.

And yes, we have no clue what keys to anticipate, except for some
common ones like 'name': The use case is coming from OpenStreetMap
(http://wiki.openstreetmap.org/wiki/Database_schema ).

Yours, Stefan


2011/5/17 Jim Nasby <j...@nasby.net>:
> On May 16, 2011, at 8:47 AM, Merlin Moncure wrote:
>> On Sat, May 14, 2011 at 5:10 AM, Stefan Keller <sfkel...@gmail.com> wrote:
>>> Hi,
>>>
>>> I am conducting a benchmark to compare KVP table vs. hstore and got
>>> bad hstore performance results when the no. of records is greater than
>>> about 500'000.
>>>
>>> CREATE TABLE kvp ( id SERIAL PRIMARY KEY, key text NOT NULL, value text );
>>> -- with index on key
>>> CREATE TABLE myhstore ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );
>>> -- with GIST index on obj
>>>
>>> Does anyone have experience with that?
>>
>> hstore is not really designed for large-ish sets like that.
>
> And KVP is? ;)
>
> IIRC hstore ends up just storing everything as text, with pointers to know 
> where things start and end. There's no real indexing inside hstore, so 
> basically the only thing it can do is scan the entire hstore.
>
> That said, I would strongly reconsider using KVP for anything except the most 
> trivial of data sets. It is *extremely* inefficient. Do you really have 
> absolutely no idea what *any* of your keys will be? Even if you need to 
> support a certain amount of non-deterministic stuff, I would put everything 
> you possibly can into real fields and only use KVP or hstore for things that 
> you really didn't anticipate.
>
> Keep in mind that for every *value*, your overhead is 24 bytes for the heap 
> header, 2+ varlena bytes in the heap, plus the length of the key. In the 
> index you're looking at 6+ bytes of overhead, 1+ byte for varlena, plus the 
> length of the key. The PK will cost you an additional 16-24 bytes, depending 
> on alignment. So that's a *minimum* of ~50 bytes per value, and realistically 
> the overhead will be closer to 65-70 bytes, *per value*. Unless your values 
> are decent-sized strings, the overhead is going to be many times larger than 
> the actual data!
> --
> Jim C. Nasby, Database Architect                   j...@nasby.net
> 512.569.9461 (cell)                         http://jim.nasby.net
>
>
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to