I would strongly suggest that you use a proper relational schema instead of storing everything in two tables. I don't know your application, but a schema like that is called an Entity-Attribute-Value (EAV) schema (though your entity seems to be just posx and posy), and it should raise a big red flag in the mind of any database designer. In particular, constructing queries against an EAV schema is a major pain in the ass. This has been discussed before on the postgresql lists as well; you might want to search and read the previous discussions.
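
Just to illustrate the pain: to get one logical "row" back out of the schema you sketch below, you end up writing a pivot query along these lines (the attribute names 'speed' and 'heading' are made up for the example):

    SELECT v.posx, v.posy,
           max(CASE WHEN t.name = 'speed'   THEN v.value END) AS speed,
           max(CASE WHEN t.name = 'heading' THEN v.value END) AS heading
    FROM attribute_values v
    JOIN attribute_types t ON t.id = v.attr_type
    GROUP BY v.posx, v.posy;

One CASE/aggregate per attribute you want back, and everything comes out as varchar.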

Ignoring the EAV issue for a moment, it's hard to give advice without knowing what kind of queries are going to be executed. Are the lookups always going to be by id? By posx/posy perhaps? By attribute?

[EMAIL PROTECTED] wrote:
Hi

I have the following scenario for a database that I need to design, and
would like some hints on what to improve or do differently to achieve the
desired performance goal, disregarding hardware and postgres tuning.

The premise is an attribute database that stores about 100 different
attribute types as attribute values. Every X seconds, Y number of new
attribute values are stored in the database. X is constant and currently between 6 and 20 seconds, depending on the setup. In the future X could
become as low as 3 seconds. Y can, within the next 5-10 years, become as
high as 200 000.

That means that, for example, every 6 seconds 100 000 attributes need to
be written to the database.
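
(That is roughly 100 000 / 6 ≈ 17 000 inserts per second sustained; if Y
grows to 200 000 and X drops to 3 seconds, it would be closer to
200 000 / 3 ≈ 67 000 inserts per second.)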

At the same time, somewhere between 5 and 20 users need to read parts of
those newly written attributes, maybe 30 000 attributes in total.

This continues for the duration of the field operation, which could be
18 hrs a day for 6 weeks. So the total db size grows to somewhere around 200 GB.

Now here is how I suggest doing this:

1- the tables

table attribute_values:
        id              int
        attr_type       int  ( references attribute_types(id) )
        posX            int
        posY            int
        data_type       int
        value           varchar(50)

table attribute_types:
        id              int
        name            varchar(200)
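
In actual DDL this would look roughly like the following (a primary key
added on attribute_types so the reference works; everything else as above):

    -- lookup table of attribute names
    CREATE TABLE attribute_types (
        id    int PRIMARY KEY,
        name  varchar(200)
    );

    -- one row per attribute value
    CREATE TABLE attribute_values (
        id         int,
        attr_type  int REFERENCES attribute_types(id),
        posx       int,
        posy       int,
        data_type  int,
        value      varchar(50)
    );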



2- function

   a function that receives arrays of data and inserts each attribute;
   perhaps one array per attribute field (type, posX, posY, data_type,
   value), so five arrays as IN parameters to the function.
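
   A rough sketch of what I have in mind (function and parameter names are
   just placeholders; it assumes the five arrays are equally long):

    CREATE OR REPLACE FUNCTION insert_attributes(
        p_attr_type  int[],
        p_posx       int[],
        p_posy       int[],
        p_data_type  int[],
        p_value      varchar[]
    ) RETURNS void AS $$
    DECLARE
        i int;
    BEGIN
        -- walk the arrays in parallel and insert one row per element
        FOR i IN array_lower(p_attr_type, 1) .. array_upper(p_attr_type, 1) LOOP
            INSERT INTO attribute_values (attr_type, posx, posy, data_type, value)
            VALUES (p_attr_type[i], p_posx[i], p_posy[i], p_data_type[i], p_value[i]);
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;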

3- java client

   the client receives the data from a CORBA request, splits it into,
   say, 4 equally sized blocks, and starts 4 threads that each insert
   one block (this seems to be more efficient than using just one
   thread).

Now I am wondering if this is the most efficient way of doing it?

- I know that I could group the attributes so that each type of attribute
gets its own table, with all the attributes of a group in one row. But I am
not sure if that is any more efficient than one attribute per row, since I
pass everything to the function as arrays (a sketch of what such a grouped
table could look like is further down).
With the above design, a change in attribute types only requires changing
the data in a table instead of having to modify the client, the function
and the tables.

- I am also wondering if writing the client and function in C would create
a more efficient solution.
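
For comparison, a grouped table for one such set of related attributes might
look roughly like this (speed and heading are made-up attribute names):

    CREATE TABLE position_attributes (
        id       int,
        posx     int,
        posy     int,
        speed    real,      -- made-up example attribute
        heading  real       -- made-up example attribute
        -- ... one column per attribute in the group
    );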

any comments?

PS: I am currently running Postgres 8.1, but could probably use 8.2 if it
is needed for functionality or performance reasons. It will run on a Sparc
machine with Solaris 10 and perhaps 4-6 processors, as many GB of RAM as
necessary, and SCSI disks (perhaps in RAID 0).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
