(2013/09/04 15:23), Atri Sharma wrote:


On 04-Sep-2013, at 10:54, Satoshi Nagayasu <sn...@uptime.jp> wrote:

Hi,

(2013/09/04 12:52), Atri Sharma wrote:
On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <sn...@uptime.jp> wrote:
Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all the access statistics for each database, and
the entire file needs to be read and written frequently.

As a result, pgstat.stat often generates massive I/O, particularly
when the database contains a large number of tables.

To support multi-tenancy, or simply a large number of tables (up to
10k tables in a single database), I think pgstat.stat needs to be
overhauled.

I think using a heap and a btree inside pgstat.stat would be
preferable, to reduce read/write traffic and to allow updating the
access statistics of specific tables within the pgstat.stat file.

Is this good for us?
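
To make the idea concrete, here is a rough sketch. This is not actual
PostgreSQL code; the layout and the names PgStatTableRecord and
PgStatIndexEntry are made up for illustration. It only shows how
fixed-size per-table records plus an index keyed by table OID could
make a single entry addressable without reading the whole file:

/*
 * Sketch only -- not PostgreSQL code.  It assumes a hypothetical on-disk
 * layout: a "heap" of fixed-size per-table statistics records, plus a
 * sorted index of (table OID, file offset) pairs so that one record can
 * be located without reading the whole statistics file.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

typedef uint32_t Oid;

/* One fixed-size statistics record per table (fields are illustrative). */
typedef struct PgStatTableRecord
{
    Oid      tableid;
    uint64_t numscans;
    uint64_t tuples_returned;
    uint64_t tuples_inserted;
    uint64_t tuples_updated;
    uint64_t tuples_deleted;
    uint64_t blocks_fetched;
    uint64_t blocks_hit;
} PgStatTableRecord;            /* well under 100 bytes */

/* Index entry mapping a table OID to its record's offset in the heap part. */
typedef struct PgStatIndexEntry
{
    Oid   tableid;
    off_t offset;
} PgStatIndexEntry;

static int
index_cmp(const void *a, const void *b)
{
    const PgStatIndexEntry *ia = a;
    const PgStatIndexEntry *ib = b;

    return (ia->tableid > ib->tableid) - (ia->tableid < ib->tableid);
}

/* Find the file offset of one table's record; -1 if the table is unknown. */
static off_t
pgstat_lookup_offset(const PgStatIndexEntry *idx, size_t nentries, Oid tableid)
{
    PgStatIndexEntry key = { tableid, 0 };
    PgStatIndexEntry *hit = bsearch(&key, idx, nentries,
                                    sizeof(PgStatIndexEntry), index_cmp);

    return hit ? hit->offset : (off_t) -1;
}

int
main(void)
{
    /* A tiny index, as if it had been read from the head of the file. */
    PgStatIndexEntry idx[] = {
        { 16384, 0 },
        { 16390, sizeof(PgStatTableRecord) },
        { 16402, 2 * sizeof(PgStatTableRecord) },
    };

    printf("offset of table 16390: %ld\n",
           (long) pgstat_lookup_offset(idx, 3, 16390));
    return 0;
}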

Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to pin down our aim here. Are we looking to split pgstat
per table, so that a write only touches a portion of pgstat? Or to
reduce the I/O in general?

I prefer the latter.

Under the current implementation, a DBA needs to split a single
database into many smaller databases while taking the access locality
of the tables into account. That is difficult, and the access patterns
could change in the future.

And splitting the statistics data into many files (one per table,
for example) would cause another performance issue when collecting
or showing all the statistics at once. Just my guess, though.

So, I'm looking for a new way to reduce I/O for the statistics data
in general.

Regards,


If the latter, how would using a BTree help us? I would rather go for
a range tree or something similar. But again, I may be completely wrong.

Please elaborate a bit more on the solution we are trying to achieve.
It seems really interesting.

Regards,

Atri





Right, thanks.

How would using a heap and a BTree help here? Are we looking at a
priority queue that backs the main storage of the stats?

For example, when you read only a single block from your table, all
of the values in your database statistics then have to be written out.
That often generates a large amount of I/O.

However, if random access is allowed within the statistics file, you
can update only a single record for the specific table you read. That
would be less than 100 bytes for each table.
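
As a rough illustration (a sketch only, not the actual implementation;
the record layout and file name are made up), updating one table's
statistics would then be a single small pwrite() at a known offset,
instead of rewriting the whole file:

/*
 * Sketch only -- not PostgreSQL code.  With a fixed-size record per
 * table at a known offset, updating the stats of one table becomes a
 * single small pwrite() instead of rewriting the entire file.
 */
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct TableStatsRecord     /* illustrative fields, < 100 bytes */
{
    uint32_t tableid;
    uint64_t blocks_fetched;
    uint64_t blocks_hit;
    uint64_t tuples_returned;
} TableStatsRecord;

int
main(void)
{
    int fd = open("pgstat.stat.sketch", O_RDWR | O_CREAT, 0600);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* The offset would come from an index lookup, as in the earlier sketch. */
    off_t offset = 42 * sizeof(TableStatsRecord);

    TableStatsRecord rec = { 16384, 1, 1, 100 };

    /* Rewrite only this table's record: a few dozen bytes of I/O. */
    if (pwrite(fd, &rec, sizeof(rec), offset) != (ssize_t) sizeof(rec))
        perror("pwrite");

    close(fd);
    return 0;
}

Of course, crash safety of such in-place updates would need more
thought; the sketch is only meant to show the difference in I/O volume.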

I don't yet see how a priority queue could work here. However, if the
statistics file is overhauled, PostgreSQL would be able to host a much
larger number of customers more easily.

Regards,
--
Satoshi Nagayasu <sn...@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp

