Re: [HACKERS] Data archiving/warehousing idea

2007-02-02 Thread Florian G. Pflug

Jochem van Dieten wrote:

On 2/1/07, Chris Dunlop wrote:

In maillist.postgres.dev, you wrote:
Rather than writing in-place, perhaps the SET ARCHIVE would
create an on-disk copy of the table.


Just like CLUSTER does now: create an on-disk copy first, then swap the
relfilenodes and flush the relcache.


IIRC, cluster currently needs to take an exclusive lock on the table, thus
preventing any concurrent selects. I assume it would be the same for
"alter table ... set archive".
For a large read-only table - the ones that "set archive" would be used for -
rewriting the whole table might easily take a few hours, if not days.
Blocking reads for such a long time might be unacceptable in a lot of
environments, severely limiting the use cases for "alter table ... set archive".

I think that both "cluster" and "truncate" should in theory only need to
prevent concurrent updates, not concurrent selects. AFAIK, the reason they
need to take an exclusive lock is that there is no way to let other backends
see the "old" relfilenode entry in pg_class until the cluster/truncate commits.
So I believe that this limitation would first have to be removed before
"alter table ... set archive" could become really useful...

Just my 0.02 eurocents.
greetings, Florian Pflug




Re: A more general approach (Re: [HACKERS] Data archiving/warehousing idea)

2007-02-02 Thread Hannu Krosing
On one fine day, Thu, 2007-02-01 at 12:31, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > A more radical variation of the "restricted-use archive table" approach
> > is storing all tuple visibility info in a separate file.
> > At first it seems to just add overhead, but for lots (most?) of use cases
> > the separately stored visibility info should be highly compressible, so for
> > example for bulk-loaded tables you could end up with one bit per page
> > saying that all tuples on this page are visible.
> 
> The more you compress, the slower and more complicated it will be to
> access the information.  I'd put my money on this being a net loss in
> the majority of scenarios.

define "majority" :)

In real life it is often faster to access compressed information,
especially if it is stored in something like a trie, where compression and
indexed access are the same thing.

The most gain will of course come from bulk-loaded data, where the
"compressed" representation can just say something like "pages 1 to
20 are all visible starting from transaction 5000, and so is the first
half of page 21; the second half of page 21 and pages up to 25 are
visible from transaction 6000".

In this case the visibility info will always stay in L1 cache and thus
be really fast, maybe even free if we account for cache reloads and
such.
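
To make that concrete, here is a minimal sketch of what such a run-based
representation could look like; the struct and function names are invented
for illustration and are not taken from the PostgreSQL source:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t BlockNumber;

/* One entry covers a contiguous range of heap pages whose tuples all
 * became visible at the same point; a bulk-loaded table needs only a
 * handful of these. */
typedef struct VisibilityRun
{
    BlockNumber   first_page;    /* first heap page covered by this run */
    BlockNumber   last_page;     /* last heap page covered (inclusive) */
    TransactionId visible_xmin;  /* run is visible to snapshots taken at or
                                  * after this transaction id */
} VisibilityRun;

/* Binary search over non-overlapping runs sorted by first_page. */
static const VisibilityRun *
find_run(const VisibilityRun *runs, size_t nruns, BlockNumber page)
{
    size_t lo = 0, hi = nruns;

    while (lo < hi)
    {
        size_t mid = (lo + hi) / 2;

        if (page < runs[mid].first_page)
            hi = mid;
        else if (page > runs[mid].last_page)
            lo = mid + 1;
        else
            return &runs[mid];
    }
    return NULL;    /* not covered: fall back to per-tuple visibility info */
}

/* All tuples on 'page' are visible to a snapshot if the covering run's
 * transaction id precedes the snapshot's xmin (simplified: ignores
 * transaction id wraparound). */
static bool
page_all_visible(const VisibilityRun *runs, size_t nruns,
                 BlockNumber page, TransactionId snapshot_xmin)
{
    const VisibilityRun *run = find_run(runs, nruns, page);

    return run != NULL && run->visible_xmin <= snapshot_xmin;
}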

But it may be better to still have a bitmap there for the sake of simplicity
and have some of it be accessible from L2 cache (20 pages times say
2 bits is still only a 100kB bitmap for 1.6GB of data).

Of course there are cases where this approach is worse, sometimes much
worse, than the current one, but the possibility of independently
compressing the visibility info and making some types of VACUUM vastly
cheaper may make it a net win in several cases. 

Also, for highly dynamic tables a separate visibility heap might also
speed up bitmap scans, as access to the heap happens only for visible
tuples. 

This can also be one way to get rid of the need to write full data tuples
thrice - first the original write, then the commit bits, and then the delete
bits. Instead we can just write the bits once for a whole set of tuples.

Usually most of the visibility info can be thrown out quite soon, as the
active transaction window advances, so the first level of "compression"
is just throwing out cmin/cmax and setting the commit bit, then marking the
tuple as just "visible".
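
A purely illustrative sketch of that progression (the states and names
are invented for illustration, not a proposed on-disk format):

#include <stdbool.h>

typedef enum VisibilityState
{
    VIS_FULL,         /* xmin/xmax and cmin/cmax all still needed: the
                       * inserting transaction may still be in progress */
    VIS_COMMITTED,    /* inserter known committed: cmin/cmax can be thrown
                       * away, keep xmin plus a commit bit */
    VIS_ALL_VISIBLE   /* xmin older than every live snapshot: a single
                       * "visible to everyone" bit is enough */
} VisibilityState;

/* Each step only throws information away, so it can be done lazily by a
 * vacuum-like background process as the transaction window advances. */
static VisibilityState
advance_visibility(VisibilityState cur,
                   bool inserter_committed,
                   bool older_than_all_snapshots)
{
    if (cur == VIS_FULL && inserter_committed)
        cur = VIS_COMMITTED;
    if (cur == VIS_COMMITTED && older_than_all_snapshots)
        cur = VIS_ALL_VISIBLE;
    return cur;
}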

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





Re: A more general approach (Re: [HACKERS] Data archiving/warehousing idea)

2007-02-01 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes:
> A more radical variation of the "restricted-use archive table" approach
> is storing all tuple visibility info in a separate file.
> At first it seems to just add overhead, but for lots (most?) of use cases
> the separately stored visibility info should be highly compressible, so for
> example for bulk-loaded tables you could end up with one bit per page
> saying that all tuples on this page are visible.

The more you compress, the slower and more complicated it will be to
access the information.  I'd put my money on this being a net loss in
the majority of scenarios.

regards, tom lane



Re: A more general approach (Re: [HACKERS] Data archiving/warehousing idea)

2007-02-01 Thread Hannu Krosing
On one fine day, Thu, 2007-02-01 at 14:38, Hannu Krosing wrote:
> On one fine day, Thu, 2007-02-01 at 13:24, Gavin Sherry wrote:
> 
> > A different approach discussed earlier involves greatly restricting the
> > way in which the table is used. This table could only be written to if an
> > exclusive lock is held; on error or ABORT, the table is truncated.
> > 
> > The problem is that a lot of this looks like a hack and I haven't seen a
> > very clean approach which has gone beyond a basic brain dump.
> 
> A more radical variation of the "restricted-use archive table" approach
> is storing all tuple visibility info in a separate file.
> 
> At first it seems to just add overhead, but for lots (most?) of use cases
> the separately stored visibility info should be highly compressible, so for
> example for bulk-loaded tables you could end up with one bit per page
> saying that all tuples on this page are visible.
> 
> Also this could be used to speed up vacuums, as only the visibility
> table needs to be scanned during phase 1 of vacuum, and so tables with
> localised/moving hotspots can be vacuumed without scanning lots of
> static data.
> 
> Also, storing the whole visibility info, but in a separate heap, lifts
> all restrictions of the "restricted-use archive table" variant. 
> 
> And the compression of visibility info (mostly replacing per-tuple info
> with per-page info) can be carried out by a separate vacuum-like
> process.
> 
> And it has many of the benefits of static/RO tables, like space saving
> and index-only queries. Index-only will of course need to get the
> visibility info from the visibility heap, but if it is mostly heavily
> compressed, it will be a lot cheaper than random access to the data heap.

For tables with fixed-width tuples it can probably be extended to
support vertical fragmentation as well, to get DWH benefits similar to
http://monetdb.cwi.nl/ .

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com






A more general approach (Re: [HACKERS] Data archiving/warehousing idea)

2007-02-01 Thread Hannu Krosing
On one fine day, Thu, 2007-02-01 at 13:24, Gavin Sherry wrote:

> A different approach discussed earlier involves greatly restricting the
> way in which the table is used. This table could only be written to if an
> exclusive lock is held; on error or ABORT, the table is truncated.
> 
> The problem is that a lot of this looks like a hack and I haven't seen a
> very clean approach which has gone beyond a basic brain dump.

A more radical variation of the "restricted-use archive table" approach
is storing all tuple visibility info in a separate file.

At first it seems to just add overhead, but for lots (most?) of use cases
the separately stored visibility info should be highly compressible, so for
example for bulk-loaded tables you could end up with one bit per page
saying that all tuples on this page are visible.
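
A minimal sketch of such a one-bit-per-page map (the names are invented
for illustration, not an actual PostgreSQL structure or on-disk format):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

typedef struct PageVisibilityMap
{
    BlockNumber npages;   /* number of heap pages covered */
    uint8_t    *bits;     /* one bit per page, rounded up to whole bytes;
                           * bit set => every tuple on the page is visible
                           * to all transactions */
} PageVisibilityMap;

static inline bool
page_is_all_visible(const PageVisibilityMap *vm, BlockNumber page)
{
    if (page >= vm->npages)
        return false;     /* page added later: fall back to per-tuple info */
    return (vm->bits[page / 8] >> (page % 8)) & 1;
}

static inline void
mark_page_all_visible(PageVisibilityMap *vm, BlockNumber page)
{
    if (page < vm->npages)
        vm->bits[page / 8] |= (uint8_t) (1 << (page % 8));
}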

Also this could be used to speed up vacuums, as only the visibility
table needs to be scanned during phase 1 of vacuum, and so tables with
localised/moving hotspots can be vacuumed without scanning lots of
static data.

Also, storing the whole visibility info, but in a separate heap, lifts
all restrictions of the "restricted-use archive table" variant. 

And the compression of visibility info (mostly replacing per-tuple info
with per-page info) can be carried out by a separate vacuum-like
process.

And it has many of the benefits of static/RO tables, like space saving
and index-only queries. Index-only will of course need to get the
visibility info from the visibility heap, but if it is mostly heavily
compressed, it will be a lot cheaper than random access to the data heap.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





Re: [HACKERS] Data archiving/warehousing idea

2007-02-01 Thread Simon Riggs
On Thu, 2007-02-01 at 15:03 +1100, Chris Dunlop wrote:

> > A different approach discussed earlier involves greatly
> > restricting the way in which the table is used. This table
> > could only be written to if an exclusive lock is held; on
> > error or ABORT, the table is truncated.
> 
> You're talking about the "no-WAL" concept?  Not quite the same
> thing I think, but perhaps complementary to the ARCHIVE idea: I
> wouldn't expect an ARCHIVE table to need to generate any WAL
> entries as it would be read only.

Setting an option to reduce the size of the row headers needs to be done
before the table is loaded, not after. If you mark a table as APPEND-only it
would be possible to save 4 bytes off the row header. 

Saving a further 8 bytes off the row header by dropping the xmin and
commandid fields isn't a very workable solution since those fields
provide visibility within a transaction and across transactions. You'd
end up with a table that could only be loaded by one transaction and
only touched by one command within that transaction. That would require
bookkeeping on the relation itself (in pg_class) and an
AccessExclusiveLock. In those circumstances you would be able to save on
writing WAL as well. You'd be basically saying that it could only be
loaded by a special utility and would be read-only once loaded.
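
For reference, a simplified sketch of the per-tuple MVCC fields being
discussed here (approximate field names and layout, not the exact
PostgreSQL header definition):

#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t CommandId;

/* Simplified: the real heap tuple header also carries the item pointer,
 * infomask bits and so on. */
typedef struct TupleVisibilityFields
{
    TransactionId xmin;   /* inserting transaction id */
    CommandId     cmin;   /* inserting command id within that transaction */
    TransactionId xmax;   /* deleting/locking transaction id, if any */
    CommandId     cmax;   /* deleting command id within that transaction */
} TupleVisibilityFields;

/* The 4-byte APPEND-only saving mentioned above would come out of these
 * fields; dropping xmin and the inserting command id is the further
 * 8 bytes, and is what forces the single-transaction, single-command
 * loading restriction described in the text. */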

I'm not crazy about those ideas, even though I think I suggested them
some time back. They're very special-case and would probably require
many functions to handle multiple cases, so that additional complexity
and cost would affect all users.

My main doubt relates to the data you're storing. If the events you're
interested in have so little information associated with them that they
are a thin table, then storing them at all is questionable. I've been
involved at the design stage of a number of data warehouses, and the
amount of data eventually stored is typically < 10% of the number people
first thought of, with the number of rows decreasing drastically and the
number of columns increasing slightly as people try to gain additional
value from their data storage. So row overhead should be less of a
concern.

It is possible to save on WAL by doing COPY LOCK or using a table with
different resilience characteristics; both ideas are already on the
TODO list. There are also better, less intrusive ways of reducing data volume
and improving load performance.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Data archiving/warehousing idea

2007-02-01 Thread Jochem van Dieten

On 2/1/07, Chris Dunlop wrote:

In maillist.postgres.dev, you wrote:

On Thu, 1 Feb 2007, Chris Dunlop wrote:

The main idea is that there might be space utilisation and
performance advantages if postgres had "hard" read-only
tables, i.e.  tables which were guaranteed (by postgres) to
never have their data changed (insert/update/delete).

This could potentially save disk space by allowing "book
keeping" elements in the page/tuple headers to be removed,
e.g.  visibility information etc.  Also, some indexes could
potentially be packed tighter if we know the data will never
change (of course this is already available using the
fillfactor control).


Well, there is also CPU overhead doing MVCC but there are a
few fundamental problems that must be overcome. The most
significant is that no useful table is always read only,
otherwise you could never load it.


Yes, that's why I was proposing ALTER TABLE... SET ARCHIVE
rather than CREATE TABLE... ARCHIVE.  (Although, for
consistency, perhaps the CREATE TABLE would be allowed, it's
just that you couldn't load anything into it until you did an
ALTER TABLE... DROP ARCHIVE.)


If you want to squeeze the most out of it, CLUSTER would need to be
able to set the archive option too. (Otherwise you first rewrite the
table when you cluster it, and then rewrite it again when you archive
it.)



Rather than writing in-place, perhaps the SET ARCHIVE would
create an on-disk copy of the table.


Just like CLUSTER does now: create an on-disk copy first, then swap the
relfilenodes and flush the relcache.



Of course this would demand that
you have twice the disk space available, which may be prohibitive
in a large warehouse.  On the other hand, I'm not sure if you
would have a single humongous table that you'd SET ARCHIVE on,
you might be as likely to archive on a weekly or yearly or
whatever is manageable basis, along the lines of:

  begin;
  select * into foo_2006 from foo
where date_trunc('year', timestamp) = '2006-01-01';
  delete from foo
where date_trunc('year', timestamp) = '2006-01-01';
  alter table foo_2006 set archive;
  alter table foo_2006 inherit foo;
  commit;


Ideally you let most of it run outside a transaction:

create table foo_2006 ();
insert into foo_2006 SELECT * from foo where ;
cluster foo_2006 on ... ARCHIVE;
begin;
 delete from foo where PK in (select PK from foo_2006);
 alter table foo_2006 inherit foo;
commit;



You're talking about the "no-WAL" concept?  Not quite the same
thing I think, but perhaps complementary to the ARCHIVE idea: I
wouldn't expect an ARCHIVE table to need to generate any WAL
entries as it would be read only.


The WAL gains come automatically when data isn't changed. But there
are additional advantages that can be obtained with archive tables:
- no need to vacuum them, not even for XID rollover (but you do need
to vacuum their entries in the catalogs)
- index entries are always valid so no need to check the heap (unless
the index is lossy)
- you can force the fillfactor to 100% regardless of the setting
- more aggressive compression of TOAST tables

Jochem



Re: [HACKERS] Data archiving/warehousing idea

2007-01-31 Thread Chris Dunlop
G'day Gavin,

In maillist.postgres.dev, you wrote:
> On Thu, 1 Feb 2007, Chris Dunlop wrote:
>> The main idea is that there might be space utilisation and
>> performance advantages if postgres had "hard" read-only
>> tables, i.e.  tables which were guaranteed (by postgres) to
>> never have their data changed (insert/update/delete).
>>
>> This could potentially save disk space by allowing "book
>> keeping" elements in the page/tuple headers to be removed,
>> e.g.  visibility information etc.  Also, some indexes could
>> potentially be packed tighter if we know the data will never
>> change (of course this is already available using the
>> fillfactor control).
>
> Well, there is also CPU overhead doing MVCC but there are a
> few fundamental problems that must be overcome. The most
> significant is that no useful table is always read only,
> otherwise you could never load it.

Yes, that's why I was proposing ALTER TABLE... SET ARCHIVE
rather than CREATE TABLE... ARCHIVE.  (Although, for
consistency, perhaps the CREATE TABLE would be allowed, it's
just that you couldn't load anything into it until you did an
ALTER TABLE... DROP ARCHIVE.)

> What do we do in the presence of a failure during the load or
> a user-issued ABORT? I guess we'd truncate the table...

I was thinking the load simply couldn't happen if the table were
SET ARCHIVE.

> What about replay after a crash?

No replay would be required on that table as it would *NOT* be
changed once a SET ARCHIVE were done (unless a DROP ARCHIVE
were done).

> Another way of looking at it is: we use the 'bookkeeping'
> information in the tuple header for concurrency and for
> handling transaction aborts.

So, unless there's something I'm missing (not completely
unlikely!), as long as the table (including its on-disk
representation) was never changed, the bookkeeping
information wouldn't be required?

>> The idea would be to introduce a statement something like:
>>
>>   ALTER TABLE foo SET ARCHIVE;
>
> I'd not thought of that approach. There are two problems: some
> archive tables are so large that loading them and then
> reprocessing them isn't appealing.

Yes - it would only work if you were prepared to wear the cost
of the SET ARCHIVE, which could certainly be considerable.

...oh, I think I see what you were getting at above: you were
thinking of loading the data into the already SET ARCHIVE table
to avoid the considerable cost of rewriting the disk format etc.
I hadn't considered that, but yes, if you were to allow that I
suppose in the presence of load errors or ABORTS etc. the table
could simply be truncated.  (For whatever value of "simply" is
appropriate!)

> Secondly, we'd be rewriting the binary structure of the table
> and this does not suit the non-overwriting nature of
> Postgres's storage system.

Rather than writing in-place, perhaps the SET ARCHIVE would
create an on-disk copy of the table.  Of course this would demand
that you have twice the disk space available, which may be prohibitive
in a large warehouse.  On the other hand, I'm not sure if you
would have a single humongous table that you'd SET ARCHIVE on,
you might be as likely to archive on a weekly or yearly or
whatever is manageable basis, along the lines of:

  begin;
  select * into foo_2006 from foo
where date_trunc('year', timestamp) = '2006-01-01';
  delete from foo
where date_trunc('year', timestamp) = '2006-01-01';
  alter table foo_2006 set archive;
  alter table foo_2006 inherit foo;
  commit;

> A different approach discussed earlier involves greatly
> restricting the way in which the table is used. This table
> could only be written to if an exclusive lock is held; on
> error or ABORT, the table is truncated.

You're talking about the "no-WAL" concept?  Not quite the same
thing I think, but perhaps complementary to the ARCHIVE idea: I
wouldn't expect an ARCHIVE table to need to generate any WAL
entries as it would be read only.

Cheers,

Chris.



Re: [HACKERS] Data archiving/warehousing idea

2007-01-31 Thread Gavin Sherry
On Thu, 1 Feb 2007, Chris Dunlop wrote:

> G'day hackers,

G'Day Chris,

> already - I couldn't find anything in the mail archives, but
> that doesn't mean it's not there...)

There has been a lot of discussion about this kind of thing over the
years.

> The main idea is that there might be space utilisation and
> performance advantages if postgres had "hard" read-only tables,
> i.e. tables which were guaranteed (by postgres) to never have
> their data changed (insert/update/delete).
>
> This could potentially save disk space by allowing "book
> keeping" elements in the page/tuple headers to be removed, e.g.
> visibility information etc.  Also, some indexes could
> potentially be packed tighter if we know the data will never
> change (of course this is already available using the fillfactor
> control).

Well, there is also CPU overhead doing MVCC but there are a few
fundamental problems that must be overcome. The most significant is that
no useful table is always read only, otherwise you could never load it.
What do we do in the presence of a failure during the load or a user-issued
ABORT? I guess we'd truncate the table... What about replay after a
crash?

Another way of looking at it is: we use the 'bookkeeping' information in
the tuple header for concurrency and for handling transaction aborts.

> The idea would be to introduce a statement something like:
>
>   ALTER TABLE foo SET ARCHIVE;

I'd not thought of that approach. There are two problems: some archive
tables are so large that loading them and then reprocessing them isn't
appealing. Secondly, we'd be rewriting the binary structure of the table
and this does not suit the non-overwriting nature of Postgres's storage
system.

A different approach discussed earlier involves greatly restricting the
way in which the table is used. This table could only be written to if an
exclusive lock is held; on error or ABORT, the table is truncated.

The problem is that a lot of this looks like a hack and I haven't seen a
very clean approach which has gone beyond a basic brain dump.

Thanks,

Gavin
