Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2008-04-10 Thread Bruce Momjian

I have added URLs to your patch to the TODO list:

* Allow data to be pulled directly from indexes

---

Gokulakannan Somasundaram wrote:
 Hi,
 I would like to present the first patch. It currently has the following
 restrictions
 a) It does not support any functional indexes.
 b) It supports queries like select count(1) from table where (restrictions
 from indexed columns), but it does not support select count(1) from table.
 
 The Syntax to create this type of index is
 
 create thick index idx on dd(n1,n2)
 
 here idx- index name and dd- table name and n1 and n2 are column names.
 
 I have created an extra column in pg_index called indhassnapshot.
 
 I have also enabled the display of Logical Reads. In order to see that, set
 log_statement_stats on.
 
 The thick index is clearly ahead if you issue queries like
 
 select n2 from dd where n1 > 1000 and n2 < 1500;
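
A minimal sketch of the workflow described above, using the patch's proposed
syntax; dd, n1 and n2 are the example names from this mail, and the range
values are purely illustrative:

    -- "create thick index" is the syntax proposed by this patch,
    -- not an existing PostgreSQL feature.
    create thick index idx on dd(n1, n2);

    set log_statement_stats = on;   -- print per-statement logical-read counts
    select n2 from dd where n1 > 1000 and n2 < 1500;   -- answerable from the index alone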
 
 As already said, updates do not incur any extra cost unless the indexed
 columns are updated. Deletes are costly, which makes it ideal for
 partitioned tables.
 
 In order to update the thick indexes, I have accessed the ps_ExprContext in
 PlanState to get the old tuple. But if we have an outer plan and an inner plan,
 then I have set the ps_ExprContext of the inner plan to that of the outer plan. I don't
 know whether there will be instances where the ps_ExprContext of the outer plan
 node has some other use in update queries.
 
 Right now, it passes the regression test suite. I had slight trouble with
 pgindent, so I think it has not been applied properly. But I have tried to
 remove all the whitespace differences. Please be kind to me in case I have
 missed any. :)
 
 Please review the patch and provide your comments.
 
 Thanks,
 Gokul.
 CertoSQL Project,
 Allied Solution Groups.
 (www.alliedgroups.com)
 
 On 10/23/07, Hannu Krosing [EMAIL PROTECTED] wrote:
 
  On Sat, 2007-10-20 at 10:19, Luke Lonergan wrote:
   Hi Hannu,
  
   On 10/14/07 12:58 AM, Hannu Krosing [EMAIL PROTECTED] wrote:
  
What has happened in reality, is that the speed difference between
  CPU,
RAM and disk speeds has _increased_ tremendously
  
   Yes.
  
which makes it even
more important to _decrease_ the size of stored data if you want good
performance
  
   Or bring the cpu processing closer to the data it's using (or both).
  
   By default, the trend you mention first will continue in an unending way
  -
   the consequence is that the distance between a processor and it's
  target
   data will continue to increase ad-infinitum.
 
  the emergence of solid-state (flash) disks may help a little here, but
  in general it is true.
 
   By contrast, you can only decrease the data volume so much - so in the
  end
   you'll be left with the same problem - the data needs to be closer to
  the
   processing.  This is the essence of parallel / shared nothing
  architecture.
  
   Note that we've done this at Greenplum.  We're also implementing a
  DSM-like
   capability and are investigating a couple of different hybrid row /
  column
   store approaches.
 
  Have you tried moving the whole visibility part of tuples out to a
  separate heap ?
 
  Especially in OLAP/ETL scenarios the distribution of tuples loaded in
  one transaction should be very good for visibility-info compression.
 
  I'd suspect that you could crush hundreds of pages worth of visibility
  into a single RLE encoding unit (xmin=N, xmax=not_yet, start_ctid=X,
  end_ctid=Y), and it will stay in L1 cache most of the time you process
  the corresponding relation. And the relation itself will be smaller, and
  index-only (actually index-only + lookup inside L1 cache) access can
  happen, and so on.
 
  OTOH, if you load it in millions of small transactions, you can run
  VACUUM FREEZE _on_ the visibility heap only, which will make all
  visibility info look similar and thus RLE-compressible, and again make
  it fit in L1 cache, if you don't have lots of failed loads interleaved
  with successful ones.
 
   Bitmap index with index-only access does provide nearly all of the
   advantages of a column store from a speed standpoint BTW.  Even though
   Vertica is touting speed advantages - our parallel engine plus bitmap
  index
   will crush them in benchmarks when they show up with real code.
  
   Meanwhile they're moving on to new ideas - I kid you not Horizontica
  is
   Dr. Stonebraker's new idea :-)
 
  Sounds like a result of a marketroid brainstorming session :P
 
   So - bottom line - some ideas from column store make sense, but it's not
  a
   cure-all.
  
There is also a MonetDB/X100 project, which tries to make MonetDB
order(s) of magnitude faster by doing in-page compression in order to
get even more performance, see:
  
   Actually, the majority of the points made by the MonetDB team involve
   decreasing the abstractions in the processing path to improve 

Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2008-01-28 Thread Jonah H. Harris
On Jan 28, 2008 8:21 AM, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote:
 I am not seeing my mail listed in the archives, so I am just
 resending it, in case the earlier one got missed.

It was sent.  Archive processing is delayed.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
499 Thornall Street, 2nd Floor  | [EMAIL PROTECTED]
Edison, NJ 08837| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-28 Thread Gokulakannan Somasundaram
On 10/28/07, Hannu Krosing [EMAIL PROTECTED] wrote:

 On Fri, 2007-10-26 at 16:46, Gokulakannan Somasundaram wrote:
  What do the numbers look like if the tables are small
  enough to
  fit in RAM?
 
  I don't know whether this is a valid production setup, against which
  we need to benchmark.

 Often the production setup may have at least most of indexes in RAM, if
 not the whole data.


My test happened to be one where the base folder size was less than
RAM, so this test case has been covered, although unintentionally.

Thanks,
Gokul.

--
 Hannu






-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-27 Thread Hannu Krosing
On Fri, 2007-10-26 at 16:46, Gokulakannan Somasundaram wrote:
 What do the numbers look like if the tables are small
 enough to
 fit in RAM?
 
 I don't know whether this is a valid production setup, against which
 we need to benchmark. 

Often the production setup may have at least most of indexes in RAM, if
not the whole data.

--
Hannu




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-26 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote:
 As far as the load test is concerned, I have tried to provide all the relevant
 details. Please inform me if I have left out any.

Thanks!

How large were the tables?

Did you run all the queries concurrently? At this point, I think it'd be
better to run them separately so that you can look at the impact on each
kind of operation in isolation.

What kind of an I/O system does the server have?

It'd be interesting to get the cache hit/miss ratios, as well as the
output of iostat (or similar) during the test. How much of the benefit
is due to reduced random I/O?
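
One way to get the cache hit/miss ratios is via the standard statistics views;
a sketch, which counts only shared_buffers hits (not the OS cache) and uses an
illustrative table name:

    SELECT heap_blks_hit, heap_blks_read,
           idx_blks_hit,  idx_blks_read
    FROM   pg_statio_user_tables
    WHERE  relname = 'dd';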

What do the numbers look like if the tables are small enough to
fit in RAM?

You should do some tuning; the PostgreSQL default configuration is not
tuned for maximum performance. At least increase checkpoint_segments,
checkpoint_timeout and shared_buffers. Though I noticed that you're
running on Windows; I don't think anyone's done any serious performance
testing or tuning on Windows yet, so I'm not sure how you should tune that.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-26 Thread Gokulakannan Somasundaram
On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram wrote:
  As far as Load Test is concerned, i have tried to provide all the
 relevant
  details. Please inform me, if i have left any.

 Thanks!

 How large were the tables?


It is in the performance test report. The tables contain 2 million records and
are 6 columns wide: 3 text and 3 numeric. The same set of tables was used for
both tests, after a refresh from a file.

Did you run all the queries concurrently? At this point, I think it'd be
 better to run them separately so that you can look at the impact on each
 kind of operation in isolation.

Performance tests are run against a workload, and I have taken the workload
of a small-scale partitioning setup. Running the queries individually has
already been done, and the counts of logical reads have been verified; I have
already suggested that. For some reason, I have not been able to convince anyone
that for simple index scans, logical reads are a good measure of performance.


What kind of an I/O system does the server have?


It's a normal desktop system. The disk model no. is ST3400633A, 7200 RPM.


It'd be interesting to get the cache hit/miss ratios, as well as the
 output of iostat (or similar) during the test. How much of the benefit
 is due to reduced random I/O?

Good suggestion. I have run the test on Windows; let me try perfmon to
monitor the next performance test.


What do the numbers look like if the tables are small enough to
 fit in RAM?


I don't know whether this is a valid production setup against which we need
to benchmark. But if you insist, I will do that and get back to you next
time.


You should do some tuning, the PostgreSQL default configuration is not
 tuned for maximum performance. At least increase checkpoint_segments and
 checkpoint_timeout and shared_buffers. Though I noticed that you're
 running on Windows; I don't think anyone's done any serious performance
 testing or tuning on Windows yet, so I'm not sure how you should tune
 that.


What we are trying to do here is to compare the performance of two
indexing structures. AFAIK, a performance test done to compare two
software implementations should not have parameter settings favorable to
one, and I have not made any settings changes favorable to the thick index. But I
have a limited setup from which I am trying to contribute, so please don't
ask me to run the tests against large-scale servers.

I think a better idea would be to draw up a performance-testing workload mix
(taking into account the QoS parameters used in a normal database, purging
frequency, and typical workload models used in the industry), with freedom in
hardware/software. That might solve some of the load-test riddles.



-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-26 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote:
 On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
 Gokulakannan Somasundaram wrote:
 As far as Load Test is concerned, i have tried to provide all the
 relevant
 details. Please inform me, if i have left any.
 Thanks!

 How large were the tables?
 
 It is in the Performance test report.  They contain 2 million records.  6
 columns wide, 3 text and 3 numeric. same set of tables used for both tests,
 after refresh from a file

I meant in megabytes. How wide is the data in the text and numeric fields?

 Did you run all the queries concurrently? At this point, I think it'd be
 better to run them separately so that you can look at the impact on each
 kind of operation in isolation.

 Performance tests are run against a workload and i have taken the workload
 of a small scale partitioning setup. Running the queries individually has
 already been done and the count of logical reads have been verified. I have
 already suggested that. For some reason, i am not able to convince that for
 simple index scans, Logical reads are a good measure of performance.

I wouldn't expect any performance gain for simple, not index-only,
scans. They have to hit the heap anyway.

 What do the numbers look like if the tables are small enough to
 fit in RAM?
 
 I don't know whether this is a valid production setup, against which we need
 to benchmark. But if you insist, i will do that and get back to you next
 time.

A lot of people run databases that fit in RAM. And a lot of people
don't. Both cases are interesting. I'm particularly curious about that
because you've argued that the number of logical reads is important,
even if they don't become physical reads. Hannu also suggested that
swapping pages in/out of shared_buffers is relatively expensive; if
that's the case, we should see index-only scans performing much better than
regular index scans, even when there's no physical I/O.

 You should do some tuning, the PostgreSQL default configuration is not
 tuned for maximum performance. At least increase checkpoint_segments and
 checkpoint_timeout and shared_buffers. Though I noticed that you're
 running on Windows; I don't think anyone's done any serious performance
 testing or tuning on Windows yet, so I'm not sure how you should tune
 that.
 
 What we are trying to do here, is to try and compare the performance of two
 indexing structures. AFAIK, the performance test done to compare two
 software implementations should not have parameter settings, favorable to
 one. I have not done any settings change favorable to thick index. 

The tuning I suggested is just basic tuning any knowledgeable Postgres
DBA will do. It's not particularly in favor of any indexing scheme. With
the default checkpoint settings, for example, the system is going to be
busy doing checkpoints all the time if you have a reasonable rate of
updates.

 But i
 have a limited setup, from which i am trying to contribute. So please don't
 ask me to run the tests against large scale servers.

Understood.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-26 Thread Gokulakannan Somasundaram
On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram wrote:
  On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
  Gokulakannan Somasundaram wrote:
  As far as Load Test is concerned, i have tried to provide all the
  relevant
  details. Please inform me, if i have left any.
  Thanks!
 
  How large were the tables?
 
  It is in the Performance test report.  They contain 2 million
 records.  6
  columns wide, 3 text and 3 numeric. same set of tables used for both
 tests,
  after refresh from a file

 I meant in megabytes. How wide is the data in the text and numeric fields?



I have measured the size of the PGDATA\base folder for the size details:
Size of Tables : 367 MB
Size of Tables + thin indexes :  616 MB
Size of Tables + thick indexes : 720 MB
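
The same sizes can also be read per relation rather than from the base
directory; a sketch, using the relation names from the earlier examples:

    SELECT pg_size_pretty(pg_relation_size('dd'));        -- heap only
    SELECT pg_size_pretty(pg_relation_size('idx'));       -- a single index
    SELECT pg_size_pretty(pg_total_relation_size('dd'));  -- heap + indexes + toast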

The numbers simply ran from 1 to 2 million in a serial
fashion. I think I made a mistake here: this would have helped thin indexes
in range scans, since the data is clustered in the table and the bitmap heap
scan would have been more effective. So I hope thick indexes will be more
effective if I uncluster the data, since the thin index then has to go to more
table buffers. The test columns are approximately 10 characters in length.
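
A sketch of how a test table of that shape could be generated (the column
layout and value shapes are assumptions, not the actual test script):

    create table dd (
        n1 numeric, n2 numeric, n3 numeric,
        t1 text, t2 text, t3 text
    );
    insert into dd
    select i, i, i,
           lpad(i::text, 10, 'a'),
           lpad(i::text, 10, 'b'),
           lpad(i::text, 10, 'c')
    from generate_series(1, 2000000) i;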



 Did you run all the queries concurrently? At this point, I think it'd be
  better to run them separately so that you can look at the impact on
 each
  kind of operation in isolation.
 
  Performance tests are run against a workload and i have taken the
 workload
  of a small scale partitioning setup. Running the queries individually
 has
  already been done and the count of logical reads have been verified. I
 have
  already suggested that. For some reason, i am not able to convince that
 for
  simple index scans, Logical reads are a good measure of performance.

 I wouldn't expect any performance gain for simple, not index-only,
 scans. They have to hit the heap anyway.


I just feel the above test didn't do much I/O, and yet the index-only scans
are faster with thick indexes. Since the size of RAM is 1 GB and the size of
the data is only 616 MB, I expect most of it was cached by the OS. Maybe
I am missing something here.


 What do the numbers look like if the tables are small enough to
  fit in RAM?
 
  I don't know whether this is a valid production setup, against which we
 need
  to benchmark. But if you insist, i will do that and get back to you next
  time.

 A lot of people run databases that fit in RAM. And a lot of people
 don't. Both cases are interesting. I'm particularly curious about that
 because you've argued that the number of logical reads is important,
 even if they don't become physical reads. Hannu also suggested that
 swapping pages in/out of shared_buffers is relatively expensive; if
 that's the case, we should see index-only scans performing much better than
 regular index scans, even when there's no physical I/O.


So the above test did fit into RAM. Do we now need a test with tables
that won't fit into RAM? I feel that if the thick indexes were effective with
data that fits into RAM, then they will definitely be more effective with
data that doesn't fit into RAM. There is one performance bug with updates,
where the caching strategy for the BTStack was not effective for varlena
structures; I will fix that bug next time. Also, calls to HOT-related code
can be avoided if it happens to be a thick index. I think these two
changes, if made, would further improve the performance of thick indexes.


 You should do some tuning, the PostgreSQL default configuration is not
  tuned for maximum performance. At least increase checkpoint_segments
 and
  checkpoint_timeout and shared_buffers. Though I noticed that you're
  running on Windows; I don't think anyone's done any serious performance
  testing or tuning on Windows yet, so I'm not sure how you should tune
  that.
 
  What we are trying to do here, is to try and compare the performance of
 two
  indexing structures. AFAIK, the performance test done to compare two
  software implementations should not have parameter settings, favorable
 to
  one. I have not done any settings change favorable to thick index.

 The tuning I suggested is just basic tuning any knowledgeable Postgres
 DBA will do. It's not particularly in favor of any indexing scheme. With
 the default checkpoint settings, for example, the system is going to be
 busy doing checkpoints all the time if you have a reasonable rate of
 updates.


The inserts and updates were at a rate of 10 every 2 seconds (it is in the
performance report), and each update was affecting two rows. I haven't got
any warning to increase the checkpoint settings during the test.
But my doubt is: if checkpoints caused as much overhead as we think,
how can the performance of thick indexes exceed that of thin indexes in
index-only scans?
As you might have observed, all the statistics (even the 90th and 95th
percentiles/medians) were in milliseconds. So that might give a hint about the
stress on the 

Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-24 Thread Bruce Momjian

This has been saved for consideration for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Gokulakannan Somasundaram wrote:
 Hi,
 I would like to present the first patch. It currently has the following
 restrictions
 a) It does not support any functional indexes.
 b) It supports queries like select count(1) from table where (restrictions
 from indexed columns), but it does not support select count(1) from table.
 
 The Syntax to create this type of index is
 
 create thick index idx on dd(n1,n2)
 
 here idx- index name and dd- table name and n1 and n2 are column names.
 
 I have created an extra column in pg_index called indhassnapshot.
 
 I have also enabled the display of Logical Reads. In order to see that, set
 log_statement_stats on.
 
 The thick index is clearly ahead if you issue queries like
 
 select n2 from dd where n1 > 1000 and n2 < 1500;
 
 As already said, updates do not incur any extra cost unless the indexed
 columns are updated. Deletes are costly, which makes it ideal for
 partitioned tables.
 
 In order to update the thick indexes, I have accessed the ps_ExprContext in
 PlanState to get the old tuple. But if we have an outer plan and an inner plan,
 then I have set the ps_ExprContext of the inner plan to that of the outer plan. I don't
 know whether there will be instances where the ps_ExprContext of the outer plan
 node has some other use in update queries.
 
 Right now, it passes the regression test suite. I had slight trouble with
 pgindent, so I think it has not been applied properly. But I have tried to
 remove all the whitespace differences. Please be kind to me in case I have
 missed any. :)
 
 Please review the patch and provide your comments.
 
 Thanks,
 Gokul.
 CertoSQL Project,
 Allied Solution Groups.
 (www.alliedgroups.com)
 
 On 10/23/07, Hannu Krosing [EMAIL PROTECTED] wrote:
 
  On Sat, 2007-10-20 at 10:19, Luke Lonergan wrote:
   Hi Hannu,
  
   On 10/14/07 12:58 AM, Hannu Krosing [EMAIL PROTECTED] wrote:
  
What has happened in reality, is that the speed difference between
  CPU,
RAM and disk speeds has _increased_ tremendously
  
   Yes.
  
which makes it even
more important to _decrease_ the size of stored data if you want good
performance
  
   Or bring the cpu processing closer to the data it's using (or both).
  
   By default, the trend you mention first will continue in an unending way
  -
   the consequence is that the distance between a processor and it's
  target
   data will continue to increase ad-infinitum.
 
  the emergence of solid-state (flash) disks may help a little here, but
  in general it is true.
 
   By contrast, you can only decrease the data volume so much - so in the
  end
   you'll be left with the same problem - the data needs to be closer to
  the
   processing.  This is the essence of parallel / shared nothing
  architecture.
  
   Note that we've done this at Greenplum.  We're also implementing a
  DSM-like
   capability and are investigating a couple of different hybrid row /
  column
   store approaches.
 
  Have you tried moving the whole visibility part of tuples out to a
  separate heap ?
 
  Especially in OLAP/ETL scenarios the distribution of tuples loaded in
  one transaction should be very good for visibility-info compression.
 
  I'd suspect that you could crush hundreds of pages worth of visibility
  into a single RLE encoding unit (xmin=N, xmax=not_yet, start_ctid=X,
  end_ctid=Y), and it will stay in L1 cache most of the time you process
  the corresponding relation. And the relation itself will be smaller, and
  index-only (actually index-only + lookup inside L1 cache) access can
  happen, and so on.
 
  OTOH, if you load it in millions of small transactions, you can run
  VACUUM FREEZE _on_ the visibility heap only, which will make all
  visibility info look similar and thus RLE-compressible, and again make
  it fit in L1 cache, if you don't have lots of failed loads interleaved
  with successful ones.
 
   Bitmap index with index-only access does provide nearly all of the
   advantages of a column store from a speed standpoint BTW.  Even though
   Vertica is touting speed advantages - our parallel engine plus bitmap
  index
   will crush them in benchmarks when they show up with real code.
  
   Meanwhile they're moving on to new ideas - I kid you not Horizontica
  is
   Dr. Stonebraker's new idea :-)
 
  Sounds like a result of a marketroid brainstorming session :P
 
   So - bottom line - some ideas from column store make sense, but it's not
  a
   cure-all.
  
There is also a MonetDB/X100 project, which tries to make MonetDB
order(s) of magnitude faster by doing in-page compression in order to
get even more performance, see:
  
   Actually, the majority of the points made by the MonetDB team involve
   decreasing the abstractions in the processing 

Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote:
 I would like to present the first patch. It currently has the following
 restrictions
 a) It does not support any functional indexes.
 b) It supports queries like select count(1) from table where (restrictions
 from indexed columns), but it does not support select count(1) from table.

An interesting question is how to represent tuples coming from the index
in the executor. I see that you didn't address that at all, because you
only support COUNT(1), and not things like SELECT column FROM table
WHERE id = ? where you actually return datums from the index. But
that's something that we have to think about in the DSM approach as well.

One solution is to form a heap tuple, using the datums from the index,
with the attributes that are not used in the query replaced with NULLs.
That seems simple, but I don't think it'll work with expression indexes,
when you do something like SELECT length(column) FROM table WHERE id =
?, and there's an index on (id, length(column)).
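
Concretely, the problematic case looks something like this (names are
illustrative):

    create table t (id integer, col text);
    create index t_id_len_idx on t (id, length(col));

    -- The index stores length(col) directly, so in principle this query could
    -- be answered from the index alone; but length(col) has no column of its
    -- own in the heap tuple layout, so the "rebuild a heap tuple and NULL out
    -- the rest" trick has nowhere to put it.
    select length(col) from t where id = 42;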

 I have also enabled the display of Logical Reads. In order to see that, set
 log_statement_stats on.

You should start benchmarking, to verify that you're really getting the
kind of speed up you're looking for, before you spend any more effort on
that. Reduction in logical reads alone isn't enough. Remember that for a
big change like that, the gain has to be big as well.

As a first test, I'd like to see results from SELECTs on different sized
tables. On tables that fit in cache, and on tables that don't. Tables
large enough that the index doesn't fit in cache. And as a special case,
on a table just the right size that a normal index fits in cache, but a
thick one doesn't.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Gokulakannan Somasundaram
On 10/23/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram wrote:
  I would like to present the first patch. It currently has the
 following
  restrictions
  a) It does not support any functional indexes.
  b) It supports queries like select count(1) from table where
 (restrictions
  from indexed columns), but it does not support select count(1) from
 table.

 An interesting question is how to represent tuples coming from the index
 in the executor. I see that you didn't address that at all, because you
 only support COUNT(1), and not things like SELECT column FROM table
 WHERE id = ? where you actually return datums from the index. But
 that's something that we have to think about in the DSM approach as well.

That's addressed as well.

One solution is to form a heap tuple, using the datums from the index,
 with the attributes that are not used in the query replaced with NULLs.
 That seems simple, but I don't think it'll work with expression indexes,
 when you do something like SELECT length(column) FROM table WHERE id =
 ?, and there's an index on (id, length(column)).

  I have also enabled the display of Logical Reads. In order to see that,
 set
  log_statement_stats on.

 You should start benchmarking, to verify that you're really getting the
 kind of speed up you're looking for, before you spend any more effort on
 that. Reduction in logical reads alone isn't enough. Remember that for a
 big change like that, the gain has to be big as well.

 As a first test, I'd like to see results from SELECTs on different sized
 tables. On tables that fit in cache, and on tables that don't. Tables
 large enough that the index doesn't fit in cache. And as a special case,
 on a table just the right size that a normal index fits in cache, but a
 thick one doesn't.

 --
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Heikki Linnakangas
Please keep the list cc'd.

Gokulakannan Somasundaram wrote:
 On 10/23/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
 Gokulakannan Somasundaram wrote:
 I have also enabled the display of Logical Reads. In order to see that,
 set
 log_statement_stats on.
 You should start benchmarking, to verify that you're really getting the
 kind of speed up you're looking for, before you spend any more effort on
 that. Reduction in logical reads alone isn't enough. Remember that for a
 big change like that, the gain has to be big as well.
 
 I have done the benchmark. I have done the benchmark with Logical reads, as
 they turn out to be potential physical reads. Try turning on the
 log_statement_stats in postgresql.conf. try firing some queries, which can
 satisfied by the index. You would see the difference.

I would see a decrease in the number of logical reads, that's all. You
need to demonstrate a real increase in throughput and/or reduction in
response times.

Note that even though you reduce the number of logical reads, with a
thick index a logical read is *more* likely to be a physical read,
because the index is larger and therefore consumes more cache.

 As a first test, I'd like to see results from SELECTs on different sized
 tables. On tables that fit in cache, and on tables that don't. Tables
 large enough that the index doesn't fit in cache. And as a special case,
 on a table just the right size that a normal index fits in cache, but a
 thick one doesn't.
 
 I have not done a Load test. That's a good idea. Are you guys using Apache
 JMeter?

You can use whatever you want, as long as you can get the relevant
numbers out of it. contrib/pgbench is a good place to start.

DBT-2 is another test people often use for patches like this. It's quite
tedious to set up and operate, but it'll give you very nice graphs.

Make sure you control vacuums, checkpoints etc., so that you get
repeatable results.

 Also i think you might have noted that the thick indexes are not affected by
 updates, if the updated column is not in the index. I think that add on to
 one more advantage of thick indexes against DSM.

That cannot possibly work. Imagine that you have a table

 ctid  | id | data
-------+----+------
 (0,1) | 1  | foo
 (0,2) | 1  | bar

where (0,2) is an updated version of (0,1). If you don't update the
index, there will be no index pointer to (0,2), so a regular index scan,
not an index-only scan, will not find the updated tuple.
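
The new-version mechanics can be seen directly (an illustration only; no index
is involved here, it just shows that the surviving row ends up at a new ctid,
which something has to point to):

    create table t (id integer, data text);
    insert into t values (1, 'foo');
    update t set data = 'bar' where id = 1;
    select ctid, id, data from t;   -- the live version now has a different ctid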

Or did you mean that the index is not updated on HOT updates? That's an
interesting observation. We could do index-only scans with the DSM as
well, even if there's HOT updates, if we define the bit in the bitmap to
mean all tuples in this page are visible to everyone, or there's only
HOT updates. That works, because an index-only-scan doesn't access any
of the updated columns. It probably isn't worth it, though. Seems like a
pretty narrow use case, and makes it more complicated.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Gokulakannan Somasundaram
On 10/23/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Please keep the list cc'd.

 Gokulakannan Somasundaram wrote:
  On 10/23/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
  Gokulakannan Somasundaram wrote:
  I have also enabled the display of Logical Reads. In order to see that,
  set
  log_statement_stats on.
  You should start benchmarking, to verify that you're really getting the
  kind of speed up you're looking for, before you spend any more effort
 on
  that. Reduction in logical reads alone isn't enough. Remember that for
 a
  big change like that, the gain has to be big as well.
 
  I have done the benchmark. I have done the benchmark with Logical reads,
 as
  they turn out to be potential physical reads. Try turning on the
  log_statement_stats in postgresql.conf. try firing some queries, which
 can
  satisfied by the index. You would see the difference.

 I would see a decrease in the number of logical reads, that's all. You
 need to demonstrate a real increase in throughput and/or reduction in
 response times.

 Note that even though you reduce the number of logical reads, with a
 thick index a logical read is *more* likely to be a physical read,
 because the index is larger and therefore consumes more cache.


Say, with a normal index, you need to go to the table for checking the
snapshot. So you would be loading both the index pages and the table pages in
order to satisfy certain operations, whereas with a thick index you occupy 16
bytes more per tuple in order to avoid going to the table. So memory
management is again better. But I can run the load test, if that's
required. Even when all the tuples are in memory, index-only scans are
almost 40-60% faster than index scans with thin indexes.

 As a first test, I'd like to see results from SELECTs on different sized
  tables. On tables that fit in cache, and on tables that don't. Tables
  large enough that the index doesn't fit in cache. And as a special
 case,
  on a table just the right size that a normal index fits in cache, but a
  thick one doesn't.
 
  I have not done a Load test. That's a good idea. Are you guys using
 Apache
  JMeter?

 You can use whatever you want, as long as you can get the relevant
 numbers out of it. contrib/pgbench is a good place to start.

 DBT-2 is another test people often use for patches like this. It's quite
 tedious to set up and operate, but it'll give you very nice graphs.

 Make sure you control vacuums, checkpoints etc., so that you get
 repeatable results.



Sure i will do that. Thanks for the advice.

 Also i think you might have noted that the thick indexes are not affected
 by
  updates, if the updated column is not in the index. I think that add on
 to
  one more advantage of thick indexes against DSM.

 That cannot possibly work. Imagine that you have a table

  ctid  | id | data
 -------+----+------
  (0,1) | 1  | foo
  (0,2) | 1  | bar

 where (0,2) is an updated version of (0,1). If you don't update the
 index, there will be no index pointer to (0,2), so a regular index scan,
 not an index-only scan, will not find the updated tuple.

 Or did you mean that the index is not updated on HOT updates? That's an
 interesting observation. We could do index-only scans with the DSM as
 well, even if there's HOT updates, if we define the bit in the bitmap to
 mean all tuples in this page are visible to everyone, or there's only
 HOT updates. That works, because an index-only-scan doesn't access any
 of the updated columns. It probably isn't worth it, though. Seems like a
 pretty narrow use case, and makes it more complicated.



I think I was not understood. An update transaction is not degraded by the thick
index. Update = delete + insert. If you don't update the columns in the index,
then we go to the same index page for both the delete and the insert; I have
done a small optimization there to cache the BTStack, so you do not need to do
any more I/O. So effectively, update performance with a thick index = update
performance with a thin index (if indexed columns are not updated).
Hope I am clear.

What do you think about not maintaining pins in the case of thick indexes?

Thanks,
Gokul,
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote:
 Say, with a normal index, you need to goto the table for checking the
 snapshot. So you would be loading both the index pages + table pages, in
 order to satisfy a certain operations. Whereas in thick index you occupy 16
 bytes per tuple more in order to avoid going to the table. So memory
 management is again better. But i can run the load test, if that's
 required.

Yes, performance testing is required for any performance-related patch.

Remember that you're competing against DSM. We're going to want some
kind of a DSM anyway because it allows skipping unmodified parts of the
heap in vacuum.

  Even when all the tuples are in memory, index only scans are
 almost 40-60% faster than the index scans with thin indexes.

Have you actually benchmarked that? What kind of query was that? I don't
believe for a second that fetching the heap tuple when the page is in
memory accounts for 40-60% of the overhead of regular index scans.

 What do you think about not maintaining pins in the case of thick indexes?

Seems irrelevant. Keeping a page pinned is cheap.

BTW, another issue you'll have to tackle, that a DSM-based patch will
have to solve as well, is how to return tuples from an index. In b-tree,
we scan pages page at a time, keeping a list of all tids that match the
scanquals in BTScanOpaque. If we need to return not only the tids of the
matching tuples, but the tuples as well, where do we store them? You
could make a palloc'd copy of them all, but that seems quite expensive.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Gokulakannan Somasundaram
On 10/23/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram wrote:
  Say, with a normal index, you need to goto the table for checking the
  snapshot. So you would be loading both the index pages + table pages, in
  order to satisfy a certain operations. Whereas in thick index you occupy
 16
  bytes per tuple more in order to avoid going to the table. So memory
  management is again better. But i can run the load test, if that's
  required.

 Yes, performance testing is required for any performance-related patch.

 Remember that you're competing against DSM. We're going to want some
 kind of a DSM anyway because it allows skipping unmodified parts of the
 heap in vacuum.

   Even when all the tuples are in memory, index only scans are
  almost 40-60% faster than the index scans with thin indexes.

 Have you actually benchmarked that? What kind of query was that? I don't
 believe for a second that fetching the heap tuple when the page is in
 memory accounts for 40-60% of the overhead of regular index scans.


The patch has been submitted. Try EXPLAIN ANALYZE; you can see it for
yourself. Try creating a table with a normal index and another table
with a thick index, and check queries which involve index-only scans (it
won't be displayed in the plan). If you create an index on (n1,n2) and insert
some 100,000 rows, try querying like select n2 from table where n1 > ... and
n1 < ... Play around with it to see the difference.
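
A sketch of that comparison (the thick-index syntax is the patch's proposal,
and the range bounds are placeholders):

    create table t1 (n1 integer, n2 integer);
    create table t2 (n1 integer, n2 integer);
    insert into t1 select i, i from generate_series(1, 100000) i;
    insert into t2 select i, i from generate_series(1, 100000) i;

    create index t1_thin_idx on t1 (n1, n2);          -- normal ("thin") index
    create thick index t2_thick_idx on t2 (n1, n2);   -- patched syntax

    -- Compare plans and timings; the thick-index case is expected to avoid
    -- the heap visits the thin-index case needs for visibility checks.
    explain analyze select n2 from t1 where n1 > 1000 and n1 < 2000;
    explain analyze select n2 from t2 where n1 > 1000 and n1 < 2000;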

 What do you thick about not maintaining pins in case of thick indexes?

 Seems irrelevant. Keeping a page pinned is cheap.


I am not referring to the process of pinning a page, but to the occupation of
8 KB of memory. You don't need to keep it occupied in the case of thick indexes
once the page has been read.


BTW, another issue you'll have to tackle, that a DSM-based patch will
 have to solve as well, is how to return tuples from an index. In b-tree,
 we scan pages page at a time, keeping a list of all tids that match the
 scanquals in BTScanOpaque. If we need to return not only the tids of the
 matching tuples, but the tuples as well, where do we store them? You
 could make a palloc'd copy of them all, but that seems quite expensive.


I have done a palloc for a MinimalIndexTuple (with just the datums). palloc is
costly, but it is not as costly as referring to a table; in other words, it
is not as costly as an I/O. Memory operates in microseconds, I/O operates
in milliseconds. I think the performance test results would answer these
concerns.

Are you convinced by the update performance? That's definitely not there
with DSM. :)

Thanks,
Gokul.
CertoSQL Project.
Allied Solution Groups.
(www.alliedgroups.com)


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Hannu Krosing
On Tue, 2007-10-23 at 13:04, Heikki Linnakangas wrote:
 Gokulakannan Somasundaram wrote:
  Say, with a normal index, you need to goto the table for checking the
  snapshot. So you would be loading both the index pages + table pages, in
  order to satisfy a certain operations. Whereas in thick index you occupy 16
  bytes per tuple more in order to avoid going to the table. So memory
  management is again better. But i can run the load test, if that's
  required.
 
 Yes, performance testing is required for any performance-related patch.
 
 Remember that you're competing against DSM. We're going to want some
 kind of a DSM anyway because it allows skipping unmodified parts of the
 heap in vacuum.

I would suggest that you use just an additional heap with decoupled
visibility fields as DSM.

For a large number of usage scenarios this will be highly compressible
and will mostly stay in processor caches .

You can start slow, and have the info duplicated in both main heap and 
visibility heap (aka DSM).

There are several advantages to keeping a separate visibility heap:

1) it is usually highly compressible; at least you can throw away
cmin/cmax quite soon, and usually also FREEZE and RLE-encode the rest.

2) faster access, more tightly packed data pages.

3) index-only scans

4) superfast VACUUM FREEZE 

5) makes VACUUM faster even for worst cases (interleaving live and dead
tuples)

6) any index scan will be faster due to fetching only visible rows from
main heap.

   Even when all the tuples are in memory, index only scans are
  almost 40-60% faster than the index scans with thin indexes.
 
 Have you actually benchmarked that? What kind of query was that? I don't
 believe for a second that fetching the heap tuple when the page is in
 memory accounts for 40-60% of the overhead of regular index scans.

It depends heavily on the type of memory (PostgreSQL buffer or OS disk cache)
it is in.

I remember doing Slony subscribes in the early days, and the speed
difference when loading a table with an active PK index was severalfold,
depending on the shared_buffers setting.

That was for a table, where both heap and index did fit in the 2G memory
which was available, the difference being only shuffling the pages
between postgresql buffer and linux system cache or not.

 BTW, another issue you'll have to tackle, that a DSM-based patch will
 have to solve as well, is how to return tuples from an index. In b-tree,
 we scan pages page at a time, keeping a list of all tids that match the
 scanquals in BTScanOpaque. If we need to return not only the tids of the
 matching tuples, but the tuples as well, where do we store them? You
 could make a palloc'd copy of them all, but that seems quite expensive.

Have you considered returning them as already visibility-checked pages
similar to what views or set-returning functions return ?

---
Hannu





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Gokulakannan Somasundaram
On 10/23/07, Hannu Krosing [EMAIL PROTECTED] wrote:

 On Tue, 2007-10-23 at 13:04, Heikki Linnakangas wrote:
  Gokulakannan Somasundaram wrote:
   Say, with a normal index, you need to goto the table for checking the
   snapshot. So you would be loading both the index pages + table pages,
 in
   order to satisfy a certain operations. Whereas in thick index you
 occupy 16
   bytes per tuple more in order to avoid going to the table. So memory
   management is again better. But i can run the load test, if that's
   required.
 
  Yes, performance testing is required for any performance-related patch.
 
  Remember that you're competing against DSM. We're going to want some
  kind of a DSM anyway because it allows skipping unmodified parts of the
  heap in vacuum.

 I would suggest that you use just an additional heap with decoupled
 visibility fields as DSM.

 For a large number of usage scenarios this will be highly compressible
 and will mostly stay in processor caches .

 You can start slow, and have the info duplicated in both main heap and
 visibility heap (aka DSM).

 There are several advantages to keeping a separate visibility heap:

 1) it is usually higly compressible, at least you can throw away
 cmin/cmax quite soon, usually also FREEZE and RLE encode the rest.

 2) faster access, more tightly packed data pages.

 3) index-only scans

 4) superfast VACUUM FREEZE

 5) makes VACUUM faster even for worst cases (interleaving live and dead
 tuples)

 6) any index scan will be faster due to fetching only visible rows from
 main heap.


If you have to store the visibility fields of all the tuples of each table,
then you may not be able to accommodate them in the cache. Say a table has 1
million rows; we would need about 22 MB of visibility space (visibility info
takes 16 bytes per tuple, plus say 6 bytes to link it to a tuple-id, which is
22 bytes per tuple; I think we may also need one more id to link it with
indexes, but I am not counting that now). If we have 10 such tables, then we
will have 220 MB. Keeping that pinned in memory may not be advisable in some
circumstances, and if it is not going to be in memory, then that is no different
from referring to a table. But I accept that it is a concept worth trying out. I
think the advantage of thick indexes comes from the fact that they are optional;
if we can make this optional as well, that would be better.
But if we are going to suggest it as a replacement for DSM, then it loses the
advantage of being small.

   Even when all the tuples are in memory, index only scans are
   almost 40-60% faster than the index scans with thin indexes.
 
  Have you actually benchmarked that? What kind of query was that? I don't
  believe for a second that fetching the heap tuple when the page is in
  memory accounts for 40-60% of the overhead of regular index scans.

 It depends heavily on the type of memory (postgresql page or disk cache)
 it is in.

 I remember doing Slony sobscribes in early days, and the speed
 difference on loading a table with active PK index was several times,
 depending on shared_buffers setting.

 That was for a table, where both heap and index did fit in the 2G memory
 which was available, the difference being only shuffling the pages
 between postgresql buffer and linux system cache or not.

  BTW, another issue you'll have to tackle, that a DSM-based patch will
  have to solve as well, is how to return tuples from an index. In b-tree,
  we scan pages page at a time, keeping a list of all tids that match the
  scanquals in BTScanOpaque. If we need to return not only the tids of the
  matching tuples, but the tuples as well, where do we store them? You
  could make a palloc'd copy of them all, but that seems quite expensive.

 Have you considered returning them as already visibility-checked pages
 similar to what views or set-returning functions return ?

 ---
 Hannu







-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Heikki Linnakangas
Hannu Krosing wrote:
 I would suggest that you use just an additional heap with decoupled
 visibility fields as DSM.

Yeah, I remember you've suggested that before, and I haven't responded
this far. The problems I see with that approach are:

1) How do you know which visibility info corresponds which heap tuple?
You'd need to have a pointer from the visibility info to the heap tuple,
and from the heap tuple to the visibility info. Which increases the
total (uncompressed) storage size.

2) If the visibility info / heap ordering isn't the same, seqscans need
to do random I/O.

3) If you need to do regular index scans, you're going to have to access
the index, the heap and the visibility info separately, and in that
order. That sounds expensive.

4) It's a big and complex change.

The significance of 2 and 3 depends a lot on how much of the visibility
information is in cache.

 For a large number of usage scenarios this will be highly compressible
 and will mostly stay in processor caches .

This seems to be where the potential gains are coming from in this
scheme. It boils down to how much compression you can do, and how
expensive it is to access the information in compressed form.

 1) it is usually higly compressible, at least you can throw away
 cmin/cmax quite soon, usually also FREEZE and RLE encode the rest.

If you RLE-compress the data, you'll need to figure out what to do when
you need to update a field and it doesn't compress as well anymore. You
might have to move things around between pages, so you'll have to update any
pointers to that information atomically.

 2) faster access, more tightly packed data pages.

But you do need to access the visibility information as well, at least
on tuples that match the query.

 5) makes VACUUM faster even for worst cases (interleaving live and dead
 tuples)

Does it? You still need to go to the heap pages to actually remove the
dead tuples. I suppose you could skip that and do it the first time you
access the page, like we do pruning with HOT.

 6) any index scan will be faster due to fetching only visible rows from
 main heap.

Assuming the visibility information is already in cache, and that
there's enough non-visible tuples for that to matter.

 BTW, another issue you'll have to tackle, that a DSM-based patch will
 have to solve as well, is how to return tuples from an index. In b-tree,
 we scan pages page at a time, keeping a list of all tids that match the
 scanquals in BTScanOpaque. If we need to return not only the tids of the
 matching tuples, but the tuples as well, where do we store them? You
 could make a palloc'd copy of them all, but that seems quite expensive.
 
 Have you considered returning them as already visibility-checked pages
 similar to what views or set-returning functions return ?

Sorry, I don't understand what you mean by that.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Hannu Krosing
On Tue, 2007-10-23 at 18:36, Gokulakannan Somasundaram wrote:

 
 There are several advantages to keeping a separate visibility
 heap:
 
 1) it is usually higly compressible, at least you can throw
 away 
 cmin/cmax quite soon, usually also FREEZE and RLE encode the
 rest.
 
 2) faster access, more tightly packed data pages.
 
 3) index-only scans
 
 4) superfast VACUUM FREEZE
 
 5) makes VACUUM faster even for worst cases (interleaving live
 and dead 
 tuples)
 
 6) any index scan will be faster due to fetching only visible
 rows from
 main heap.
 
 if you have to store the visibility fields of all the tuples of each
 table, then you may not be able to accomodate in the cache. Say if a
 table is  of 1 million rows, we would need 22 MB of  visibility
 space(since visibility info takes 16 bytes. I think if we have to link
 it with say tuple-id(6 Bytes). 

You can keep the visibility info small by first dropping cmin/cmax and
then FREEZE'ing the tuples (setting xmin to a special value); after that
you can replace a lot of visibility info tuples with a single RLE-encoded
tuple, which simply states that tuples N:A to M:B are visible.

If that 1-million-row table is mostly static, the static parts will soon
need (a lot) less than 1 bit per tuple in the visibility heap.

For example, after a vacuum there will be just one visibility info entry, which
says that the whole table is visible.

I envision HOT-like on-the-fly VACUUM FREEZE manipulations of the visibility
info so it won't grow very big at all.
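
A conceptual sketch of what one such RLE unit might carry, written as a plain
table definition purely for illustration (this is not part of any patch; the
xid columns are renamed because xmin/xmax are reserved system column names):

    create table dd_visibility (
        start_ctid  tid,   -- first heap tuple covered by this unit
        end_ctid    tid,   -- last heap tuple covered by this unit
        ins_xid     xid,   -- xmin of the whole range; frozen after VACUUM FREEZE
        del_xid     xid    -- xmax of the whole range; unset while the range is live
    );

One such row stands in for the per-tuple visibility of every tuple between
start_ctid and end_ctid.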

 I think we may need to link it with indexes with one more id. i am not
 counting that now). 

why ?

We will keep visibility info for ctids (PAGE:NR), and if we need to see
whether any ctid pointer from an index points to a visible tuple, we check it
based on that ctid.

  If we have  10 tables, then we will have 220 MB.  Keeping them pinned
 in memory may not be advisable in some circumstances. 

No no! No pinning; the "mostly in cache" will happen automatically (and
I mean mostly in the processor's _internal_ L1 or L2 cache, not just in RAM).

 If it is not going to be in memory, then that is no different from
 referring a table. But i accept that is a concept worth trying out. I
 think the advantage with thick indexes comes with the fact, that it is
 optional. If we can make this also as optional, that would be better. 
 But if we are going to suggest it as a replacement of DSM, then it
 loses the advantage of being small. 

I agree that a single-purpose DSM can be made smaller than multi-purpose
visibility heap.


--
Hannu



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Hannu Krosing
On Tue, 2007-10-23 at 14:16, Heikki Linnakangas wrote:
 Hannu Krosing wrote:
  I would suggest that you use just an additional heap with decoupled
  visibility fields as DSM.
 
 Yeah, I remember you've suggested that before, and I haven't responded
 this far. The problems I see with that approach are:

 1) How do you know which visibility info corresponds which heap tuple?
 You'd need to have a pointer from the visibility info to the heap tuple,
 and from the heap tuple to the visibility info. Which increases the
 total (uncompressed) storage size.

Only the first, and mostly not a pointer but just a fast lookup
function, IsVisible(ctid, snapshot).

 2) If the visibility info / heap ordering isn't the same, seqscans need
 to do random I/O.

Visibility should be a tree-like structure, with
wholly-(visible/invisible) inner pages removed, stored in seqscan order.

seqscans will get a few (or few hundred) pages worth of sequential info
and then fetch only visible tuples from heap, then repeat

 3) If you need to do regular index scans, you're going to have to access
 the index, the heap and the visibility info separately, and in that
 order. That sounds expensive.

I'd do it in order 1) index 2) visibility 3) heap

The main assumption is, that visibility info is highly compressed --
stays in processor cache most of the time -- virtually free to access

 4) It's a big and complex change.

We can go in smaller steps, first adding it as a DSM replacement and as
a complement to visibility info in heap and only when we are sure that
we have gotten it right (by checking that there is always valid info in
visibility heap) will we move other internal stuff to using it, one by
one.


 The significance of 2 and 3 depends a lot on how much of the visibility
 information is in cache.
 
  For a large number of usage scenarios this will be highly compressible
  and will mostly stay in processor caches .
 
 This seems to be where the potential gains are coming from in this
 scheme. It boils down to how much compression you can do, and how
 expensive it is to access the information in compressed form.

Hopefully much cheaper than in uncompressed form, the best case being
when the table is static and you do an in-L1-cache lookup on VizHeap
root page

  1) it is usually higly compressible, at least you can throw away
  cmin/cmax quite soon, usually also FREEZE and RLE encode the rest.
 
 If you RLE compress the data, you'll need to figure out what to do when
 you need update a field and it doesn't compress as well anymore. You
 might have to move things around pages, so you'll have to update any
 pointers to that information atomically.

I envision each RLE range having an exception list, which can be
appended to atomically with no locks. Only when the list is long enough
will there be locking and rearranging.

  2) faster access, more tightly packed data pages.
 
 But you do need to access the visibility information as well, at least
 on tuples that match the query.

Usually you even access visibility first, then data (if visible).

  5) makes VACUUM faster even for worst cases (interleaving live and dead
  tuples)
 
 Does it? You still need to go to the heap pages to actually remove the
 dead tuples. I suppose you could skip that and do it the first time you
 access the page, like we do pruning with HOT.

Possibly for the heap, but not for indexes, which have to be cleaned up
during vacuum. At least I can't see how we could do it later.

OTOH, we can have a very fast VACUUM FREEZE ONLY command, which will run
on just the visibility heap and FREEZE all visible-to-all tuples and
compress them. It should also mark all aborted or
deleted-and-invisible-to-all tuples with a separate value so they too compress
better.

  6) any index scan will be faster due to fetching only visible rows from
  main heap.
 
 Assuming the visibility information is already in cache, and that
 there's enough non-visible tuples for that to matter.

If there is not enough, then the visibility info is likely very highly
compressible and thus cheap to access.


Hannu



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Gokulakannan Somasundaram
On 10/23/07, Hannu Krosing [EMAIL PROTECTED] wrote:

 On Tue, 2007-10-23 at 18:36, Gokulakannan Somasundaram wrote:

 
  There are several advantages to keeping a separate visibility
  heap:
 
  1) it is usually higly compressible, at least you can throw
  away
  cmin/cmax quite soon, usually also FREEZE and RLE encode the
  rest.
 
  2) faster access, more tightly packed data pages.
 
  3) index-only scans
 
  4) superfast VACUUM FREEZE
 
  5) makes VACUUM faster even for worst cases (interleaving live
  and dead
  tuples)
 
  6) any index scan will be faster due to fetching only visible
  rows from
  main heap.
 
  if you have to store the visibility fields of all the tuples of each
  table, then you may not be able to accomodate in the cache. Say if a
  table is  of 1 million rows, we would need 22 MB of  visibility
  space(since visibility info takes 16 bytes. I think if we have to link
  it with say tuple-id(6 Bytes).

 You can keep the visibility info small, by first dropping cmin/cmax and
 then FREEZ'ing the tuples (setting xmin to special value), after that
 you can replace a lot of visibility info tuples with single RLE encoded
 tuple, which simply states, that tuples N:A to M:B are visible.


I think I am missing something here. Say we have a tuple as per your
definition. Initially it has cmin/cmax and then you drop them. Is that an
in-place update? How will you reclaim that space, if it is an in-place
update?
If we record that N:A to M:B are visible, and then suppose some tuple in
between is deleted, then we need to rewrite the info in a different format.
Until that whole update happens, a lot of transactions will be waiting to
acquire the lock on the same visibility info block. I feel that may again
lead to the same concurrency issues.

If that 1 million row table is mostly static, the static parts will soon
 have (al lot) less than 1 bit in visibility heap.

 For example, after vacuum there will be just one visibility info which
 say that whole table is visible.

 I envision HOT-like on-the-fly VACUUM FREEZE manipulations of visibility
 info so it won't grow very big at all.


If the tables are static, then DSM becomes the best solution; or maybe, if you
are storing one bit per table, then yours becomes the best solution.


 I think we may need to link it with indexes with one more id. i am not
  counting that now).

 why ?


If we are going to store something like "this range of tuples is visible", how
will we reach that particular info? Don't we need a pointer to reach that
memory block?

we will keep visibility info for ctids (PAGE:NR) and if we need to see,
 if any ctid pointe from index points to a visible tuple we check it
 based on that ctid.


Oh, then you occupy space proportional to the number of tuples. It would be
like a hash map, mapping ctids to the information. So if we have
information like "M:A to N:B are visible", should we be placing it
against each ctid?

  If we have  10 tables, then we will have 220 MB.  Keeping them pinned
  in memory may not be advisable in some circumstances.

 no no! no pinning, the mostly in cache will happen automatically (and
 I mean mostly in processors _internal_ L1 or L2 cache, not just in RAM)


L1 and L2 data caches, hmmm. I think the basic problem with visibility is that if
you make it too small, then updates get costly in terms of concurrency.

 If it is not going to be in memory, then that is no different from
  referring a table. But i accept that is a concept worth trying out. I
  think the advantage with thick indexes comes with the fact, that it is
  optional. If we can make this also as optional, that would be better.
  But if we are going to suggest it as a replacement of DSM, then it
  loses the advantage of being small.

 I agree that a single-purpose DSM can be made smaller than multi-purpose
 visibility heap.







-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)