Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2008-04-10 Thread Bruce Momjian

I have added URLs to your patch to the TODO list:

* Allow data to be pulled directly from indexes

---

Gokulakannan Somasundaram wrote:
> Hi,
> I would like to present the first patch. It currently has the following
> restrictions:
> a) It does not support functional indexes.
> b) It supports queries like "select count(1) from table where (restrictions
> on indexed columns)", but it does not support a bare "select count(1) from
> table".
> 
> The syntax to create this type of index is
> 
> create thick index idx on dd(n1,n2)
> 
> where idx is the index name, dd is the table name, and n1 and n2 are
> column names.
> 
> I have created an extra column in pg_index called indhassnapshot.
> 
> I have also enabled the display of logical reads. In order to see that, set
> log_statement_stats on.
> 
> The thick index is clearly ahead if you issue queries like
> 
> select n2 from dd where n1>1000 and n2<1500;
> 
> As already said, an update does not incur any extra cost unless the
> indexed columns are updated. Deletes are costly, which makes this ideal for
> partitioned tables (where old data is dropped rather than deleted).
> 
> In order to update the thick indexes, I have accessed the ps_ExprContext in
> PlanState to get the old tuple. But if we have an outer plan and an inner
> plan, then I set the ps_ExprContext of the inner plan on the outer plan. I
> don't know whether there are instances where the ps_ExprContext of the outer
> plan node has some other use in update queries.
> 
> Right now, it passes the regression test suite. I had slight trouble with
> pg_indent, so I think it has not been applied properly. But I have tried to
> remove all the whitespace differences. Please be kind to me in case I have
> missed any. :)
> 
> Please review the patch and provide your comments.
> 
> Thanks,
> Gokul.
> CertoSQL Project,
> Allied Solution Groups.
> (www.alliedgroups.com)
> 
> On 10/23/07, Hannu Krosing <[EMAIL PROTECTED]> wrote:
> >
> > On Sat, 2007-10-20 at 10:19, Luke Lonergan wrote:
> > > Hi Hannu,
> > >
> > > On 10/14/07 12:58 AM, "Hannu Krosing" <[EMAIL PROTECTED]> wrote:
> > >
> > > > What has happened in reality, is that the speed difference between CPU,
> > > > RAM and disk speeds has _increased_ tremendously
> > >
> > > Yes.
> > >
> > > > which makes it even
> > > > more important to _decrease_ the size of stored data if you want good
> > > > performance
> > >
> > > Or bring the cpu processing closer to the data it's using (or both).
> > >
> > > By default, the trend you mention first will continue in an unending way -
> > > the consequence is that the "distance" between a processor and its target
> > > data will continue to increase ad infinitum.
> >
> > the emergence of solid-state (flash) disks may help a little here, but
> > in general it is true.
> >
> > > By contrast, you can only decrease the data volume so much - so in the end
> > > you'll be left with the same problem - the data needs to be closer to the
> > > processing. This is the essence of parallel / shared nothing architecture.
> > >
> > > Note that we've done this at Greenplum. We're also implementing a DSM-like
> > > capability and are investigating a couple of different hybrid row / column
> > > store approaches.
> >
> > Have you tried moving the whole visibility part of tuples out to a
> > separate heap?
> >
> > Especially in OLAP/ETL scenarios the distribution of tuples loaded in
> > one transaction should be very good for visibility-info compression.
> >
> > I'd suspect that you could crush hundreds of pages' worth of visibility
> > into a single RLE encoding unit (xmin=N, xmax=not_yet, start_ctid=X,
> > end_ctid=Y), and it will stay in L1 cache most of the time you process
> > the corresponding relation. And the relation itself will be smaller, and
> > index-only (actually index-only + lookup inside L1 cache) access can
> > happen, and so on.
> >
> > OTOH, if you load it in millions of small transactions, you can run
> > VACUUM FREEZE _on_ the visibility heap only, which will make all
> > visibility info look similar and thus RLE-compressible and again make
> > it fit in L1 cache, if you don't have lots of failed loads interleaved
> > with successful ones.
> >
> > > Bitmap index with index-only access does provide nearly all of the
> > > advantages of a column store from a speed standpoint BTW.  Even though
> > > Vertica is touting speed advantages - our parallel engine plus bitmap index
> > > will crush them in benchmarks when they show up with real code.
> > >
> > > Meanwhile they're moving on to new ideas - I kid you not, "Horizontica" is
> > > Dr. Stonebraker's new idea :-)
> >
> > Sounds like a result of a marketroid brainstorming session :P
> >
> > > So - bottom line - some ideas from column store make sense, but it's not a
> > > cure-all.
> > >
> > > > There is also a MonetDB/X100 project, which tries to make MonetOD
>

Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2008-01-28 Thread Jonah H. Harris
On Jan 28, 2008 8:21 AM, Gokulakannan Somasundaram <[EMAIL PROTECTED]> wrote:
> I am not seeing my mail listed in the archives, so I am just
> resending it, in case the earlier one got missed.

It was sent.  Archive processing is delayed.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
499 Thornall Street, 2nd Floor  | [EMAIL PROTECTED]
Edison, NJ 08837| http://www.enterprisedb.com/



Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-28 Thread Gokulakannan Somasundaram
On 10/28/07, Hannu Krosing <[EMAIL PROTECTED]> wrote:
>
> On Fri, 2007-10-26 at 16:46, Gokulakannan Somasundaram wrote:
> > What do the numbers look like if the tables are small enough to
> > fit in RAM?
> >
> > I don't know whether this is a valid production setup to benchmark
> > against.
>
> Often a production setup will have at least most of the indexes in RAM, if
> not the whole of the data.


My test happened to be one where the base folder size was smaller than
RAM, so this case has been covered, although unintentionally.

Thanks,
Gokul.

--
> Hannu


-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-27 Thread Hannu Krosing
On Fri, 2007-10-26 at 16:46, Gokulakannan Somasundaram wrote:
> What do the numbers look like if the tables are small enough to
> fit in RAM?
> 
> I don't know whether this is a valid production setup to benchmark
> against.

Often a production setup will have at least most of the indexes in RAM, if
not the whole of the data.

--
Hannu






Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-26 Thread Gokulakannan Somasundaram
On 10/26/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
> Gokulakannan Somasundaram wrote:
> > On 10/26/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> >> Gokulakannan Somasundaram wrote:
> >>> As far as the load test is concerned, I have tried to provide all the
> >>> relevant details. Please inform me if I have left anything out.
> >> Thanks!
> >>
> >> How large were the tables?
> >
> > It is in the performance test report. They contain 2 million records,
> > 6 columns wide: 3 text and 3 numeric. The same set of tables was used
> > for both tests, after a refresh from a file.
>
> I meant in megabytes. How wide is the data in the text and numeric fields?



I measured the size of the PGDATA\base folder:
Size of Tables : 367 MB
Size of Tables + thin indexes :  616 MB
Size of Tables + thick indexes : 720 MB

The numeric values simply ran between 1 and 2 million in serial
order. I think I made a mistake here: this would have helped thin indexes
in range scans, because the data is clustered in the table and the bitmap heap
scan would have been more effective. So I expect thick indexes will be even
more effective if I uncluster the data, since the thin index then has to visit
more table buffers. The text columns are approximately 10 characters in length.



> >> Did you run all the queries concurrently? At this point, I think it'd be
> >> better to run them separately so that you can look at the impact on each
> >> kind of operation in isolation.
> >>
> > Performance tests are run against a workload, and I have taken the
> > workload of a small-scale partitioning setup. Running the queries
> > individually has already been done, and the counts of logical reads have
> > been verified; I have already suggested that. For some reason, I am not
> > able to convince anyone that for simple index scans, logical reads are a
> > good measure of performance.
>
> I wouldn't expect any performance gain for simple, not index-only,
> scans. They have to hit the heap anyway.


I just feel the above test didn't do much I/O, and yet the index-only scans
are faster with thick indexes. Since the size of RAM is 1GB and the size of
the data is only 616MB, I expect most of the pages were cached by the OS.
Maybe I am missing something here.


> >> What do the numbers look like if the tables are small enough to
> >> fit in RAM?
> >
> > I don't know whether this is a valid production setup to benchmark
> > against. But if you insist, I will do that and get back to you next
> > time.
>
> A lot of people run databases that fit in RAM. And a lot of people
> don't. Both cases are interesting. I'm particularly curious about that
> because you've argued that the number of logical reads is important,
> even if they don't become physical reads. Hannu also suggested that
> swapping pages in/out of shared_buffers is relatively expensive; if
> that's the case, we should see index-only scans performing much better
> than regular index scans, even when there's no physical I/O.


So the above test did fit into RAM. Do we still need a test with tables
that won't fit into RAM? I feel that if thick indexes were effective with
data that fits into RAM, they will definitely be more effective with
data that doesn't fit into RAM. There is one performance bug with updates,
where the caching strategy for the BTStack was not effective for varlena
structures; I will fix that bug next time. Also, calls to HOT-related code
can be avoided if it happens to be a thick index. I think these two
changes, if made, would further improve the performance of thick indexes.


> >> You should do some tuning; the PostgreSQL default configuration is not
> >> tuned for maximum performance. At least increase checkpoint_segments,
> >> checkpoint_timeout, and shared_buffers. Though I noticed that you're
> >> running on Windows; I don't think anyone's done any serious performance
> >> testing or tuning on Windows yet, so I'm not sure how you should tune
> >> that.
> >
> > What we are trying to do here is compare the performance of two
> > indexing structures. AFAIK, a performance test done to compare two
> > software implementations should not have parameter settings favorable
> > to one. I have not made any settings change favorable to the thick index.
>
> The tuning I suggested is just basic tuning any knowledgeable Postgres
> DBA will do. It's not particularly in favor of any indexing scheme. With
> the default checkpoint settings, for example, the system is going to be
> busy doing checkpoints all the time if you have a reasonable rate of
> updates.


The inserts and updates were at the rate of 10 every 2 seconds (as stated in
the performance report), and each update was affecting two rows. I haven't
got any warning to increase the checkpoint segments during the test.
But my doubt is: if checkpointing caused as much overhead as we think,
how can the performance of thick indexes exceed that of thin indexes in
index-only scans?
As you might have observed all the s

Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-26 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote:
> On 10/26/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>> Gokulakannan Somasundaram wrote:
>>> As far as the load test is concerned, I have tried to provide all the
>>> relevant details. Please inform me if I have left anything out.
>> Thanks!
>>
>> How large were the tables?
> 
> It is in the performance test report. They contain 2 million records,
> 6 columns wide: 3 text and 3 numeric. The same set of tables was used for
> both tests, after a refresh from a file.

I meant in megabytes. How wide is the data in the text and numeric fields?

> Did you run all the queries concurrently? At this point, I think it'd be
>> better to run them separately so that you can look at the impact on each
>> kind of operation in isolation.
>>
> Performance tests are run against a workload, and I have taken the workload
> of a small-scale partitioning setup. Running the queries individually has
> already been done, and the counts of logical reads have been verified; I have
> already suggested that. For some reason, I am not able to convince anyone
> that for simple index scans, logical reads are a good measure of performance.

I wouldn't expect any performance gain for simple, not index-only,
scans. They have to hit the heap anyway.

>> What do the numbers look like if the tables are small enough to
>> fit in RAM?
> 
> I don't know whether this is a valid production setup to benchmark against.
> But if you insist, I will do that and get back to you next
> time.

A lot of people run databases that fit in RAM. And a lot of people
don't. Both cases are interesting. I'm particularly curious about that
because you've argued that the number of logical reads is important,
even if they don't become physical reads. Hannu also suggested that
swapping pages in/out of shared_buffers is relatively expensive; if
that's the case, we should see index-only scans performing much better
than regular index scans, even when there's no physical I/O.

>> You should do some tuning; the PostgreSQL default configuration is not
>> tuned for maximum performance. At least increase checkpoint_segments,
>> checkpoint_timeout, and shared_buffers. Though I noticed that you're
>> running on Windows; I don't think anyone's done any serious performance
>> testing or tuning on Windows yet, so I'm not sure how you should tune
>> that.
> 
> What we are trying to do here is compare the performance of two
> indexing structures. AFAIK, a performance test done to compare two
> software implementations should not have parameter settings favorable to
> one. I have not made any settings change favorable to the thick index.

The tuning I suggested is just basic tuning any knowledgeable Postgres
DBA will do. It's not particularly in favor of any indexing scheme. With
the default checkpoint settings, for example, the system is going to be
busy doing checkpoints all the time if you have a reasonable rate of
updates.

> But I
> have a limited setup from which I am trying to contribute. So please don't
> ask me to run the tests against large-scale servers.

Understood.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-26 Thread Gokulakannan Somasundaram
On 10/26/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
> Gokulakannan Somasundaram wrote:
> > As far as the load test is concerned, I have tried to provide all the
> > relevant details. Please inform me if I have left anything out.
>
> Thanks!
>
> How large were the tables?


It is in the performance test report. They contain 2 million records,
6 columns wide: 3 text and 3 numeric. The same set of tables was used for
both tests, after a refresh from a file.

> Did you run all the queries concurrently? At this point, I think it'd be
> better to run them separately so that you can look at the impact on each
> kind of operation in isolation.
>
Performance tests are run against a workload, and I have taken the workload
of a small-scale partitioning setup. Running the queries individually has
already been done, and the counts of logical reads have been verified; I have
already suggested that. For some reason, I am not able to convince anyone
that for simple index scans, logical reads are a good measure of performance.


> What kind of an I/O system does the server have?


It's a normal desktop system. The disk model no. is ST3400633A, 7200 RPM.


> It'd be interesting to get the cache hit/miss ratios, as well as the
> output of iostat (or similar) during the test. How much of the benefit
> is due to reduced random I/O?

Good suggestion. I ran the test on Windows; let me try perfmon to monitor
the next performance test.


> What do the numbers look like if the tables are small enough to
> fit in RAM?


I don't know whether this is a valid production setup to benchmark against.
But if you insist, I will do that and get back to you next
time.


> You should do some tuning; the PostgreSQL default configuration is not
> tuned for maximum performance. At least increase checkpoint_segments,
> checkpoint_timeout, and shared_buffers. Though I noticed that you're
> running on Windows; I don't think anyone's done any serious performance
> testing or tuning on Windows yet, so I'm not sure how you should tune
> that.


What we are trying to do here is compare the performance of two
indexing structures. AFAIK, a performance test done to compare two
software implementations should not have parameter settings favorable to
one. I have not made any settings change favorable to the thick index. But I
have a limited setup from which I am trying to contribute. So please don't
ask me to run the tests against large-scale servers.

I think a better idea would be to draw up a performance-testing workload mix
(taking into account the QoS parameters used in a typical database, purging
frequency, and typical workload models used in the industry), with freedom in
the choice of hardware and software. That might solve some of the load-test
riddles.



-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-26 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote:
> As far as the load test is concerned, I have tried to provide all the
> relevant details. Please inform me if I have left anything out.

Thanks!

How large were the tables?

Did you run all the queries concurrently? At this point, I think it'd be
better to run them separately so that you can look at the impact on each
kind of operation in isolation.

What kind of an I/O system does the server have?

It'd be interesting to get the cache hit/miss ratios, as well as the
output of iostat (or similar) during the test. How much of the benefit
is due to reduced random I/O?

What do the numbers look like if the tables are small enough to
fit in RAM?

You should do some tuning; the PostgreSQL default configuration is not
tuned for maximum performance. At least increase checkpoint_segments,
checkpoint_timeout, and shared_buffers. Though I noticed that you're
running on Windows; I don't think anyone's done any serious performance
testing or tuning on Windows yet, so I'm not sure how you should tune that.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-24 Thread Bruce Momjian

This has been saved for consideration for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Gokulakannan Somasundaram wrote:
> Hi,
> I would like to present the first patch. It currently has the following
> restrictions:
> a) It does not support functional indexes.
> b) It supports queries like "select count(1) from table where (restrictions
> on indexed columns)", but it does not support a bare "select count(1) from
> table".
> 
> The syntax to create this type of index is
> 
> create thick index idx on dd(n1,n2)
> 
> where idx is the index name, dd is the table name, and n1 and n2 are
> column names.
> 
> I have created an extra column in pg_index called indhassnapshot.
> 
> I have also enabled the display of logical reads. In order to see that, set
> log_statement_stats on.
> 
> The thick index is clearly ahead if you issue queries like
> 
> select n2 from dd where n1>1000 and n2<1500;
> 
> As already said, an update does not incur any extra cost unless the
> indexed columns are updated. Deletes are costly, which makes this ideal for
> partitioned tables (where old data is dropped rather than deleted).
> 
> In order to update the thick indexes, I have accessed the ps_ExprContext in
> PlanState to get the old tuple. But if we have an outer plan and an inner
> plan, then I set the ps_ExprContext of the inner plan on the outer plan. I
> don't know whether there are instances where the ps_ExprContext of the outer
> plan node has some other use in update queries.
> 
> Right now, it passes the regression test suite. I had slight trouble with
> pg_indent, so I think it has not been applied properly. But I have tried to
> remove all the whitespace differences. Please be kind to me in case I have
> missed any. :)
> 
> Please review the patch and provide your comments.
> 
> Thanks,
> Gokul.
> CertoSQL Project,
> Allied Solution Groups.
> (www.alliedgroups.com)
> 
> On 10/23/07, Hannu Krosing <[EMAIL PROTECTED]> wrote:
> >
> > On Sat, 2007-10-20 at 10:19, Luke Lonergan wrote:
> > > Hi Hannu,
> > >
> > > On 10/14/07 12:58 AM, "Hannu Krosing" <[EMAIL PROTECTED]> wrote:
> > >
> > > > What has happened in reality, is that the speed difference between CPU,
> > > > RAM and disk speeds has _increased_ tremendously
> > >
> > > Yes.
> > >
> > > > which makes it even
> > > > more important to _decrease_ the size of stored data if you want good
> > > > performance
> > >
> > > Or bring the cpu processing closer to the data it's using (or both).
> > >
> > > By default, the trend you mention first will continue in an unending way -
> > > the consequence is that the "distance" between a processor and its target
> > > data will continue to increase ad infinitum.
> >
> > the emergence of solid-state (flash) disks may help a little here, but
> > in general it is true.
> >
> > > By contrast, you can only decrease the data volume so much - so in the end
> > > you'll be left with the same problem - the data needs to be closer to the
> > > processing. This is the essence of parallel / shared nothing architecture.
> > >
> > > Note that we've done this at Greenplum. We're also implementing a DSM-like
> > > capability and are investigating a couple of different hybrid row / column
> > > store approaches.
> >
> > Have you tried moving the whole visibility part of tuples out to a
> > separate heap?
> >
> > Especially in OLAP/ETL scenarios the distribution of tuples loaded in
> > one transaction should be very good for visibility-info compression.
> >
> > I'd suspect that you could crush hundreds of pages' worth of visibility
> > into a single RLE encoding unit (xmin=N, xmax=not_yet, start_ctid=X,
> > end_ctid=Y), and it will stay in L1 cache most of the time you process
> > the corresponding relation. And the relation itself will be smaller, and
> > index-only (actually index-only + lookup inside L1 cache) access can
> > happen, and so on.
> >
> > OTOH, if you load it in millions of small transactions, you can run
> > VACUUM FREEZE _on_ the visibility heap only, which will make all
> > visibility info look similar and thus RLE-compressible and again make
> > it fit in L1 cache, if you don't have lots of failed loads interleaved
> > with successful ones.
> >
> > > Bitmap index with index-only access does provide nearly all of the
> > > advantages of a column store from a speed standpoint BTW.  Even though
> > > Vertica is touting speed advantages - our parallel engine plus bitmap index
> > > will crush them in benchmarks when they show up with real code.
> > >
> > > Meanwhile they're moving on to new ideas - I kid you not, "Horizontica" is
> > > Dr. Stonebraker's new idea :-)
> >
> > Sounds like a result of a marketroid brainstorming session :P
> >
> > > So - bottom line - some ideas from column store make sense, but it's not a
> > > cure-all.
> > >
> > > > There is also a MonetDB/X100 project, which tries to 

Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Gokulakannan Somasundaram
On 10/23/07, Hannu Krosing <[EMAIL PROTECTED]> wrote:
>
> On Tue, 2007-10-23 at 18:36, Gokulakannan Somasundaram wrote:
>
> >
> > There are several advantages to keeping a separate visibility
> > heap:
> >
> > 1) it is usually highly compressible, at least you can throw
> > away
> > cmin/cmax quite soon, usually also FREEZE and RLE encode the
> > rest.
> >
> > 2) faster access, more tightly packed data pages.
> >
> > 3) index-only scans
> >
> > 4) superfast VACUUM FREEZE
> >
> > 5) makes VACUUM faster even for worst cases (interleaving live
> > and dead
> > tuples)
> >
> > 6) any index scan will be faster due to fetching only visible
> > rows from
> > main heap.
> >
> > if you have to store the visibility fields of all the tuples of each
> > table, then you may not be able to accommodate them in the cache. Say a
> > table is of 1 million rows; we would need 22 MB of visibility
> > space (since visibility info takes 16 bytes, plus, I think, a link
> > to the tuple-id at 6 bytes).
>
> You can keep the visibility info small by first dropping cmin/cmax and
> then FREEZE'ing the tuples (setting xmin to a special value); after that
> you can replace a lot of visibility info tuples with a single RLE-encoded
> tuple, which simply states that tuples N:A to M:B are visible.


I think I am missing something here. Say we have a tuple per your
definition: initially it has cmin/cmax, and then you drop them. Is that an
in-place update? How will you reclaim that space if it is an in-place
update?
If we record that N:A to M:B are visible, and then some tuple in between is
deleted, we need to write the info in a different format. Until that
whole update happens, a lot of transactions will be waiting to acquire the
lock on the same visibility info block. I feel that may again lead to the
same concurrency issues.

> If that 1 million row table is mostly static, the static parts will soon
> have (a lot) less than 1 bit per tuple in the visibility heap.
>
> For example, after vacuum there will be just one visibility info entry,
> which says that the whole table is visible.
>
> I envision HOT-like on-the-fly VACUUM FREEZE manipulations of visibility
> info so it won't grow very big at all.


If the tables are static, then DSM becomes the best solution; though maybe
if you are storing one bit per table, yours becomes the best solution.


> > I think we may need to link it with indexes with one more id. I am not
> > counting that now).
>
> Why?


If we are going to store something like "this range of tuples is visible",
how will we reach that particular info? Don't we need a pointer to reach that
memory block?

> We will keep visibility info for ctids (PAGE:NR), and if we need to see
> whether any ctid pointer from an index points to a visible tuple, we check
> it based on that ctid.


Oh, then you occupy space proportional to the number of tuples. It would be
like a hash map, mapping ctids to the information. So if we have
information like "M:A to N:B are visible", should we be placing it
against each ctid?

> > If we have 10 tables, then we will have 220 MB. Keeping them pinned
> > in memory may not be advisable in some circumstances.
>
> No, no! No pinning; the "mostly in cache" will happen automatically (and
> I mean mostly in the processor's _internal_ L1 or L2 cache, not just in RAM).


L1 and L2 data caches, hmmm. I think the basic problem with visibility is
that if you make it too small, then updates get costly in terms of
concurrency.

> > If it is not going to be in memory, then that is no different from
> > referring to a table. But I accept that it is a concept worth trying out.
> > I think the advantage of thick indexes comes from the fact that they are
> > optional. If we can make this optional as well, that would be better.
> > But if we are going to suggest it as a replacement for DSM, then it
> > loses the advantage of being small.
>
> I agree that a single-purpose DSM can be made smaller than multi-purpose
> visibility heap.


-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Hannu Krosing
On Tue, 2007-10-23 at 14:16, Heikki Linnakangas wrote:
> Hannu Krosing wrote:
> > I would suggest that you use just an additional heap with decoupled
> > visibility fields as DSM.
> 
> Yeah, I remember you've suggested that before, and I haven't responded
> so far. The problems I see with that approach are:
>
> 1) How do you know which visibility info corresponds to which heap tuple?
> You'd need to have a pointer from the visibility info to the heap tuple,
> and from the heap tuple to the visibility info. Which increases the
> total (uncompressed) storage size.

Only the first, and mostly not a pointer but just a fast lookup
function, IsVisible(ctid, snapshot).
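
For illustration, a minimal self-contained C sketch of what such a lookup
could look like (hypothetical: IsVisible is not an existing PostgreSQL API,
and the VisRun layout is only assumed from the RLE description in this
thread; the snapshot test is folded into a per-run boolean for brevity):

#include <stdbool.h>
#include <stdint.h>

typedef struct Ctid { uint32_t page; uint16_t off; } Ctid;

/* One RLE run: a contiguous ctid range with a single visibility verdict. */
typedef struct VisRun
{
    Ctid start;     /* first tuple covered by this run */
    Ctid end;       /* last tuple covered by this run */
    bool visible;   /* whole run visible to the snapshot? */
} VisRun;

static int
ctid_cmp(Ctid a, Ctid b)
{
    if (a.page != b.page)
        return a.page < b.page ? -1 : 1;
    return (a.off < b.off) ? -1 : (a.off > b.off);
}

/* Binary-search the (mostly cache-resident) sorted run array for the run
 * covering ctid; tuples not covered by any run are treated as invisible. */
static bool
IsVisible(const VisRun *runs, int nruns, Ctid ctid)
{
    int lo = 0, hi = nruns - 1;

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (ctid_cmp(ctid, runs[mid].start) < 0)
            hi = mid - 1;
        else if (ctid_cmp(ctid, runs[mid].end) > 0)
            lo = mid + 1;
        else
            return runs[mid].visible;
    }
    return false;
}

A real version would keep xmin/xmax per run and test them against the
snapshot instead of a precomputed boolean.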

> 2) If the visibility info / heap ordering isn't the same, seqscans need
> to do random I/O.

Visibility should be a tree-like structure, with
wholly-(visible/invisible) inner pages removed, stored in seqscan order.

Seqscans will get a few (or a few hundred) pages' worth of sequential info,
then fetch only the visible tuples from the heap, then repeat.

> 3) If you need to do regular index scans, you're going to have to access
> the index, the heap and the visibility info separately, and in that
> order. That sounds expensive.

I'd do it in order 1) index 2) visibility 3) heap

The main assumption is that visibility info is highly compressed -->
stays in processor cache most of the time --> virtually free to access.

> 4) It's a big and complex change.

We can go in smaller steps: first add it as a DSM replacement and as
a complement to the visibility info in the heap, and only when we are sure
that we have gotten it right (by checking that there is always valid info in
the visibility heap) will we move other internal stuff to using it, one by
one.


> The significance of 2 and 3 depends a lot on how much of the visibility
> information is in cache.
> 
> > For a large number of usage scenarios this will be highly compressible
> > and will mostly stay in processor caches.
> 
> This seems to be where the potential gains are coming from in this
> scheme. It boils down to how much compression you can do, and how
> expensive it is to access the information in compressed form.

Hopefully much cheaper than in uncompressed form, the best case being
when the table is static and you do an in-L1-cache lookup on the VizHeap
root page.

> > 1) it is usually highly compressible, at least you can throw away
> > cmin/cmax quite soon, usually also FREEZE and RLE encode the rest.
> 
> If you RLE compress the data, you'll need to figure out what to do when
> you need to update a field and it doesn't compress as well anymore. You
> might have to move things around pages, so you'll have to update any
> pointers to that information atomically.

I envision each RLE range to have an exception list, which can be
appended to atomically with no locks. Only when the list is long enough
will there be locking and rearranging.
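
As a rough illustration, a self-contained C11 sketch of that lock-free
append (an assumption about the mechanism, not actual PostgreSQL code; the
backend would use its own atomics and allocator rather than malloc):

#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* A per-tuple exception overriding an RLE run's blanket verdict. */
typedef struct VisException
{
    uint32_t page;              /* ctid of the tuple this overrides */
    uint16_t off;
    struct VisException *next;
} VisException;

typedef struct
{
    _Atomic(VisException *) head;
} ExceptionList;

static void
exception_push(ExceptionList *list, uint32_t page, uint16_t off)
{
    VisException *e = malloc(sizeof(VisException));

    e->page = page;
    e->off = off;
    e->next = atomic_load(&list->head);
    /* Classic lock-free push: on failure, e->next is refreshed to the
     * current head and we simply retry; no lock is ever taken. */
    while (!atomic_compare_exchange_weak(&list->head, &e->next, e))
        ;
}

Only the rare fold-back of a long exception list into its run would need the
locking and rearranging mentioned above.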

> > 2) faster access, more tightly packed data pages.
> 
> But you do need to access the visibility information as well, at least
> on tuples that match the query.

Usually you even access visibility first, then the data (if visible).

> > 5) makes VACUUM faster even for worst cases (interleaving live and dead
> > tuples)
> 
> Does it? You still need to go to the heap pages to actually remove the
> dead tuples. I suppose you could skip that and do it the first time you
> access the page, like we do pruning with HOT.

Possibly for the heap, but not for indexes, which have to be cleaned up
during vacuum. At least I can't see how we could do it later.

OTOH, we can have a very fast VACUUM FREEZE ONLY command, which will run
on just the visibility heap, FREEZE all visible-to-all tuples, and
compress them. It should also mark all aborted or
deleted-and-invisible-to-all tuples with a separate value so they too
compress better.

> > 6) any index scan will be faster due to fetching only visible rows from
> > main heap.
> 
> Assuming the visibility information is already in cache, and that
> there's enough non-visible tuples for that to matter.

If there are not enough, then the visibility info is likely very highly
compressible and thus cheap to access.


Hannu





Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Hannu Krosing
On Tue, 2007-10-23 at 18:36, Gokulakannan Somasundaram wrote:

> 
> There are several advantages to keeping a separate visibility
> heap:
> 
> 1) it is usually highly compressible, at least you can throw
> away 
> cmin/cmax quite soon, usually also FREEZE and RLE encode the
> rest.
> 
> 2) faster access, more tightly packed data pages.
> 
> 3) index-only scans
> 
> 4) superfast VACUUM FREEZE
> 
> 5) makes VACUUM faster even for worst cases (interleaving live
> and dead 
> tuples)
> 
> 6) any index scan will be faster due to fetching only visible
> rows from
> main heap.
> 
> if you have to store the visibility fields of all the tuples of each
> table, then you may not be able to accommodate them in the cache. Say a
> table is of 1 million rows; we would need 22 MB of visibility
> space (since visibility info takes 16 bytes, plus, I think, a link
> to the tuple-id at 6 bytes).

You can keep the visibility info small by first dropping cmin/cmax and
then FREEZE'ing the tuples (setting xmin to a special value); after that
you can replace a lot of visibility info tuples with a single RLE-encoded
tuple, which simply states that tuples N:A to M:B are visible.
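
A rough C sketch of such an RLE-encoded visibility tuple (a hypothetical
layout inferred from the description above; not actual PostgreSQL code):

#include <stdint.h>

/* One entry replaces per-tuple xmin/xmax for a whole contiguous heap range. */
typedef struct VisibilityRun
{
    uint32_t xmin;          /* inserting xid; a frozen sentinel after FREEZE */
    uint32_t xmax;          /* deleting xid; an invalid sentinel if none yet */
    uint32_t start_page;    /* start ctid N:A = (start_page, start_off) */
    uint16_t start_off;
    uint32_t end_page;      /* end ctid M:B = (end_page, end_off) */
    uint16_t end_off;
} VisibilityRun;

At 20-odd bytes per run covering an arbitrarily large range, this is what
makes the earlier "hundreds of pages in one L1-resident unit" argument
plausible.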

If that 1 million row table is mostly static, the static parts will soon
have (a lot) less than 1 bit per tuple in the visibility heap.

For example, after vacuum there will be just one visibility info entry,
which says that the whole table is visible.

I envision HOT-like on-the-fly VACUUM FREEZE manipulations of visibility
info so it won't grow very big at all.

> I think we may need to link it with indexes with one more id. I am not
> counting that now).

Why?

We will keep visibility info for ctids (PAGE:NR), and if we need to see
whether any ctid pointer from an index points to a visible tuple, we check it
based on that ctid.

>  If we have  10 tables, then we will have 220 MB.  Keeping them pinned
> in memory may not be advisable in some circumstances. 

No, no! No pinning; the "mostly in cache" will happen automatically (and
I mean mostly in the processor's _internal_ L1 or L2 cache, not just in RAM).

> If it is not going to be in memory, then that is no different from
> referring to a table. But I accept that it is a concept worth trying out. I
> think the advantage of thick indexes comes from the fact that they are
> optional. If we can make this optional as well, that would be better.
> But if we are going to suggest it as a replacement for DSM, then it
> loses the advantage of being small.

I agree that a single-purpose DSM can be made smaller than multi-purpose
visibility heap.


--
Hannu





Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Heikki Linnakangas
Hannu Krosing wrote:
> I would suggest that you use just an additional heap with decoupled
> visibility fields as DSM.

Yeah, I remember you've suggested that before, and I haven't responded
so far. The problems I see with that approach are:

1) How do you know which visibility info corresponds to which heap tuple?
You'd need to have a pointer from the visibility info to the heap tuple,
and from the heap tuple to the visibility info. Which increases the
total (uncompressed) storage size.

2) If the visibility info / heap ordering isn't the same, seqscans need
to do random I/O.

3) If you need to do regular index scans, you're going to have to access
the index, the heap and the visibility info separately, and in that
order. That sounds expensive.

4) It's a big and complex change.

The significance of 2 and 3 depends a lot on how much of the visibility
information is in cache.

> For a large number of usage scenarios this will be highly compressible
> and will mostly stay in processor caches.

This seems to be where the potential gains are coming from in this
scheme. It boils down to how much compression you can do, and how
expensive it is to access the information in compressed form.

> 1) it is usually highly compressible, at least you can throw away
> cmin/cmax quite soon, usually also FREEZE and RLE encode the rest.

If you RLE compress the data, you'll need to figure out what to do when
you need to update a field and it doesn't compress as well anymore. You
might have to move things around pages, so you'll have to update any
pointers to that information atomically.

> 2) faster access, more tightly packed data pages.

But you do need to access the visibility information as well, at least
on tuples that match the query.

> 5) makes VACUUM faster even for worst cases (interleaving live and dead
> tuples)

Does it? You still need to go to the heap pages to actually remove the
dead tuples. I suppose you could skip that and do it the first time you
access the page, like we do pruning with HOT.

> 6) any index scan will be faster due to fetching only visible rows from
> main heap.

Assuming the visibility information is already in cache, and that
there's enough non-visible tuples for that to matter.

>> BTW, another issue you'll have to tackle, that a DSM-based patch will
>> have to solve as well, is how to return tuples from an index. In b-tree,
>> we scan pages page at a time, keeping a list of all tids that match the
>> scanquals in BTScanOpaque. If we need to return not only the tids of the
>> matching tuples, but the tuples as well, where do we store them? You
>> could make a palloc'd copy of them all, but that seems quite expensive.
> 
> Have you considered returning them as "already visibility-checked pages"
> similar to what views or set-returning functions return?

Sorry, I don't understand what you mean by that.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Gokulakannan Somasundaram
On 10/23/07, Hannu Krosing <[EMAIL PROTECTED]> wrote:
>
> On Tue, 2007-10-23 at 13:04, Heikki Linnakangas wrote:
> > Gokulakannan Somasundaram wrote:
> > > Say, with a normal index, you need to go to the table for checking the
> > > snapshot. So you would be loading both the index pages + table pages
> > > in order to satisfy certain operations. Whereas with a thick index you
> > > occupy 16 bytes more per tuple in order to avoid going to the table. So
> > > memory management is again better. But I can run the load test, if
> > > that's required.
> >
> > Yes, performance testing is required for any performance-related patch.
> >
> > Remember that you're competing against DSM. We're going to want some
> > kind of a DSM anyway because it allows skipping unmodified parts of the
> > heap in vacuum.
>
> I would suggest that you use just an additional heap with decoupled
> visibility fields as DSM.
>
> For a large number of usage scenarios this will be highly compressible
> and will mostly stay in processor caches.
>
> You can start slow, and have the info duplicated in both main heap and
> visibility heap (aka DSM).
>
> There are several advantages to keeping a separate visibility heap:
>
> 1) it is usually highly compressible, at least you can throw away
> cmin/cmax quite soon, usually also FREEZE and RLE encode the rest.
>
> 2) faster access, more tightly packed data pages.
>
> 3) index-only scans
>
> 4) superfast VACUUM FREEZE
>
> 5) makes VACUUM faster even for worst cases (interleaving live and dead
> tuples)
>
> 6) any index scan will be faster due to fetching only visible rows from
> main heap.


If you have to store the visibility fields of all the tuples of each table,
then you may not be able to accommodate them in the cache. Say a table is of
1 million rows; we would need 22 MB of visibility space (visibility info takes
16 bytes per tuple, plus a link to the tuple-id at 6 bytes, giving 22 bytes
per row; I think we may also need to link it with indexes with one more id,
which I am not counting now). If we have 10 such tables, then we will have
220 MB. Keeping them pinned in memory may not be advisable in some
circumstances. If it is not going to be in memory, then that is no different
from referring to a table. But I accept that it is a concept worth trying out.
I think the advantage of thick indexes comes from the fact that they are
optional. If we can make this optional as well, that would be better.
But if we are going to suggest it as a replacement for DSM, then it loses the
advantage of being small.

> > > Even when all the tuples are in memory, index-only scans are
> > > almost 40-60% faster than the index scans with thin indexes.
> >
> > Have you actually benchmarked that? What kind of query was that? I don't
> > believe for a second that fetching the heap tuple when the page is in
> > memory accounts for 40-60% of the overhead of regular index scans.
>
> It depends heavily on the type of memory (PostgreSQL page or disk cache)
> it is in.
>
> I remember doing Slony subscribes in the early days, and the speed
> difference on loading a table with an active PK index was severalfold,
> depending on shared_buffers setting.
>
> That was for a table, where both heap and index did fit in the 2G memory
> which was available, the difference being only shuffling the pages
> between postgresql buffer and linux system cache or not.
>
> > BTW, another issue you'll have to tackle, that a DSM-based patch will
> > have to solve as well, is how to return tuples from an index. In b-tree,
> > we scan pages page at a time, keeping a list of all tids that match the
> > scanquals in BTScanOpaque. If we need to return not only the tids of the
> > matching tuples, but the tuples as well, where do we store them? You
> > could make a palloc'd copy of them all, but that seems quite expensive.
>
> Have you considered returning them as "already visibility-checked pages"
> similar to what views or set-returning functions return?
>
> ---
> Hannu


-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Hannu Krosing
On Tue, 2007-10-23 at 13:04, Heikki Linnakangas wrote:
> Gokulakannan Somasundaram wrote:
> > Say, with a normal index, you need to go to the table for checking the
> > snapshot. So you would be loading both the index pages + table pages in
> > order to satisfy certain operations. Whereas with a thick index you occupy
> > 16 bytes more per tuple in order to avoid going to the table. So memory
> > management is again better. But I can run the load test, if that's
> > required.
> 
> Yes, performance testing is required for any performance-related patch.
> 
> Remember that you're competing against DSM. We're going to want some
> kind of a DSM anyway because it allows skipping unmodified parts of the
> heap in vacuum.

I would suggest that you use just an additional heap with decoupled
visibility fields as DSM.

For a large number of usage scenarios this will be highly compressible
and will mostly stay in processor caches.

You can start slow, and have the info duplicated in both main heap and 
visibility heap (aka DSM).

There are several advantages to keeping a separate visibility heap:

1) it is usually highly compressible, at least you can throw away
cmin/cmax quite soon, usually also FREEZE and RLE encode the rest.

2) faster access, more tightly packed data pages.

3) index-only scans

4) superfast VACUUM FREEZE 

5) makes VACUUM faster even for worst cases (interleaving live and dead
tuples)

6) any index scan will be faster due to fetching only visible rows from
main heap.

> > Even when all the tuples are in memory, index-only scans are
> > almost 40-60% faster than the index scans with thin indexes.
> 
> Have you actually benchmarked that? What kind of query was that? I don't
> believe for a second that fetching the heap tuple when the page is in
> memory accounts for 40-60% of the overhead of regular index scans.

It depends heavily on the type of memory (PostgreSQL page or disk cache)
it is in.

I remember doing Slony subscribes in the early days, and the speed
difference on loading a table with an active PK index was severalfold,
depending on the shared_buffers setting.

That was for a table, where both heap and index did fit in the 2G memory
which was available, the difference being only shuffling the pages
between postgresql buffer and linux system cache or not.

> BTW, another issue you'll have to tackle, that a DSM-based patch will
> have to solve as well, is how to return tuples from an index. In b-tree,
> we scan pages page at a time, keeping a list of all tids that match the
> scanquals in BTScanOpaque. If we need to return not only the tids of the
> matching tuples, but the tuples as well, where do we store them? You
> could make a palloc'd copy of them all, but that seems quite expensive.

Have you considered returning them as "already visibility-checked pages"
similar to what views or set-returning functions return?

---
Hannu







Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Gokulakannan Somasundaram
On 10/23/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
> Gokulakannan Somasundaram wrote:
> > Say, with a normal index, you need to go to the table for checking the
> > snapshot. So you would be loading both the index pages + table pages in
> > order to satisfy certain operations. Whereas with a thick index you
> > occupy 16 bytes more per tuple in order to avoid going to the table. So
> > memory management is again better. But I can run the load test, if that's
> > required.
>
> Yes, performance testing is required for any performance-related patch.
>
> Remember that you're competing against DSM. We're going to want some
> kind of a DSM anyway because it allows skipping unmodified parts of the
> heap in vacuum.
>
> > Even when all the tuples are in memory, index-only scans are
> > almost 40-60% faster than the index scans with thin indexes.
>
> Have you actually benchmarked that? What kind of query was that? I don't
> believe for a second that fetching the heap tuple when the page is in
> memory accounts for 40-60% of the overhead of regular index scans.


The patch has been submitted. Try EXPLAIN ANALYZE; you can see it for
yourself. Try creating a table with a normal index, then another table
with a thick index. Check queries which involve index-only scans (it
won't get displayed in the plan). If you create an index on (n1,n2), insert
some 100,000 rows, and try querying like "select n2 from table where n1 > ...
and n1 < ...". Play around with it to see the difference.

> What do you think about not maintaining pins in the case of thick indexes?
>
> Seems irrelevant. Keeping a page pinned is cheap.


I am not referring to the process of pinning a page, but to the occupation of
8KB of memory. You don't need to keep occupying it in the case of thick
indexes, once the page has been read.


BTW, another issue you'll have to tackle, that a DSM-based patch will
> have to solve as well, is how to return tuples from an index. In b-tree,
> we scan pages page at a time, keeping a list of all tids that match the
> scanquals in BTScanOpaque. If we need to return not only the tids of the
> matching tuples, but the tuples as well, where do we store them? You
> could make a palloc'd copy of them all, but that seems quite expensive.


I have done a palloc for a MinimalIndexTuple (with just the datums). palloc
is costly, but it is not as costly as referring to a table; in other words,
it is not as costly as an I/O. Memory operates in microseconds, I/O operates
in milliseconds. I think the performance test results would answer these
concerns.
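
For illustration, a hypothetical sketch of such a structure (the actual
patch may differ; in the backend the allocation would be palloc rather than
malloc, and pass-by-reference datums would also need their referenced bytes
copied, not just the Datum word):

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t Datum;        /* stand-in for PostgreSQL's Datum */

/* Only the datums are copied out of the index page, so the executor can
 * keep using them after the page is unpinned. */
typedef struct MinimalIndexTuple
{
    int   natts;
    Datum values[];             /* one datum per index column */
} MinimalIndexTuple;

static MinimalIndexTuple *
copy_index_datums(const Datum *src, int natts)
{
    MinimalIndexTuple *mit =
        malloc(offsetof(MinimalIndexTuple, values) + natts * sizeof(Datum));

    mit->natts = natts;
    memcpy(mit->values, src, natts * sizeof(Datum));
    return mit;
}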

Are you convinced by the update performance? That's definitely not there
with DSM. :)

Thanks,
Gokul.
CertoSQL Project.
Allied Solution Groups.
(www.alliedgroups.com)


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote:
> Say, with a normal index, you need to go to the table for checking the
> snapshot. So you would be loading both the index pages + table pages in
> order to satisfy certain operations. Whereas with a thick index you occupy
> 16 bytes more per tuple in order to avoid going to the table. So memory
> management is again better. But I can run the load test, if that's
> required.

Yes, performance testing is required for any performance-related patch.

Remember that you're competing against DSM. We're going to want some
kind of a DSM anyway because it allows skipping unmodified parts of the
heap in vacuum.

> Even when all the tuples are in memory, index-only scans are
> almost 40-60% faster than the index scans with thin indexes.

Have you actually benchmarked that? What kind of query was that? I don't
believe for a second that fetching the heap tuple when the page is in
memory accounts for 40-60% of the overhead of regular index scans.

> What do you think about not maintaining pins in the case of thick indexes?

Seems irrelevant. Keeping a page pinned is cheap.

BTW, another issue you'll have to tackle, that a DSM-based patch will
have to solve as well, is how to return tuples from an index. In b-tree,
we scan pages page at a time, keeping a list of all tids that match the
scanquals in BTScanOpaque. If we need to return not only the tids of the
matching tuples, but the tuples as well, where do we store them? You
could make a palloc'd copy of them all, but that seems quite expensive.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Gokulakannan Somasundaram
On 10/23/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
> Please keep the list cc'd.
>
> Gokulakannan Somasundaram wrote:
> > On 10/23/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> >> Gokulakannan Somasundaram wrote:
> >>> I have also enabled the display of logical reads. In order to see that,
> >>> set log_statement_stats on.
> >> You should start benchmarking, to verify that you're really getting the
> >> kind of speed-up you're looking for, before you spend any more effort on
> >> that. Reduction in logical reads alone isn't enough. Remember that for a
> >> big change like that, the gain has to be big as well.
> >
> > I have done the benchmark. I have done the benchmark with logical reads,
> > as they turn out to be potential physical reads. Try turning on
> > log_statement_stats in postgresql.conf and firing some queries which can
> > be satisfied by the index. You would see the difference.
>
> I would see a decrease in the number of logical reads, that's all. You
> need to demonstrate a real increase in throughput and/or reduction in
> response times.
>
> Note that even though you reduce the number of logical reads, with a
> thick index a logical read is *more* likely to be a physical read,
> because the index is larger and therefore consumes more cache.


Say, with a normal index, you need to go to the table for checking the
snapshot. So you would be loading both the index pages + table pages in
order to satisfy certain operations. Whereas with a thick index you occupy 16
bytes more per tuple in order to avoid going to the table. So memory
management is again better. But I can run the load test, if that's
required. Even when all the tuples are in memory, index-only scans are
almost 40-60% faster than index scans with thin indexes.

> >> As a first test, I'd like to see results from SELECTs on different-sized
> >> tables. On tables that fit in cache, and on tables that don't. Tables
> >> large enough that the index doesn't fit in cache. And as a special case,
> >> on a table just the right size that a normal index fits in cache, but a
> >> thick one doesn't.
> >
> > I have not done a load test. That's a good idea. Are you guys using
> > Apache JMeter?
>
> You can use whatever you want, as long as you can get the relevant
> numbers out of it. contrib/pgbench is a good place to start.
>
> DBT-2 is another test people often use for patches like this. It's quite
> tedious to set up and operate, but it'll give you very nice graphs.
>
> Make sure you control vacuums, checkpoints etc., so that you get
> repeatable results.



Sure, I will do that. Thanks for the advice.

> > Also I think you might have noted that the thick indexes are not affected
> > by updates if the updated column is not in the index. I think that adds
> > one more advantage of thick indexes over DSM.
>
> That cannot possibly work. Imagine that you have a table
>
>  ctid  | id | data
> -------+----+------
>  (0,1) | 1  | foo
>  (0,2) | 1  | bar
>
> where (0,2) is an updated version of (0,1). If you don't update the
> index, there will be no index pointer to (0,2), so a regular index scan,
> not an index-only scan, will not find the updated tuple.
>
> Or did you mean that the index is not updated on HOT updates? That's an
> interesting observation. We could do index-only scans with the DSM as
> well, even if there's HOT updates, if we define the bit in the bitmap to
> mean "all tuples in this page are visible to everyone, or there's only
> HOT updates". That works, because an index-only-scan doesn't access any
> of the updated columns. It probably isn't worth it, though. Seems like a
> pretty narrow use case, and makes it more complicated.



I think I was not understood. An update transaction is not degraded by a
thick index. Update = delete + insert. If you don't update the columns in the
index, then we go to the same index page for both the delete and the insert.
I have done a small optimization there to cache the BTStack, so you do not
need to do any more I/O. So effectively, update performance with a thick
index = update performance with a thin index (if the indexed columns are not
updated). I hope I am clear.
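
A sketch of the shape of that optimization (hypothetical; the actual patch
may differ, and a real descent goes through _bt_search(), whose signature
varies across versions):

/* requires the backend headers, e.g. "postgres.h" and "access/nbtree.h" */
typedef struct CachedDescent
{
    BTStack stack;      /* parent pointers saved by the first descent */
    Buffer  leaf;       /* leaf page holding the old index entry */
} CachedDescent;

/*
 * Update flow when the indexed columns are unchanged:
 *   1. descend the tree once for the "delete" half; save stack + leaf
 *   2. mark the old index tuple on that leaf
 *   3. insert the new entry on the same leaf, reusing the cached stack
 *      for any split - no second descent, hence no extra I/O
 */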

What do you think about not maintaining pins in the case of thick indexes?

Thanks,
Gokul,
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Heikki Linnakangas
Please keep the list cc'd.

Gokulakannan Somasundaram wrote:
> On 10/23/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>> Gokulakannan Somasundaram wrote:
>>> I have also enabled the display of logical reads. In order to see that,
>>> set log_statement_stats on.
>> You should start benchmarking, to verify that you're really getting the
>> kind of speed up you're looking for, before you spend any more effort on
>> that. Reduction in logical reads alone isn't enough. Remember that for a
>> big change like that, the gain has to be big as well.
> 
> I have done the benchmark. I have done the benchmark with logical reads, as
> they turn out to be potential physical reads. Try turning on
> log_statement_stats in postgresql.conf and firing some queries which can
> be satisfied by the index. You would see the difference.

I would see a decrease in the number of logical reads, that's all. You
need to demonstrate a real increase in throughput and/or reduction in
response times.

Note that even though you reduce the number of logical reads, with a
thick index a logical read is *more* likely to be a physical read,
because the index is larger and therefore consumes more cache.

>> As a first test, I'd like to see results from SELECTs on different-sized
>> tables. On tables that fit in cache, and on tables that don't. Tables
>> large enough that the index doesn't fit in cache. And as a special case,
>> on a table just the right size that a normal index fits in cache, but a
>> thick one doesn't.
> 
> I have not done a load test. That's a good idea. Are you guys using Apache
> JMeter?

You can use whatever you want, as long as you can get the relevant
numbers out of it. contrib/pgbench is a good place to start.

DBT-2 is another test people often use for patches like this. It's quite
tedious to set up and operate, but it'll give you very nice graphs.

Make sure you control vacuums, checkpoints etc., so that you get
repeatable results.

> Also I think you might have noted that the thick indexes are not affected by
> updates if the updated column is not in the index. I think that adds
> one more advantage of thick indexes over DSM.

That cannot possibly work. Imagine that you have a table

 ctid  | id | data
-------+----+------
 (0,1) | 1  | foo
 (0,2) | 1  | bar

where (0,2) is an updated version of (0,1). If you don't update the
index, there will be no index pointer to (0,2), so a regular index scan,
not an index-only scan, will not find the updated tuple.

Or did you mean that the index is not updated on HOT updates? That's an
interesting observation. We could do index-only scans with the DSM as
well, even if there's HOT updates, if we define the bit in the bitmap to
mean "all tuples in this page are visible to everyone, or there's only
HOT updates". That works, because an index-only-scan doesn't access any
of the updated columns. It probably isn't worth it, though. Seems like a
pretty narrow use case, and makes it more complicated.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Gokulakannan Somasundaram
On 10/23/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
> Gokulakannan Somasundaram wrote:
> > I would like to present the first patch. It currently has the following
> > restrictions:
> > a) It does not support functional indexes.
> > b) It supports queries like "select count(1) from table where (restrictions
> > on indexed columns)", but it does not support a bare "select count(1) from
> > table".
>
> An interesting question is how to represent tuples coming from the index
> in the executor. I see that you didn't address that at all, because you
> only support "COUNT(1)", and not things like "SELECT column FROM table
> WHERE id = ?" where you actually return datums from the index. But
> that's something that we have to think about in the DSM approach as well.

That's addressed as well.

> One solution is to form a heap tuple, using the datums from the index,
> with the attributes that are not used in the query replaced with NULLs.
> That seems simple, but I don't think it'll work with expression indexes,
> when you do something like "SELECT length(column) FROM table WHERE id =
> ?", and there's an index on (id, length(column)).
>
> I have also enabled the display of logical reads. In order to see that,
> set log_statement_stats on.
>
> You should start benchmarking, to verify that you're really getting the
> kind of speed-up you're looking for, before you spend any more effort on
> that. Reduction in logical reads alone isn't enough. Remember that for a
>
> As a first test, I'd like to see results from SELECTs on different sized
> tables. On tables that fit in cache, and on tables that don't. Tables
> large enough that the index doesn't fit in cache. And as a special case,
> on a table just the right size that a normal index fits in cache, but a
> thick one doesn't.
>
> --
>   Heikki Linnakangas
>   EnterpriseDB   http://www.enterprisedb.com
>


Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

2007-10-23 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote:
> I would like to present the first patch. It currently has the following
> restrictions:
> a) It does not support functional indexes.
> b) It supports queries like "select count(1) from table where (restrictions
> on indexed columns)", but it does not support a bare "select count(1) from
> table".

An interesting question is how to represent tuples coming from the index
in the executor. I see that you didn't address that at all, because you
only support "COUNT(1)", and not things like "SELECT column FROM table
WHERE id = ?" where you actually return datums from the index. But
that's something that we have to think about in the DSM approach as well.

One solution is to form a heap tuple, using the datums from the index,
with the attributes that are not used in the query replaced with NULLs.
That seems simple, but I don't think it'll work with expression indexes,
when you do something like "SELECT length(column) FROM table WHERE id =
?", and there's an index on (id, length(column)).

> I have also enabled the display of logical reads. In order to see that, set
> log_statement_stats on.

You should start benchmarking, to verify that you're really getting the
kind of speed up you're looking for, before you spend any more effort on
that. Reduction in logical reads alone isn't enough. Remember that for a
big change like that, the gain has to be big as well.

As a first test, I'd like to see results from SELECTs on different sized
tables. On tables that fit in cache, and on tables that don't. Tables
large enough that the index doesn't fit in cache. And as a special case,
on a table just the right size that a normal index fits in cache, but a
thick one doesn't.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
