Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

Gokulakannan Somasundaram Fri, 26 Oct 2007 05:48:55 -0700

On 10/26/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
> Gokulakannan Somasundaram wrote:
> > On 10/26/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> >> Gokulakannan Somasundaram wrote:
> >>> As far as Load Test is concerned, i have tried to provide all the
> >> relevant
> >>> details. Please inform me, if i have left any.
> >> Thanks!
> >>
> >> How large were the tables?
> >
> > It is in the Performance test report.  They contain 2 million
> records.  6
> > columns wide, 3 text and 3 numeric. same set of tables used for both
> tests,
> > after refresh from a file
>
> I meant in megabytes. How wide is the data in the text and numeric fields?




I have observed the size of PGDATA\base folder for size details
Size of Tables : 367 MB
Size of Tables + thin indexes :  616 MB
Size of Tables + thick indexes : 720 MB

The numbers were simply  running between 1 and 2 million in a serial
fashion. I think i made a mistake here. this would have helped thin indexes
in range scans, since the data is clustered at the table, the bitmap heap
scan would have been more effective. So i hope thick indexes will be more
effective, if  uncluster the data, since the thin index has to goto more
table buffers. The test columns are approx 10 characters in length.



> Did you run all the queries concurrently? At this point, I think it'd be
> >> better to run them separately so that you can look at the impact on
> each
> >> kind of operation in isolation.
> >>
> > Performance tests are run against a workload and i have taken the
> workload
> > of a small scale partitioning setup. Running the queries individually
> has
> > already been done and the count of logical reads have been verified. I
> have
> > already suggested that. For some reason, i am not able to convince that
> for
> > simple index scans, Logical reads are a good measure of performance.
>
> I wouldn't expect any performance gain for simple, not index-only,
> scans. They have to hit the heap anyway.


I just feel the above test didn't do much I/Os and yet the index only scans
are faster with thick indexes. since the size of RAM is 1GB and the size of
the data is only 616MB, i hope most of them might have been OS cached. May
be i am missing something here.


> What does the numbers look like if the the tables are small enough to
> >> fit in RAM?
> >
> > I don't know whether this is a valid production setup, against which we
> need
> > to benchmark. But if you insist, i will do that and get back to you next
> > time.
>
> A lot of people run databases that fit in RAM. And a lot of people
> don't. Both cases are interesting. I'm particularly curious about that
> because you've argued that the number of logical reads is important,
> even if they don't become physical reads. Hannu also suggested that
> swapping pages in/out of shared_buffers is relatively expensive; if
> that's the case, we should see index-only scans performing much better
> regular index scans, even when there's no physical I/O.


So the above test has fit into the RAM. Now do we need a test with tables
that won't fit into RAM. i feel if the thick indexes were effective with
data that would fit into RAM, then it will definitely be more effective with
data that wouldn't fit into RAM. There is one performance bug, with updates
where the caching strategy for BTStack didn't go effective for the Varlena
structures. i will fix that bug next time. Also calls to HOT related stuff
can be avoided, if it happens to be a thick index, I think these two
changes, if made would further improve the performance of thick indexes.


> You should do some tuning, the PostgreSQL default configuration is not
> >> tuned for maximum performance. At least increase checkpoint_segments
> and
> >> checkpoint_timeout and shared_buffers. Though I noticed that you're
> >> running on Windows; I don't think anyone's done any serious performance
> >> testing or tuning on Windows yet, so I'm not sure how you should tune
> >> that.
> >
> > What we are trying to do here, is to try and compare the performance of
> two
> > indexing structures. AFAIK, the performance test done to compare two
> > software implementations should not have parameter settings, favorable
> to
> > one. I have not done any settings change favorable to thick index.
>
> The tuning I suggested is just basic tuning any knowledgeable Postgres
> DBA will do. It's not particularly in favor of any indexing scheme. With
> the default checkpoint settings, for example, the system is going to be
> busy doing checkpoints all the time if you have a reasonable rate of
> updates.


The inserts and updates were at the rate of 10 every 2 seconds (there in the
performance report) and the update was affecting two rows. I i haven't got
any warning to increase the checkpoint during the test.
But my doubt is  if checkpoint has caused so much of overhead, as we think
of, how can the performance of thick indexes exceed thin indexes in index
only scans?
As you might have observed all the statistics (Even the 90 and 95th
percentile/median) were in milliseconds. So that might give a hint about the
stress on the system.





-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)

Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes

Reply via email to