Re: [HACKERS] Make subquery alias optional in FROM clause
On 2017-02-24 07:25, Robert Haas wrote: > I don't think it's only Oracle that allows omitting the > alias; I think there are a number of other systems that behave > similarly. SQLite, for example. Making conversions from SQLite to Postgres easier is a Good Thing. "subquery in FROM must have an alias" has caused me inconvenience doing that as recently as last week. -M- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Remove TZ entry from postgres CLI doc page.
Hi,

Please find attached a patch to the postgres command line documentation page doc/src/sgml/ref/postgres-ref.sgml that removes the "TZ" entry from the "Environment" section. If I've understood it correctly, since ca4af308 TZ can be looked at when you run initdb, but is not looked at when the server is started. I am using Test::PostgreSQL to create a test database; it empties the postgresql.conf created by initdb and is therefore not using the correct timezone, and it took me a while to work out what was happening and why I couldn't use TZ when starting the database.

ATB,
Matthew

0001-Remove-TZ-entry-from-postgres-CLI-doc-page.patch
Re: [HACKERS] [GENERAL] 4B row limit for CLOB tables
That's assuming that toasting is evenly spread between tables. In my experience, that's not a great bet... Time to create a test:

SELECT chunk_id::bigint/100000 AS id_range, count(*), count(*)/(100000::float) AS density
FROM (SELECT chunk_id FROM pg_toast.pg_toast_39000165 WHERE chunk_id > 1 AND chunk_seq = 0) f
GROUP BY id_range ORDER BY id_range;

The machine in question was restored in parallel in Sept 2013 as part of an upgrade from 8.4. It has about 2000 tables, so it's definitely not dominated by a couple of tables. Progress towards oid wraparound is about 25.6%. With minimal effort, I found 2 bad examples, and I'm sure I can easily find more. I attached the results for those two. There were runs of 1,100,000+ and 600,000+ chunk_ids where more than 99% of the chunk_ids are taken. After restore completion, oid densities averaged less than 20 per 100,000 and 400 per 100,000 respectively. The only reason those runs are so short is that the tables were much smaller back then. I expect that next time I dump and restore (necessary for upgrading OS versions due to the collation issue), I'm going to have runs closer to 20,000,000.

> ... this fix would actually make things enormously worse. With the single counter feeding all tables, you at least have a reasonable probability that there are not enormously long runs of consecutive OIDs in any one toast table. With a sequence per table, you are nearly guaranteed that there are such runs, because inserts into other tables don't create a break.

It makes each toast table independent (and far less likely to wrap). It would wrap when sum(mods on THIS toast table) > 2^32. Right now the condition looks like: sum(mods on ALL toast tables in cluster) + sum(created normal tables in cluster * k) + sum(created temp tables in cluster * k) + [...] > 2^32, where k = average number of oids consumed for pg_class, pg_type, etc...
In the case of an insert-only table (which is a common use case for partitions), the id would only wrap when the TOAST table was "full". Currently, on the other hand, it would wrap into its pg_restore'd section when the combined oid-consuming operations on the cluster surpassed 4 billion. That being said, I'm certainly not attached to that solution. My real argument is that although it's not a problem today, we are only about 5 years from it being a problem for large installs, and the first time you'll hear about it is after someone has a 5 minute production outage on a database that's been taking traffic for 2 years.

- Matt K.

 id_range | count | density
----------+-------+---------
      390 | 92188 | 0.92188
      391 | 99186 | 0.99186
      392 | 99826 | 0.99826
      393 | 99101 | 0.99101
      394 | 99536 | 0.99536
      395 | 99796 | 0.99796
      396 | 99321 | 0.99321
      397 | 99768 | 0.99768
      398 | 99744 | 0.99744
      399 | 99676 | 0.99676
      400 | 98663 | 0.98663
      401 | 40690 | 0.4069
      403 |    92 | 0.00092
      404 |   491 | 0.00491
      407 |    74 | 0.00074
      408 |    54 | 0.00054
      415 |   152 | 0.00152
      416 |    47 | 0.00047
      419 |    59 | 0.00059
      422 |     2 | 2e-05
      423 |    14 | 0.00014
      424 |     5 | 5e-05
      425 |    11 | 0.00011
      426 |     7 | 7e-05
      427 |     5 | 5e-05
      428 |     6 | 6e-05
      517 |     5 | 5e-05
      518 |     9 | 9e-05
      519 |     6 | 6e-05
      520 |    12 | 0.00012
      521 |    17 | 0.00017
      522 |     5 | 5e-05
      588 |    15 | 0.00015
      589 |    10 | 0.0001
      590 |    19 | 0.00019
      591 |    12 | 0.00012
      592 |    12 | 0.00012
      593 |     2 | 2e-05
      617 |     4 | 4e-05
      618 |     9 | 9e-05
      619 |     7 | 7e-05
      620 |    14 | 0.00014
      621 |     5 | 5e-05
      622 |    11 | 0.00011
      682 |     8 | 8e-05
      683 |    13 | 0.00013
      684 |    17 | 0.00017
      685 |     6 | 6e-05
      686 |    17 | 0.00017
      687 |     4 | 4e-05
      767 |     5 | 5e-05
      768 |    10 | 0.0001
      769 |     9 | 9e-05
      770 |     2 | 2e-05
      771 |    14 | 0.00014
      772 |     2 | 2e-05
      773 |    11 | 0.00011
      774 |    13 | 0.00013
      775 |    10 | 0.0001
      776 |     3 | 3e-05
      914 |     7 | 7e-05
      915 |     7 | 7e-05
      916 |     1 | 1e-05
      917 |     3 | 3e-05
      918 |     3 | 3e-05
      919 |     5 | 5e-05
      920 |     4 | 4e-05
      921 |     9 | 9e-05
      922 |     9 | 9e-05
      923 |     1 | 1e-05
(70 rows)

 id_range | count | density
----------+-------+---------
      402 | 96439 | 0.96439
      403 | 99102 | 0.99102
      404 | 98787 | 0.98787
      405 | 99351
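The grouped query above can be mimicked offline. A small sketch (hypothetical chunk_ids, not the attached data; bucket size of 100,000 ids assumed) of the same density computation:

```python
# Sketch of the density computation: bucket chunk_ids into ranges of
# 100,000 and report the fraction of each range that is occupied.
from collections import Counter

BUCKET = 100_000

def chunk_id_density(chunk_ids):
    """Map bucket index -> fraction of that 100,000-id range in use."""
    counts = Counter(cid // BUCKET for cid in chunk_ids)
    return {b: n / BUCKET for b, n in sorted(counts.items())}

# A pg_restore-style run of consecutive ids shows up as density near 1.0:
dens = chunk_id_density(range(39_000_000, 39_150_000))
```

A fully occupied bucket reports 1.0, matching the ~0.99 densities in the attached results.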
Re: [HACKERS] [GENERAL] 4B row limit for CLOB tables
> Hmm, 2^32 times approx. 2kB (as per usual heuristics, ~4 rows per heap page) is 8796093022208 (~8.8e12) bytes ... which results in 8192 1GB segments :O

8192 1GB segments is just 8TB; it's not _that_ large. At TripAdvisor we've been using a NoSQL solution to do session storage. We are looking to probably swap that out to be Postgres (every other db backing the site is Postgres). Essentially, what I'm building is a system with 1 logical table that maps session id to a 2KB+ grab bag of ever-changing session attributes which is partially normalized, partially json. 315 million uniques a month multiplied by the retention policy means I need to hold 2-4 billion session objects (and somehow expire old ones). Additionally, most http calls can update the session, so between maintenance windows I expect to take around 20 billion 'upserts'. Obviously, I will have to shard and partition the table in practice, but this weekend I ran a test that demonstrated that a single table on a 9.4 server + logical replication + Dell 730xd can handle 4x that workload. Well, it can for 38 hours... until you wrap xids on the toast table. :P I'll be the first to admit that isn't the normal use case though. I'm happy to have found this thread, however, because I'm going to have to build around the global oid counter, explicitly to prevent the problem I explain below regarding clustering.

> Anybody actually reaching this limit out there?

Well, it's not the 4 billion row limit that concerns me; it's the global shared counter in conjunction with pg_restore/clustering that is actually pretty concerning. I just checked through all of TripAdvisor's normal databases, and the max tuples I see in a single toast table is 17,000,000, so that is still a couple of orders of magnitude too small. (However, it is close enough that it'll be a concern in a few years.) However, I do have active databases where the current oid is between 1 billion and 2 billion.
They were last dump-restored for a hardware upgrade a couple of years ago and were a bit more than half the size. I therefore can imagine that I have tables which are keyed by ~8,000,000 consecutive oids. I would argue that when it wraps there will be a single insert that will probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index scans inside of GetNewOidWithIndex. Even partitioning doesn't protect you from this potential problem. What's even more weird is that this issue can be triggered by consuming too many oids in a different database in the same cluster (i.e. creating large amounts of temp tables).

> The problem with changing the id from 32 to 64 bits is that the storage *for everybody else* doubles, making the implementation slower for most though this might be actually not that important.

Well, you aren't doubling the storage. Even if you have to store the key in 4 places, you are adding 16 bytes per TOAST tuple. If we work off the 2KB estimate for each TOAST tuple, then you are only increasing the storage by 0.7%. I'm sure there are more hidden costs, but we are really only talking about a low single digit percent increase. In exchange, you get to drop one index scan per toast insert; an index scan looking in the only hot part of the index. That being said, I'd be perfectly happy merely giving each TOAST table its own sequence, as that almost entirely mitigates the risk of an unexpected lock up on reasonably sized tables/partitions, and provides a functional workaround for those of us with larger than average installs.

- Matt K
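The storage-overhead figure above is easy to sanity-check. A back-of-envelope sketch (assumed numbers, not measurements: +16 bytes per toast tuple for a widened key stored in 4 places, and the usual ~2KB average toast tuple):

```python
# Back-of-envelope check of the "low single digit percent" claim
# (assumed inputs, not measured from a real cluster).
EXTRA_BYTES = 16        # widened 8-byte key stored in 4 places (assumption)
AVG_TUPLE_BYTES = 2048  # ~2KB per toast tuple heuristic

overhead = EXTRA_BYTES / AVG_TUPLE_BYTES  # fraction of extra storage
percent = round(overhead * 100, 2)        # just under 0.8%
```

So the quoted ~0.7% estimate is in the right ballpark for 2KB tuples; larger toasted values dilute the overhead further.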
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On 2014-10-10 19:44, Kevin Grittner wrote:

> Peter Geoghegan p...@heroku.com wrote: People keep remarking that they don't like that you can (optionally) name a unique index explicitly, [...]
>
> To restate: to do so is conflating the logical definition of the database with a particular implementation detail. As just one reason that is a bad idea: we can look up unique indexes on the specified columns, but if we implement other storage techniques where there is no such thing as a unique index on the columns, yet manage to duplicate the semantics (yes, stranger things have happened), people can't migrate to the new structure without rewriting their queries.

Wouldn't it be good enough to define the 'WITHIN' as expecting a unique-constraint name rather than an index name (even though those happen to be the same strings)? I think constraints are part of the logical definition of the database, and a new storage technique which doesn't use indexes should still have names for its unique constraints.

-M-
Re: [HACKERS] Collations and Replication; Next Steps
Here is where I think the timezone and PostGIS cases are fundamentally different: I can pretty easily make sure that all my servers run in the same timezone. That's just good practice. I'm also going to install the same version of PostGIS everywhere in a cluster. I'll build PostGIS and its dependencies from the exact same source files, regardless of when I build the machine. Timezone is a user level setting; PostGIS is a user level library used by a subset. glibc is a system level library, and text is a core data type, however. Changing versions to something that doesn't match the kernel can lead to system level instability, broken linkers, etc. (I know because I tried.)

Here are some other subtle problems that fall out:
* Upgrading glibc, the kernel, and the linker through the package manager in order to get security updates can cause the corruption.
* A basebackup that is taken in production and placed on a backup server might not be valid on that server, or your desktop machine, or on the spare you keep to do PITR when someone screws up.
* Unless you keep _all_ of your clusters on the same OS, machines from your database spare pool probably won't be the right OS when you add them to the cluster because a member failed. Keep in mind here, by OS I mean CentOS versions. (We're running a mix of late 5.x and 6.x, because of our numerous issues with the 6.x kernel.)

The problem with LC_IDENTIFICATION is that every machine I have seen reports revision 1.0, date 2000-06-24. It doesn't seem like the versioning is being actively maintained. I'm with Martijn here; let's go ICU, if only because it moves sorting to a user level library instead of a system level one. Martijn, do you have a link to the out-of-tree patch? If not I'll find it. I'd like to apply it to a branch and start playing with it.
- Matt K

On Sep 17, 2014, at 7:39 AM, Martijn van Oosterhout klep...@svana.org wrote:

On Tue, Sep 16, 2014 at 02:57:00PM -0700, Peter Geoghegan wrote: On Tue, Sep 16, 2014 at 2:07 PM, Peter Eisentraut pete...@gmx.net wrote: Clearly, this is worth documenting, but I don't think we can completely prevent the problem.

There has been talk of a built-in index integrity checking tool. That would be quite useful. We could at least use the GNU facility for versioning collations where available, LC_IDENTIFICATION [1]. By not versioning collations, we are going against the express advice of the Unicode consortium (they also advise to do a strcmp() tie-breaker, something that I think we independently discovered in 2005 because of a bug report - this is what I like to call the Hungarian issue. They know what our constraints are.). I recognize it's a tricky problem, because of our historic dependence on OS collations, but I think we should definitely do something. That said, I'm not volunteering for the task, because I don't have time. While I'm not sure of what the long term solution should be, it *is not* okay that we don't version collations. I think that even the best possible B-Tree check tool is not a solution.

Personally I think we should just support ICU as an option. FreeBSD has been maintaining an out-of-tree patch for 10 years now, so we know it works. The FreeBSD patch is not optimal though; these days ICU supports UTF-8 directly, so many of the push-ups FreeBSD does are no longer necessary. It is often faster than glibc and the key sizes for strxfrm are more compact [1], which is relevant for the recent optimisation patch. Let's solve this problem once and for all.

[1] http://site.icu-project.org/charts/collation-icu4c48-glibc

--
Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/
He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts.
-- Arthur Schopenhauer
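A rough illustration (my sketch, not from the thread) of why C-locale collation sidesteps the versioning problem entirely: in the C locale, strxfrm-style sort keys reduce to plain bytewise order, which never changes across library versions or machines.

```python
# Sketch: in the C locale, locale.strxfrm produces sort keys whose order
# matches plain codepoint comparison, so the result is identical on every
# machine regardless of glibc version.  (ICU-backed collations would need
# explicit versioning instead.)
import locale

locale.setlocale(locale.LC_COLLATE, "C")

words = ["cherry", "Apple", "banana"]
by_sort_key = sorted(words, key=locale.strxfrm)  # collation-aware sort
bytewise = sorted(words)                         # plain codepoint order
```

This is exactly why "collate all of your indexes in C" appears later in the thread as the only currently safe option.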
Re: [HACKERS] Collations and Replication; Next Steps
Let me double check that assertion before we go too far with it. Most of the problems I've seen are across 5 and 6 boundaries. I thought I had a case where it was within a minor release, but I can't find it right now. I'm going to dig. That being said, the sort order changes depending on whether you statically or dynamically link (demonstrated on 4+ machines running different linux flavors), so at this point I have no reason to trust the stability of the sort across any build.

I legitimately question whether strcoll is buggy. For example, I have cases where for three strings a, b and c: a < b, but (a || c) > (b || c). That's right: postfixing doesn't hold. It actually calls into question the index scan optimization that occurs when you do LIKE 'test%' even on a single machine, but I don't want to bite that off at the moment. My mentality has switched to 'don't trust any change until shown otherwise', so that may have bled into my last email.

- Matt K.

On Sep 17, 2014, at 8:17 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Sep 17, 2014 at 9:07 AM, Matthew Kelly mke...@tripadvisor.com wrote: Here is where I think the timezone and PostGIS cases are fundamentally different: I can pretty easily make sure that all my servers run in the same timezone. That's just good practice. I'm also going to install the same version of PostGIS everywhere in a cluster. I'll build PostGIS and its dependencies from the exact same source files, regardless of when I build the machine. Timezone is a user level setting; PostGIS is a user level library used by a subset. glibc is a system level library, and text is a core data type, however. Changing versions to something that doesn't match the kernel can lead to system level instability, broken linkers, etc. (I know because I tried). Here are some subtle other problems that fall out: * Upgrading glibc, the kernel, and linker through the package manager in order to get security updates can cause the corruption.
* A basebackup that is taken in production and placed on a backup server might not be valid on that server, or your desktop machine, or on the spare you keep to do PITR when someone screws up. * Unless you keep _all_ of your clusters on the same OS, machines from your database spare pool probably won't be the right OS when you add them to the cluster because a member failed. Keep in mind here, by OS I mean CentOS versions. (we're running a mix of late 5.x and 6.x, because of our numerous issues with the 6.x kernel) The problem with LC_IDENTIFICATION is that every machine I have seen reports revision 1.0, date 2000-06-24. It doesn't seem like the versioning is being actively maintained. I'm with Martjin here, lets go ICU, if only because it moves sorting to a user level library, instead of a system level. Martjin do you have a link to the out of tree patch? If not I'll find it. I'd like to apply it to a branch and start playing with it.

What I find astonishing is that whoever maintains glibc (or the Red Hat packaging for it) thinks it's OK to change the collation order in a minor release. I'd understand changing it between, say, RHEL 6 and RHEL 7. But the idea that minor release, supposedly safe updates think they can whack this around without breaking applications really kind of blows my mind.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
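The non-postfix behaviour described above (a < b yet a || c > b || c) can be probed mechanically. A hypothetical sketch (my strings, not TripAdvisor's data) of such a probe, run here under the C locale where, for strings that differ at a non-prefix position, appending a common suffix preserves order:

```python
# Hypothetical probe for postfix stability of the active collation:
# a < b should imply a+c < b+c.  A buggy strcoll can violate this;
# the sample strings below differ before any prefix boundary, so in
# the C locale the property holds for all of them.
import locale

locale.setlocale(locale.LC_COLLATE, "C")

def postfix_stable(a, b, c):
    """True unless a sorts before b but a+c fails to sort before b+c."""
    if locale.strcoll(a, b) < 0:
        return locale.strcoll(a + c, b + c) < 0
    return True

samples = ["a b", "a-b", "ab"]
ok = all(postfix_stable(a, b, c)
         for a in samples for b in samples for c in ["", "x"])
```

Running the same probe under a glibc UTF-8 locale, over strings harvested from a suspect index, is one way to hunt for the counterexamples Matt describes.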
[HACKERS] Collations and Replication; Next Steps
Hello,

Last month, I brought up the following issue on the general mailing list about how running streaming replication between machines running different versions of glibc can cause corrupt indexes. http://www.postgresql.org/message-id/ba6132ed-1f6b-4a0b-ac22-81278f5ab...@tripadvisor.com

In the month following, we have done further investigation here at TripAdvisor and have found that the scope of this issue is far more troubling than initially thought. Hackers seems like the appropriate place to present this update because it will certainly motivate some discussion about the approach to collation support going forward.

After the initial episode, we thought it was necessary to find the true scope of the problem. We developed a quick smoke test to evaluate the integrity of the indexes on a given machine. We understood that the test was not exhaustive, but it would catch most instances of corrupt indexes given TripAdvisor's normal database usage pattern. The source code with documentation about how it works is available at (https://github.com/mkellycs/postgres_index_integrity_check) for those interested.

What we found with this simple check was simply frightening. In every single streaming replica cluster where one or more machines had been commissioned at a different time, that member was found to be corrupt. When hardware upgrades of the master had been accomplished via streaming replication, the new master was also found to have similar issues. The following numbers are only as small as they are because our adoption of streaming replication has barely just begun. So far we have found:

* 8 internal production databases, and 2 live site database servers affected.
* Up to 3771 rows out of place in a single index (more correctly: 3771 times a row was smaller than the row before it when sorted in ascending order; the actual number of incorrectly placed rows is probably much higher)
* On the worst offender, there were 12806 rows out of place across 26 indexes
* On average, roughly 15% of indexes containing text keys on tables larger than 100MB were found to exhibit this issue
* In at least one case, rebuilding a unique index on a master revealed that the database had allowed 100+ primary key violations.

It sounds like we as a community knew that these issues were theoretically possible, but I now have empirical evidence demonstrating the prevalence of this issue on our corpus of international data. Instances of this issue showed up in indexes of member usernames, location/property names, and even Facebook URLs. I encourage other sufficiently large operations who index internationalized text to run similar tests; it's highly likely they have similar latent issues that they just have not detected yet.

Here is the simple reality. Collation based indexes, streaming replication, and multiple versions of glibc/OS cannot coexist in a sufficiently large operation without causing corrupt indexes. The current options are to collate all of your indexes in C, or to ensure that all of your machines run exactly the same OS version.

The first and immediate TODO is to patch the documentation to add warnings regarding this issue. I can propose a doc patch explaining the issue, if no one has any objections.

The second and far more challenging problem is how do we fix this issue? As of our last discussion, Peter Geoghegan revived the proposal of using ICU as an alternative. (http://www.postgresql.org/message-id/CAEYLb_WvdCzuL=cyf1xyzjwn-1cvo6kzeawmkbxts3jphtj...@mail.gmail.com) I do not feel qualified to compare the value of this library to other options, but I am certainly willing to help with the patch process once a direction has been selected.
I will be at Postgres Open in Chicago this week, and I will be more than willing to further discuss the details of what we have found. Regards, Matt Kelly
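The smoke test itself lives at the GitHub link above; a minimal sketch of the "rows out of place" metric it reports (my simplification: the real tool walks a live btree, here the "index order" is just a list of keys) might look like:

```python
# Minimal sketch of the corruption metric: walk keys in index order and
# count positions where a key sorts before the key preceding it.
def rows_out_of_place(keys_in_index_order):
    """Count adjacent inversions in the supposed sort order."""
    return sum(1 for prev, cur in zip(keys_in_index_order,
                                      keys_in_index_order[1:])
               if cur < prev)

# An index built under one collation but compared under another can
# surface inversions like these (bytewise, "Banana" < "apple"):
suspect = ["apple", "Banana", "cherry", "Date"]
bad = rows_out_of_place(suspect)
clean = rows_out_of_place(sorted(suspect))
```

A nonzero count on a production index is the signal the post describes; zero means only that this particular walk found no adjacent inversions, not that the index is provably clean.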
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
All,

I have successfully isolated this error and created a simple SQL script to reproduce it. Just to recap - this script will cause a server crash with exception 0xC0000409 as described in previous emails. The crux of the problem seems to be my creation / use of the function st_transform_null. My intent with this function is to wrap the st_transform function provided by PostGIS, but account for the situation where the argument to be transformed is NULL. In this situation, st_transform throws an internal_error, which my function catches and returns NULL for. The error / crash is not caused by a NULL argument; rather, it is caused by the final value in the attached script's INSERT statement, which contains a lat/lon pair that is beyond PostGIS's range. I'm not questioning whether this value is actually outside the legal range, but I do not think such an input should cause the server to crash completely.

Here are the steps to reproduce the crash:

1) Create a new instance of a 9.2 server (Windows 64-bit), and a new database (call it test) with the PostGIS extension.
2) Run the script: psql -U postgres -d test -f C:\server_crash.sql
   You should see the following:
   psql:C:/server_crash.sql:31: server closed the connection unexpectedly
   This probably means the server terminated abnormally before or while processing the request.
   psql:C:/server_crash.sql:31: connection to server was lost
3) Check your log for the error.

I hope this helps. It took me quite a while to track down the problem, so I hope someone can figure out what is going on under the hood. It seems to be a pretty significant problem.

Cheers,
Matt

On Sun, Nov 11, 2012 at 9:45 PM, Matthew Gerber gerber.matt...@gmail.com wrote: On Sun, Nov 11, 2012 at 8:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Noah Misch n...@leadboat.com writes: So, I can reproduce the lower threshold, but the exception type does not agree with the one Matthew observed.
I finally got around to looking at the link you provided about error 0xC0000409, and realized that I'd been completely confusing it with stack overflow --- but actually, it's a report that something scribbled past the end of a finite-size local-variable array. So I now think that Matthew's stumbled across two completely independent bugs, and we've fixed only one of them. The 0xC0000409 error is something else, and possibly a lot worse, since it could conceivably be a security issue. It still seems likely that the actual location of the bug is either in PostGIS or in the GIST index code, but without the ability to reproduce the failure it's awfully hard to find it. Matthew, could you try a bit harder to find a self-contained test case that produces that error?

regards, tom lane

Sure, it might take me a while to find time, but I'll keep it on my list.

Matt

server_crash.sql
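The st_transform_null wrapper discussed in this thread isn't included in the archive. A hedged sketch of what such a function might look like (my reconstruction from the description above, not Matthew's actual code; it assumes PostGIS's st_transform and the described catch-internal_error-and-return-NULL behaviour):

```sql
-- Hypothetical reconstruction: wrap st_transform so that a NULL argument
-- (or an internal_error raised by PostGIS for out-of-range coordinates)
-- yields NULL instead of aborting the statement.
CREATE OR REPLACE FUNCTION st_transform_null(geom geometry, srid integer)
RETURNS geometry AS $$
BEGIN
    IF geom IS NULL THEN
        RETURN NULL;
    END IF;
    RETURN st_transform(geom, srid);
EXCEPTION WHEN internal_error THEN
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

Note that the plpgsql exception block establishes a subtransaction per call; that is the usual cost of this catch-and-null pattern, and it is orthogonal to the server crash being investigated here.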
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 12:23 AM, Noah Misch n...@leadboat.com wrote: On Sun, Nov 04, 2012 at 02:30:38PM -0500, Tom Lane wrote: Matthew Gerber gerber.matt...@gmail.com writes: Here is the command that was executing when the 0xC0000409 exception was raised: INSERT INTO places (bounding_box,country,full_name,id,name,type,url) VALUES (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789 32.771786,-97.034085 32.771786))',4326),26918),'United States','Irving, TX','dce44ec49eb788f5','Irving','city','http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'),

Assuming that 0xC0000409 does actually represent a stack-overrun condition,

It signifies scribbling past the end of the frame's local variables: http://msdn.microsoft.com/en-us/library/8dbf701c.aspx

A slight update on this: previously, my insert code involved a long SELECT ... UNION ALL ... SELECT ... UNION ALL ... command. If this command was too long, I would get a stack depth exception thrown back to my client application. I reduced the length of the command to eliminate the client-side exceptions, but on some occasions I would still get the 0xC0000409 crash on the server side. I have eliminated the long SELECT ... UNION ALL ... command, and I now get no errors on either side. So it seems like long commands like this were actually causing the server-side crashes. The proper behavior would seem to be throwing the exception back to the client application instead of crashing the server.

Hope this helps...

Matt
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 11:19 AM, Tom Lane t...@sss.pgh.pa.us wrote: Matthew Gerber gerber.matt...@gmail.com writes: On Sun, Nov 11, 2012 at 12:23 AM, Noah Misch n...@leadboat.com wrote: It signifies scribbling past the end of the frame's local variables: http://msdn.microsoft.com/en-us/library/8dbf701c.aspx A slight update on this: previously, my insert code involved a long SELECT ... UNION ALL ... SELECT ... UNION ALL ... command. How long is long? I was seeing queries with around 5000-7000 UNION ALL statements. Matt
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: Matthew Gerber gerber.matt...@gmail.com writes: On Sun, Nov 11, 2012 at 11:19 AM, Tom Lane t...@sss.pgh.pa.us wrote: How long is long? I was seeing queries with around 5000-7000 UNION ALL statements.

Hm. I experimented with test queries created like so:

perl -e 'print "SELECT 1 a, 2 b, 3 c\n"; print "UNION ALL SELECT 1 a, 2 b, 3 c\n" foreach (1..8200);' | psql

On the machine I tried this on, it works up to about 8200 and then fails in the way I'd expect:

ERROR: stack depth limit exceeded
HINT: Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate.

But then when I cranked it up to 8, kaboom:

connection to server was lost

Inspection of the core dump shows transformSetOperationTree is the problem --- it's recursing but lacks a check_stack_depth test. So that's easy to fix, but I wonder why the critical depth limit seems to be so much less on your machine. I get the expected error up to about 65000 UNION ALLs --- why is yours crashing at a tenth of that?

Tom,

Interesting. I really have no idea why mine seemed to fail so much sooner. I recalled my 5k-7k figure from memory, so I might be off on that, but probably not by an order of magnitude. In any case, it sounds like you know how to fix the problem. Should I file this as a bug report or will you take care of it from here?

Best,
Matt
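For those without perl handy, an equivalent generator for the reproducer (my sketch, mirroring the one-liner above) is:

```python
# Build a query with n UNION ALL branches, like the perl one-liner,
# to probe the recursion depth of transformSetOperationTree.
def union_all_query(n):
    branch = "SELECT 1 a, 2 b, 3 c"
    return "\n".join([branch] + ["UNION ALL " + branch] * n)

q = union_all_query(8200)  # pipe the result into psql to reproduce
```

Raising n past the stack-depth threshold turns the expected "stack depth limit exceeded" ERROR into the crash discussed above (on unpatched servers lacking the check_stack_depth test).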
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 2:43 PM, Noah Misch n...@leadboat.com wrote: On Sun, Nov 11, 2012 at 10:10:31AM -0500, Matthew Gerber wrote: Matthew Gerber gerber.matt...@gmail.com writes: Here is the command that was executing when the 0xC0000409 exception was raised: INSERT INTO places (bounding_box,country,full_name,id,name,type,url) VALUES (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789 32.771786,-97.034085 32.771786))',4326),26918),'United States','Irving, TX','dce44ec49eb788f5','Irving','city','http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'),

A slight update on this: previously, my insert code involved a long SELECT ... UNION ALL ... SELECT ... UNION ALL ... command. If this command was too long, I would get a stack depth exception thrown back to my client application. I reduced the length of the command to eliminate the client-side exceptions, but on some occasions I would still get the 0xC0000409 crash on the server side. I have eliminated the long SELECT ... UNION ALL ... command, and I now get no errors on either side. So it seems like long commands like this were actually causing the server-side crashes. The proper behavior would seem to be throwing the exception back to the client application instead of crashing the server.

Above, you quoted an INSERT ... VALUES of two rows. Have you observed an exception-0xC0000409 crash with an INSERT ... VALUES query, or only with an INSERT ... SELECT ... thousands of UNION query?

Every time the server crashed with 0xC0000409, the log reported that it was running the simple INSERT command (two rows) that I started this thread with. However, this didn't make any sense to me given the simplicity of the INSERT command and the fact that the error indicated a stack overflow. So I removed the long SELECT ... UNION ALL ... command, since it seemed more relevant to the error, and the process has been running continuously for a few days now.
To answer your question directly: I was seeing the server crash when using the simple INSERT and long SELECT ... UNION ... (these commands are issued independently at different points in the program). Now my program is only using the simple INSERT, and the crashes are gone. Hope this helps... Matt
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 11, 2012 at 8:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Noah Misch n...@leadboat.com writes: So, I can reproduce the lower threshold, but the exception type does not agree with the one Matthew observed.

I finally got around to looking at the link you provided about error 0xC0000409, and realized that I'd been completely confusing it with stack overflow --- but actually, it's a report that something scribbled past the end of a finite-size local-variable array. So I now think that Matthew's stumbled across two completely independent bugs, and we've fixed only one of them. The 0xC0000409 error is something else, and possibly a lot worse, since it could conceivably be a security issue. It still seems likely that the actual location of the bug is either in PostGIS or in the GIST index code, but without the ability to reproduce the failure it's awfully hard to find it. Matthew, could you try a bit harder to find a self-contained test case that produces that error?

regards, tom lane

Sure, it might take me a while to find time, but I'll keep it on my list.

Matt
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 4, 2012 at 3:39 AM, Craig Ringer ring...@ringerc.id.au wrote: On 11/04/2012 08:47 AM, Matthew Gerber wrote: So I attached the VS debugger, but the server died without raising an exception in VS. Not sure what's going on here. Try creating a directory called crashdumps in the data directory, at the same level as pg_xlog and pg_clog etc. Give the postgresql user the full control permission on it. Then run the test again. Do any minidump files appear in the directory? If so, you can examine them with windbg or Visual Studio to see where the crash happened. I did this but nothing appears in crashdumps after the server crashes. The latest test I did included the addition of this directory and the disabling of my antivirus software. Nothing seems to have changed. Following Tom's suggestion, I'll try to get a stack trace again (last time didn't produce anything). The only other thing I've noticed is that the crash always occurs when inserting into the places table (definition in previous email), even though there are two other tables that are also receiving inserts. This is odd to me. Any thoughts? Matt
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Sun, Nov 4, 2012 at 3:39 AM, Craig Ringer ring...@ringerc.id.au wrote: On 11/04/2012 08:47 AM, Matthew Gerber wrote: Here is the command that was executing when the 0xC409 exception was raised: INSERT INTO places (bounding_box,country,full_name,id,name,type,url) VALUES (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789 32.771786,-97.034085 32.771786))',4326),26918),'United States','Irving, TX','dce44ec49eb788f5','Irving','city',' http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'), OK, so you're using PostGIS. What other extensions are loaded? PL/R? Any other PLs? PostGIS is the only extension that I added to the default configuration. I didn't change anything else. Can you show the definition of the table `places`, incuding any associated triggers, etc? Use `\d+` in psql for the table def'n. Here the definition of the places table: twitter=# \d+ places Table public.places Column| Type | Modifiers | Storage | Stats target |Description --+-+---+--+--+- bounding_box | geometry(Polygon,26918) | | main | | country | character varying | | extended | | full_name| character varying | | extended | | id | character varying | not null | extended | | name | character varying | | extended | | type | character varying | | extended | | url | character varying | | extended | | Indexes: places_pkey PRIMARY KEY, btree (id) places_bounding_box_idx gist (bounding_box) places_type_idx btree (type) Referenced by: TABLE tweets CONSTRAINT tweets_place_id_fkey FOREIGN KEY (place_id) REFERENCES places(id) ON DELETE CASCADE Has OIDs: no Please also post the query plan. http://explain.depesz.com/ is useful for this. 
Here is the query plan: QUERY PLAN Insert on public.places (cost=0.00..0.01 rows=1 width=0) (actual time=1.000..1.000 rows=0 loops=1) - Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1) Output: '0103202669010005005E8705A4E32C38C1EE553AE6E95E 4D41086A91990B1B38C11620AF9784874D41FCA5741676E437C19436654287814D41C43E11332BF6 37C17C863746F0584D415E8705A4E32C38C1EE553AE6E95E4D41'::geometry(Polygon,26918), 'United States'::character varying, 'Irving, TX'::character varying, 'dce44ec49e b788f5'::character varying, 'Irving'::character varying, 'city'::character varyi ng, 'http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'::charactervarying Total runtime: 1.157 ms (4 rows) So I attached the VS debugger, but the server died without raising an exception in VS. Not sure what's going on here. Try creating a directory called crashdumps in the data directory, at the same level as pg_xlog and pg_clog etc. Give the postgresql user the full control permission on it. Then run the test again. Running it now. Do any minidump files appear in the directory? If so, you can examine them with windbg or Visual Studio to see where the crash happened. Will try it. Thanks for your help so far, guys. Hopefully we get somewhere on this... Matt
Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server
On Fri, Nov 2, 2012 at 9:00 PM, Noah Misch n...@leadboat.com wrote: On Fri, Nov 02, 2012 at 02:05:47PM -0500, Merlin Moncure wrote: On Fri, Nov 2, 2012 at 1:25 PM, Matthew Gerber gerber.matt...@gmail.com wrote: I am encountering an error on my Postgres installation for Windows Server 64-bit. The error was posted here a couple months ago; however, no solution was found on the pgsql-bugs list, so I am reposting it to pgsql-hackers in the hopes that someone will be able to help. My error message is identical to the one previously posted: 2012-11-01 22:36:26 EDT LOG: 0: server process (PID 7060) was terminated by exception 0xC409 2012-11-01 22:36:26 EDT DETAIL: Failed process was running: INSERT INTO [snipped SQL command] Could you post an anonymized query, post an anonymized query plan, and/or describe the general nature of the query? Does it call functions? About how many rows does it insert? Here is the command that was executing when the 0xC409 exception was raised: INSERT INTO places (bounding_box,country,full_name,id,name,type,url) VALUES (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789 32.771786,-97.034085 32.771786))',4326),26918),'United States','Irving, TX','dce44ec49eb788f5','Irving','city',' http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'), (st_transform_null(ST_GeometryFromText('POLYGON((107.610398 -6.9006302,107.610398 -6.864448,107.637222 -6.864448,107.637222 -6.9006302,107.610398 -6.9006302))',4326),26918),'Indonesia','Coblong, Kota Bandung','2c0294c5eab821c9','Coblong','city',' http://api.twitter.com/1/geo/id/2c0294c5eab821c9.json') The st_transform_null function is simply a wrapper around the PostGIS st_transform function that deals with NULL values. The other fields are all VARCHARs. This insert is only adding two values. In general, the insert commands I'm running insert anywhere up to 100 rows each, so they're not huge. What server settings have you customized? 
https://wiki.postgresql.org/wiki/Server_Configuration I haven't customized any settings. If you could get a stack trace or minidump, that would be most helpful: https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows So I attached the VS debugger, but the server died without raising an exception in VS. Not sure what's going on here. Magnus's questions for the reporter of bug #7517 are relevant, too. Does the system have any antivirus software installed? Testing disabling the AV software now. Will post back. Google suggests those unadorned messages originate in R. Do the affected systems use PL/R? If so ... Nope. Really appreciate any help you can provide. Matt
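The actual SQL definition of `st_transform_null` was not posted; all we know from the thread is that it is a wrapper around PostGIS's `st_transform` that deals with NULL values. As a language-neutral sketch of that general shape, here is a Python analogue of a NULL-propagating wrapper (the stand-in function and its behaviour are assumptions, not the poster's code):

```python
def null_safe(func):
    """Return a wrapper that short-circuits to None when any argument is
    None, instead of passing NULLs down to the wrapped function. This
    mirrors what a wrapper like st_transform_null presumably does around
    st_transform; the real definition was not shown in the thread."""
    def wrapper(*args):
        if any(arg is None for arg in args):
            return None
        return func(*args)
    return wrapper

# Stand-in for st_transform(geom, srid); not PostGIS.
transform = null_safe(lambda geom, srid: (geom, srid))
result_null = transform(None, 26918)
result_ok = transform("POLYGON(...)", 26918)
```

In SQL terms this would typically be a `CASE WHEN $1 IS NULL THEN NULL ELSE st_transform($1, $2) END` style function, but again, that is a guess at its shape.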
Re: [HACKERS] enhanced error fields
Peter Geoghegan pe...@2ndquadrant.com writes: So I took a look at the patch eelog-2012-05-09.diff today. All of the following remarks apply to it alone. I've been trying out this patch for my own interest (I'm very pleased to see work on this feature), and I have a couple of suggestions from a user's point of view. First: if a not null constraint is violated, the error report includes CONSTRAINT NAME 'not_null_violation'. I think I would find it more useful if CONSTRAINT NAME were left unset rather than given a value that doesn't correspond to a real constraint. A client program can tell it's a null constraint violation from the SQLSTATE. Second: in the case where a foreign-key constraint is violated by a change in the primary-key table, the error report gives the following information: TABLE NAME:name of primary-key table SCHEMA NAME: schema of primary-key table CONSTRAINT NAME: name of foreign-key constraint CONSTRAINT SCHEMA: schema of foreign-key table It doesn't include the name of the foreign-key table (except in the human-readable error message). But in principle you need to know that table name to reliably identify the constraint that was violated. I think what's going on is a mismatch between the way the constraint namespace works in the SQL standard and in PostgreSQL: it looks like the standard expects constraint names to be unique within a schema, while PostgreSQL only requires them to be unique within a table. (A similar issue makes information_schema less useful than the pg_ tables for foreign key constraints.) So I think it would be helpful to go beyond the standard in this case and include the foreign-key table name somewhere in the report. Possibly the enhanced-error reports could somehow add the table name to the string in the CONSTRAINT NAME field, so that the interface PostgreSQL provides looks like the one the standard envisages (which ought to make it easier to write cross-database client code). 
Or it might be simpler to just add a new enhanced-error field; I can imagine cases where that table name would be the main thing I'd be interested in. -M-
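The point above is that because PostgreSQL constraint names are only unique per table, a client needs the (schema, table, constraint) triple, not just (schema, constraint), to reliably identify which constraint was violated. A small illustration of why, using hypothetical field names for the error-report values:

```python
def constraint_key(schema, table, constraint):
    """Build an unambiguous identifier for a violated constraint. In
    PostgreSQL, constraint names are only unique within a table, so the
    table name has to be part of the key; the field names here are
    illustrative, not the patch's actual error fields."""
    return "{}.{}.{}".format(schema, table, constraint)

# Two tables in the same schema can both define a constraint named
# "fk_place"; without the table name the two reports would be
# indistinguishable.
a = constraint_key("public", "tweets", "fk_place")
b = constraint_key("public", "checkins", "fk_place")
```

This is exactly the disambiguation the SQL standard's schema-wide constraint namespace would have made unnecessary.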
Re: [HACKERS] Patch: Allow SQL-language functions to reference parameters by parameter name
On 25/01/12 18:37, Hitoshi Harada wrote: I'm still not sure whether to just revise (almost) all the SQL function examples to use parameter names, and declare them the right choice; as it's currently written, named parameters still seem rather second-class. Agreed. I'll try a more comprehensive revision of the examples. The patch seems ok, except an example I've just found. db1=# create function t(a int, t t) returns int as $$ select t.a $$ language sql; CREATE FUNCTION db1=# select t(0, row(1, 2)::t); t --- 1 (1 row) Should we throw an error in such ambiguity? Or did you make it happen intentionally? If latter, we should also mention the rule in the manual. I did consider it, and felt it was the most consistent: # select t.x, t, z from (select 1) t(x), (select 2) z(t); x | t | z ---+---+- 1 | 2 | (2) (1 row) I haven't yet managed to find the above behaviour described in the documentation either, though. To me, it feels like an obscure corner case, whose description would leave the rules seeming more complicated than they generally are. Maybe it'd be better suited to be explicitly discussed in the comments? Thanks, Matthew -- matt...@trebex.net
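The resolution rule the patch implements (an ambiguous unqualified name prefers the column; qualifying with the function name forces the parameter) can be modelled as a simple lookup order. This is only a toy model of the behaviour described in the thread, not the parser's actual hook logic:

```python
def resolve(name, columns, params, func_name=None, qualifier=None):
    """Toy model of the lookup order discussed above: an unqualified
    name prefers a query column over a function parameter, and a name
    qualified with the function's own name resolves to the parameter."""
    if qualifier is None:
        if name in columns:
            return ("column", name)
        if name in params:
            return ("parameter", name)
        return None
    if qualifier == func_name and name in params:
        return ("parameter", name)
    return None

columns = {"balance", "accountno"}   # columns of the queried table
params = {"accountno", "debit"}      # parameters of function tf1
unqualified = resolve("accountno", columns, params)
qualified = resolve("accountno", columns, params,
                    func_name="tf1", qualifier="tf1")
```

Under this order, `accountno` alone means the column, and `tf1.accountno` means the parameter, matching the doc-patch example.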
Re: [HACKERS] Patch: Allow SQL-language functions to reference parameters by parameter name
On 19/01/12 20:28, Hitoshi Harada wrote: (Now it occurred to me that forgetting the #include parse_func.h might hit this breakage..., so I'll fix it here and continue to test, but if you'll fix it yourself, let me know) I fixed it here and it now works with my environment. Well spotted; that's exactly what I'd done. :/ The regression tests pass, the feature seems working as aimed, but it seems to me that it needs more test cases and documentation. For the tests, I believe at least we need ambiguous case given upthread, so that we can ensure to keep compatibility. For the document, it should describe the name resolution rule, as stated in the patch comment. Attached are a new pair of patches, fixing the missing include (and the other warning), plus addressing the above. I'm still not sure whether to just revise (almost) all the SQL function examples to use parameter names, and declare them the right choice; as it's currently written, named parameters still seem rather second-class. Aside from them, I wondered at first what if the function is schema-qualified. Say, CREATE FUNCTION s.f(a int) RETURNS int AS $$ SELECT b FROM t WHERE a = s.f.a $$ LANGUAGE sql; It actually errors out, since function-name-qualified parameter only accepts function name without schema name, but it looked weird to me at first. No better idea from me at the moment, though. By my reading of (a draft of) the spec, Subclause 6.6, identifier chain, Syntax Rules 8.b.i-iii, the current behaviour is correct. But I join you in wondering whether we should permit the function name to be schema-qualified anyway. Matthew -- matt...@trebex.net diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml new file mode 100644 index 7064312..cc5b5ef *** a/doc/src/sgml/xfunc.sgml --- b/doc/src/sgml/xfunc.sgml *** SELECT getname(new_emp()); *** 538,556 programlisting CREATE FUNCTION tf1 (acct_no integer, debit numeric) RETURNS numeric AS $$ UPDATE bank ! SET balance = balance - $2 ! 
WHERE accountno = $1 RETURNING balance; $$ LANGUAGE SQL; /programlisting Here the first parameter has been given the name literalacct_no/, and the second parameter the name literaldebit/. ! So far as the SQL function itself is concerned, these names are just ! decoration; you must still refer to the parameters as literal$1/, ! literal$2/, etc within the function body. (Some procedural ! languages let you use the parameter names instead.) However, ! attaching names to the parameters is useful for documentation purposes. When a function has many parameters, it is also useful to use the names while calling the function, as described in xref linkend=sql-syntax-calling-funcs. --- 538,580 programlisting CREATE FUNCTION tf1 (acct_no integer, debit numeric) RETURNS numeric AS $$ UPDATE bank ! SET balance = balance - debit ! WHERE accountno = acct_no RETURNING balance; $$ LANGUAGE SQL; /programlisting Here the first parameter has been given the name literalacct_no/, and the second parameter the name literaldebit/. ! Named parameters can still be referenced as ! literal$replaceablen//; in this example, the second ! parameter can be referenced as literal$2/, literaldebit/, ! or literaltf1.debit/. ! /para ! ! para ! If a parameter is given the same name as a column that is available ! in the query, the name will refer to the column. To explicitly ! refer to the parameter, you can qualify its name with the name of ! the containing function. For example, ! ! programlisting ! CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$ ! UPDATE bank ! SET balance = balance - debit ! WHERE accountno = tf1.accountno ! RETURNING balance; ! $$ LANGUAGE SQL; ! /programlisting ! ! This time, the first parameter has been given the ambiguous name ! literalaccountno/. ! Notice that inside the function body, literalaccountno/ still ! refers to literalbank.accountno/, so literaltf1.accountno/ ! must be used to refer to the parameter. ! /para ! ! 
para When a function has many parameters, it is also useful to use the names while calling the function, as described in xref linkend=sql-syntax-calling-funcs. diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c new file mode 100644 index 5642687..fe87990 *** a/src/backend/executor/functions.c --- b/src/backend/executor/functions.c *** *** 23,28 --- 23,29 #include nodes/makefuncs.h #include nodes/nodeFuncs.h #include parser/parse_coerce.h + #include parser/parse_func.h #include tcop/utility.h #include utils/builtins.h #include utils/datum.h *** typedef SQLFunctionCache *SQLFunctionCac *** 115,121 --- 116,124
[HACKERS] Patch: Allow SQL-language functions to reference parameters by parameter name
I just remembered to make time to advance this from WIP to proposed patch this week... and then worked out I'm rudely dropping it into the last commitfest at the last minute. :/ Anyway, my interpretation of the previous discussion is a general consensus that permitting ambiguous parameter/column references is somewhat unsavoury, but better than the alternatives: http://archives.postgresql.org/pgsql-hackers/2011-04/msg00433.php http://archives.postgresql.org/pgsql-hackers/2011-04/msg00744.php (and surrounds) The attached patch is essentially unchanged from my WIP version; it's updated to current master (d0dcb31), and fixes a trivial copy/paste error. It passes `make check`. Also attached is a rather light doc patch, which I've separated because I'm hesitant about which approach to take. The attached version just changes the existing mention of naming parameters in: http://www.postgresql.org/docs/9.1/static/xfunc-sql.html#XFUNC-NAMED-PARAMETERS It presumably ought to be clearer about the name resolution priorities... in a quick look, I failed to locate the corresponding discussion of column name references to link to (beyond a terse sentence in 4.2.1). The alternative would be to adjust most of the examples in section 35.4, using parameter names as the preferred option, and thus make $n the other way. I'm happy to do that, but I figure it'd be a bit presumptuous to present such a patch without some discussion; it's effectively rewriting the project's opinion of how to reference function parameters. With regard to the discussion about aliasing the function name when used as a qualifier (http://archives.postgresql.org/pgsql-hackers/2011-04/msg00871.php), my only suggestion would be that using $0 (as in, '$0.paramname') appears safe -- surely any spec change causing it issues would equally affect the existing $1 etc. '$.paramname' seems to look better, but presumably runs into trouble by looking like an operator. That whole discussion seems above my pay grade, however. 
Original WIP post: http://archives.postgresql.org/pgsql-hackers/2011-03/msg01479.php This is an open TODO: http://wiki.postgresql.org/wiki/Todo#SQL-Language_Functions I've just added the above post to the CF app; I'll update it to point to this one once it appears. Thanks! Matthew -- matt...@trebex.net diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c new file mode 100644 index 5642687..74f3e7d *** a/src/backend/executor/functions.c --- b/src/backend/executor/functions.c *** typedef SQLFunctionCache *SQLFunctionCac *** 115,121 --- 115,123 */ typedef struct SQLFunctionParseInfo { + char *name; /* function's name */ Oid *argtypes; /* resolved types of input arguments */ + char **argnames; /* names of input arguments */ int nargs; /* number of input arguments */ Oid collation; /* function's input collation, if known */ } SQLFunctionParseInfo; *** typedef struct SQLFunctionParseInfo *** 123,128 --- 125,132 /* non-export function prototypes */ static Node *sql_fn_param_ref(ParseState *pstate, ParamRef *pref); + static Node *sql_fn_post_column_ref(ParseState *pstate, ColumnRef *cref, Node *var); + static Node *sql_fn_param_ref_num(ParseState *pstate, SQLFunctionParseInfoPtr pinfo, int paramno, int location); static List *init_execution_state(List *queryTree_list, SQLFunctionCachePtr fcache, bool lazyEvalOK); *** prepare_sql_fn_parse_info(HeapTuple proc *** 162,167 --- 166,172 int nargs; pinfo = (SQLFunctionParseInfoPtr) palloc0(sizeof(SQLFunctionParseInfo)); + pinfo-name = NameStr(procedureStruct-proname); /* Save the function's input collation */ pinfo-collation = inputCollation; *** prepare_sql_fn_parse_info(HeapTuple proc *** 200,205 --- 205,240 pinfo-argtypes = argOidVect; } + if (nargs 0) + { + Datum proargnames; + Datum proargmodes; + int argnum; + int n_arg_names; + bool isNull; + + proargnames = SysCacheGetAttr(PROCNAMEARGSNSP, procedureTuple, + Anum_pg_proc_proargnames, + isNull); + if (isNull) + proargnames = 
PointerGetDatum(NULL); /* just to be sure */ + + proargmodes = SysCacheGetAttr(PROCNAMEARGSNSP, procedureTuple, + Anum_pg_proc_proargmodes, + isNull); + if (isNull) + proargmodes = PointerGetDatum(NULL); /* just to be sure */ + + n_arg_names = get_func_input_arg_names(proargnames, proargmodes, pinfo-argnames); + + if (n_arg_names nargs) + pinfo-argnames = NULL; + } + else + { + pinfo-argnames = NULL; + } + return pinfo; } *** prepare_sql_fn_parse_info(HeapTuple proc *** 209,222 void sql_fn_parser_setup(struct ParseState *pstate, SQLFunctionParseInfoPtr pinfo) { - /* Later we might use these hooks to support parameter names */ pstate-p_pre_columnref_hook = NULL
Re: [HACKERS] Improve lseek scalability v3
On Mon, Sep 19, 2011 at 08:31:00AM -0400, Stephen Frost wrote: * Benjamin LaHaise (b...@kvack.org) wrote: For such tables, can't Postgres track the size of the file internally? I'm assuming it's keeping file descriptors open on the tables it manages, in which case when it writes to a file to extend it, the internally stored size could be updated. Not making a syscall at all would scale far better than even a modified lseek() will perform. We'd have to have it in shared memory and have a lock around it, it wouldn't be cheap at all. Yep, that makes perfect sense. After all, the kernel does basically the same thing to maintain this information; why should we have userspace duplicating the same infrastructure? I must admit, I'd never heard of this usage of lseek to get the current size of a file before; I'd assumed everybody used fstat. Given this legitimate reason for a high-frequency calling of lseek, I withdraw my earlier objection to the patch series. -- Matthew Wilcox Intel Open Source Technology Centre Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step.
Re: [HACKERS] Improve lseek scalability v3
On Fri, Sep 16, 2011 at 04:16:49PM +0200, Andres Freund wrote: I sent an email containing benchmarks from Robert Haas regarding the Subject. Looking at lkml.org I can't see it right now, Will recheck when I am at home. He replaced lseek(SEEK_END) with fstat() and got speedups up to 8.7 times the lseek performance. The workload was 64 clients hammering postgres with a simple readonly workload (pgbench -S). Yay! Data! For reference see the thread in the postgres archives which also links to performance data: http://archives.postgresql.org/message-id/CA+TgmoawRfpan35wzvgHkSJ0+i-W=vkjpknrxk2ktdr+hsa...@mail.gmail.com So both fstat and lseek do more work than postgres wants. lseek modifies the file pointer while fstat copies all kinds of unnecessary information into userspace. I imagine this is the source of the slowdown seen in the 1-client case. There have been various proposals to make the amount of information returned by fstat limited to the 'cheap' (for various definitions of 'cheap') fields. I'd like to dig into the requirement for knowing the file size a little better. According to the blog entry it's used for the query planner. Does the query planner need to know the exact number of bytes in the file, or is it after an order-of-magnitude? Or to-the-nearest-gigabyte? -- Matthew Wilcox Intel Open Source Technology Centre Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step.
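The two file-size idioms being compared above can be sketched directly with the POSIX calls (via Python's `os` module here): `fstat` reads the size without touching the file offset, while `lseek(SEEK_END)` returns the size as a side effect of moving the offset, which is why a caller that cares must save and restore it. This is just an illustration of the two syscalls, not PostgreSQL's code:

```python
import os
import tempfile

def size_via_fstat(fd):
    """File size from the stat structure; the file offset is untouched."""
    return os.fstat(fd).st_size

def size_via_lseek(fd):
    """File size by seeking to the end, the idiom discussed above. It
    clobbers the current offset, so save and restore it around the call."""
    saved = os.lseek(fd, 0, os.SEEK_CUR)
    size = os.lseek(fd, 0, os.SEEK_END)
    os.lseek(fd, saved, os.SEEK_SET)
    return size

with tempfile.TemporaryFile() as f:
    f.write(b"x" * 8192)
    f.flush()
    via_fstat = size_via_fstat(f.fileno())
    via_lseek = size_via_lseek(f.fileno())
```

Both return the same number; the performance argument in the thread is about kernel-side locking and how much data each call has to materialize, not about correctness.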
Re: [HACKERS] fsync reliability
On 2011-04-22 21:55, Greg Smith wrote: On 04/22/2011 09:32 AM, Simon Riggs wrote: OK, that's good, but ISTM we still have a hole during RemoveOldXlogFiles() where we don't fsync or open/close the file, just rename it. This is also something that many applications rely upon working as hoped for here, even though it's not technically part of POSIX. Early versions of ext4 broke that, and it caused a giant outcry of complaints. http://www.h-online.com/open/news/item/Ext4-data-loss-explanations-and-workarounds-740671.html has a good summary. This was broken on ext4 from around 2.6.28 to 2.6.30, but the fix for it was so demanded that it's even been ported by the relatively lazy distributions to their 2.6.28/2.6.29 kernels. As far as I can make out, the current situation is that this fix (the auto_da_alloc mount option) doesn't work as advertised, and the ext4 maintainers are not treating this as a bug. See https://bugzilla.kernel.org/show_bug.cgi?id=15910 -M-
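The hole described above (renaming a file without any fsync around it) is usually closed with the well-known pattern: fsync the file's contents, rename, then fsync the containing directory so the directory entry itself is durable. A sketch of that pattern in Python, assuming a POSIX filesystem where a directory can be opened read-only for fsync; this is not PostgreSQL's actual RemoveOldXlogFiles code, and the segment file names are made up:

```python
import os
import tempfile

def durable_rename(src, dst):
    """Rename src to dst so the result survives a crash: flush the file's
    data, perform the rename, then flush the directory entry. Sketch of
    the fsync-then-rename-then-fsync-dir pattern discussed above."""
    fd = os.open(src, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
    os.rename(src, dst)
    # Without syncing the directory, some filesystems may lose the
    # rename itself on power failure even though the data is on disk.
    dirfd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "000000010000000000000001.tmp")
with open(src, "wb") as f:
    f.write(b"segment data")
dst = os.path.join(tmp, "000000010000000000000001")
durable_rename(src, dst)
renamed = os.path.exists(dst) and not os.path.exists(src)
```

Whether the directory fsync is strictly required is exactly the filesystem-dependent question the ext4 discussion above is about.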
[HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Attached is a WIP patch that allows SQL-language functions to reference their parameters by name. It uses p_post_columnref_hook, so potentially ambiguous references prefer the column... that seems to make the most sense, both because it avoids a backwards incompatibility, and it conforms with SQL's usual notion of assuming you mean the nearest name. It allows the parameter name to be qualified with the function name, for when you really mean you want the parameter. This being my first foray into the PostgreSQL source, I expect the code is horribly wrong in a variety of ways. Other than that, the regression tests I've been using are a slight modification of existing queries; I imagine they should look measurably different. And finally, conspicuously absent are the documentation changes that will obviously need to accompany a real patch. (This builds passes `make check` on current HEAD, a4425e3) Thanks! Matthew -- matt...@trebex.net diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c new file mode 100644 index ce3b77b..be71fbb *** a/src/backend/executor/functions.c --- b/src/backend/executor/functions.c *** typedef SQLFunctionCache *SQLFunctionCac *** 116,122 --- 116,124 */ typedef struct SQLFunctionParseInfo { + char *name; /* function's name */ Oid *argtypes; /* resolved types of input arguments */ + char **argnames; /* names of input arguments */ int nargs; /* number of input arguments */ Oid collation; /* function's input collation, if known */ } SQLFunctionParseInfo; *** typedef struct SQLFunctionParseInfo *** 124,129 --- 126,133 /* non-export function prototypes */ static Node *sql_fn_param_ref(ParseState *pstate, ParamRef *pref); + static Node *sql_fn_post_column_ref(ParseState *pstate, ColumnRef *cref, Node *var); + static Node *sql_fn_param_ref_num(ParseState *pstate, SQLFunctionParseInfoPtr pinfo, int paramno, int location); static List *init_execution_state(List *queryTree_list, SQLFunctionCachePtr fcache, bool lazyEvalOK); *** 
prepare_sql_fn_parse_info(HeapTuple proc *** 163,168 --- 167,173 int nargs; pinfo = (SQLFunctionParseInfoPtr) palloc0(sizeof(SQLFunctionParseInfo)); + pinfo-name = NameStr(procedureStruct-proname); /* Save the function's input collation */ pinfo-collation = inputCollation; *** prepare_sql_fn_parse_info(HeapTuple proc *** 201,206 --- 206,241 pinfo-argtypes = argOidVect; } + if (nargs 0) + { + Datum proargnames; + Datum proargmodes; + int argnum; + int n_arg_names; + bool isNull; + + proargnames = SysCacheGetAttr(PROCNAMEARGSNSP, procedureTuple, + Anum_pg_proc_proargnames, + isNull); + if (isNull) + proargmodes = PointerGetDatum(NULL); /* just to be sure */ + + proargmodes = SysCacheGetAttr(PROCNAMEARGSNSP, procedureTuple, + Anum_pg_proc_proargmodes, + isNull); + if (isNull) + proargmodes = PointerGetDatum(NULL); /* just to be sure */ + + n_arg_names = get_func_input_arg_names(proargnames, proargmodes, pinfo-argnames); + + if (n_arg_names nargs) + pinfo-argnames = NULL; + } + else + { + pinfo-argnames = NULL; + } + return pinfo; } *** prepare_sql_fn_parse_info(HeapTuple proc *** 210,223 void sql_fn_parser_setup(struct ParseState *pstate, SQLFunctionParseInfoPtr pinfo) { - /* Later we might use these hooks to support parameter names */ pstate-p_pre_columnref_hook = NULL; ! pstate-p_post_columnref_hook = NULL; pstate-p_paramref_hook = sql_fn_param_ref; /* no need to use p_coerce_param_hook */ pstate-p_ref_hook_state = (void *) pinfo; } /* * sql_fn_param_ref parser callback for ParamRefs ($n symbols) */ --- 245,354 void sql_fn_parser_setup(struct ParseState *pstate, SQLFunctionParseInfoPtr pinfo) { pstate-p_pre_columnref_hook = NULL; ! 
pstate-p_post_columnref_hook = sql_fn_post_column_ref; pstate-p_paramref_hook = sql_fn_param_ref; /* no need to use p_coerce_param_hook */ pstate-p_ref_hook_state = (void *) pinfo; } + static Node * + sql_fn_resolve_name(ParseState *pstate, SQLFunctionParseInfoPtr pinfo, const char *paramname, int location) + { + int i; + for (i = 0; i pinfo-nargs; i++) + if (pinfo-argnames[i] strcmp(pinfo-argnames[i], paramname) == 0) + return sql_fn_param_ref_num(pstate, pinfo, i + 1, location); + + return NULL; + } + + /* + * sql_fn_post_column_ref parser callback for ColumnRefs + */ + static Node * + sql_fn_post_column_ref(ParseState *pstate, ColumnRef *cref, Node *var) + { + SQLFunctionParseInfoPtr pinfo = (SQLFunctionParseInfoPtr) pstate-p_ref_hook_state; + int names; + Node *field1; + Node *subfield = NULL; + const char *pname; + Node *param; + + if (var != NULL) + return NULL; /* there's a table column, prefer
Re: [HACKERS] compile/install of git
On Sat, Sep 18, 2010 at 02:20:53PM -0400 I heard the voice of David Blewett, and lo! it spake thus: Sorry for top-posting... I was under the impression that git over http was just as efficient since 1.6.6 [1]. That's about talking over HTTP to a git server running as CGI; it doesn't help if you're talking HTTP to just a plain HTTP host. -- Matthew Fuller (MF4839) | fulle...@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream.
Re: [HACKERS] git: uh-oh
On Tue, Aug 17, 2010 at 01:57:02PM -0600 I heard the voice of Alex Hunsaker, and lo! it spake thus: On Tue, Aug 17, 2010 at 13:52, Alex Hunsaker bada...@gmail.com wrote: How sure are we that its not the cvs2svn step that is screwing it up? urp, I jumped to a conclusion while skimming the cvs2git.options file Magnus posted. With all the references to svn and things like GitRevisionRecorder('cvs2svn-tmp/git-blob.dat'). It sure sounded like it converts to svn first and then to git... im not sure what it does. It's not that it converts to svn, but that it's built on (/part of) cvs2svn, so presumably a lot of the figure out changesets and branch membership logic and the get things in the shape svn wants logic are intertwined. -- Matthew Fuller (MF4839) | fulle...@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream.
Re: [HACKERS] [JDBC] Trouble with COPY IN
(Yes, I know I'm not on the hackers list. Most interested parties should get this directly anyway.) Additionally the interface exposed by the JDBC driver lets the user write arbitrary CopyData bytes to the server, so without parsing all of that we don't know whether they've issued CopyData(EOF) or not. Okay, so you can't know with absolute certainty without parsing the data, but the usual case would be handled by holding onto the last-N bytes or so. Enough to fit the EOF and perhaps a little more for paranoia's sake. That's not to say that I'm missing the problem. When (not if, when) the user feeds data past a CopyData(EOF), it's going to get interesting. This is the reason why the patch to the JDBC driver that I sent in is very fragile. In the case where a user provides a binary copy with lots of data after the EOF, the processCopyData method *will* get called after the CommandComplete and ReadyForQuery messages have been received, even if we try to delay processing of the ReadyForQuery message. [Thinking about the logic necessary to handle such a case and avoid network buffer deadlock...] I would think the least invasive way to handle it would be to set the CommandComplete and ReadyForQuery messages aside when they are received if CopyDone hasn't been sent, continue the COPY operation as usual until it is shutdown, send CopyDone and, finally, reinstate CommandComplete and RFQ as if they were just received.. Basically, yes. We need to introduce a little more state into the JDBC driver. Currently, the driver is in one of two states: 1. In the middle of a copy. 2. Not in a copy. These states are recorded in the lock system. We need to introduce a new state, where the copy is still locked, but we know that the CommandComplete and ReadyForQuery messages have been received. 
We can no longer unlock the copy in processCopyData - we need to do that in endCopy instead, after calling processCopyData to ensure that we wait for a valid CommandComplete and ReadyForQuery message first. Matthew -- Terrorists evolve but security is intelligently designed? -- Jake von Slatt -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Alias to rollback keyword
Hey,

After dealing with a production fault and having to roll back all the time, I kept typing a different word instead of rollback. So I created a patch to accept this word as an alias for rollback. Obviously it's not part of the SQL standard, but it could be a nice extension for PostgreSQL. See the patch for more details.

Cheers
Matt

diff -U5 -r postgresql-8.4.2.orig/src/include/parser/kwlist.h postgresql-8.4.2/src/include/parser/kwlist.h
--- postgresql-8.4.2.orig/src/include/parser/kwlist.h	2009-04-06 18:12:53.0 +0930
+++ postgresql-8.4.2/src/include/parser/kwlist.h	2010-04-01 10:55:46.0 +1030
@@ -55,10 +55,11 @@
 PG_KEYWORD(begin, BEGIN_P, UNRESERVED_KEYWORD)
 PG_KEYWORD(between, BETWEEN, TYPE_FUNC_NAME_KEYWORD)
 PG_KEYWORD(bigint, BIGINT, COL_NAME_KEYWORD)
 PG_KEYWORD(binary, BINARY, TYPE_FUNC_NAME_KEYWORD)
 PG_KEYWORD(bit, BIT, COL_NAME_KEYWORD)
+PG_KEYWORD(bollocks, ROLLBACK, UNRESERVED_KEYWORD)
 PG_KEYWORD(boolean, BOOLEAN_P, COL_NAME_KEYWORD)
 PG_KEYWORD(both, BOTH, RESERVED_KEYWORD)
 PG_KEYWORD(by, BY, UNRESERVED_KEYWORD)
 PG_KEYWORD(cache, CACHE, UNRESERVED_KEYWORD)
 PG_KEYWORD(called, CALLED, UNRESERVED_KEYWORD)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.5 release timetable, again
Alvaro Herrera wrote: This seems a good idea. Possibly pushing the betas more aggressively to current users would make them tested not only by PG hackers ... Isn't this the purpose of the new alpha releases, at least to some extent? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Visibility map, partial vacuums
Tom Lane wrote: However, my comment above was too optimistic, because in an insert-only scenario autovac would in fact not trigger VACUUM at all, only ANALYZE. So it seems like we do indeed want to rejigger autovac's rules a bit to account for the possibility of wanting to apply vacuum to get visibility bits set. I'm sure I'm missing something, but I thought the point of this was to lessen the impact of VACUUM and now you are suggesting that we have to add vacuums to tables that have never needed one before. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Aidan Van Dyk wrote: * Greg Stark [EMAIL PROTECTED] [081117 03:54]: I thought of saying that too but it doesn't really solve the problem. Think of what happens if someone sets a hint bit on a dirty page. If the page is dirty from a real change, then it has a WAL backup block record already, so the torn page on disk is going to be fixed with the WAL replay ... *because* of the torn-page problem already being solved in PG. You don't get the hint bits back, but that's no different from the current state. But nobody's previously cared if hint bits weren't set on WAL replay. What if all changes to a page (even hint bits) are WAL-logged when running with block-level CRC checks enabled? Does that make things easier? I'm sure it would result in some performance loss, but anyone enabling block-level CRCs is already trading some performance for safety. Thoughts? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
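To make the hint-bit hazard concrete, here is a toy page-checksum sketch. It is not PostgreSQL's actual algorithm or page layout: zlib.crc32 stands in for whatever CRC the proposal would use, and the header offset is invented. The point it demonstrates is the one in the message: any change to a page that is not re-stamped, including a hint bit set without a matching WAL record, makes verification fail.

```python
import zlib

PAGE_SIZE = 8192
CRC_OFF, CRC_LEN = 8, 4  # hypothetical checksum location in the page header

def compute_page_crc(page: bytes) -> int:
    # CRC covers the whole page except the checksum field itself,
    # which is treated as zeroes while computing
    assert len(page) == PAGE_SIZE
    return zlib.crc32(page[:CRC_OFF] + b"\x00" * CRC_LEN + page[CRC_OFF + CRC_LEN:])

def stamp_page(page: bytearray) -> None:
    # called just before the page is written out
    crc = compute_page_crc(bytes(page))
    page[CRC_OFF:CRC_OFF + CRC_LEN] = crc.to_bytes(CRC_LEN, "little")

def verify_page(page: bytes) -> bool:
    # called when the page is read back in
    stored = int.from_bytes(page[CRC_OFF:CRC_OFF + CRC_LEN], "little")
    return stored == compute_page_crc(page)
```

A torn write has the same effect as the flipped bit in the test below: part of the page is old, part is new, and the stored CRC matches neither, which is why WAL-logging every change (or full-page images) becomes necessary once checksums are enabled.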
Re: [HACKERS] [WIP] In-place upgrade
Tom Lane wrote: Decibel! [EMAIL PROTECTED] writes: I think that's pretty seriously un-desirable. It's not at all uncommon for databases to stick around for a very long time and then jump ahead many versions. I don't think we want to tell people they can't do that. Of course they can do that --- they just have to do it one version at a time. Also, people may be less likely to stick with an old outdated version for years and years if the upgrade process is easier. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RAM-only temporary tables
Kevin Grittner wrote: An idea for a possible enhancement to PostgreSQL: allow creation of a temporary table without generating any disk I/O. (Creating and dropping a three-column temporary table within a database transaction currently generates about 150 disk writes). If some circumstances don't make it feasible to always do this as a RAM-only operation, perhaps a clause could be added to the CREATE TEMPORARY TABLE syntax to specify this behavior along with whatever limitations on the temporary table are required for this to work. (For example, maybe this is only feasible if the table will be dropped by the end of the transaction?) As someone else already pointed out you can put temp tables on a RAM disk, but the larger issue is that temp tables still cause system table churn which will always need to be on stable media. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.3 .4 + Vista + MingW + initdb = ACCESS_DENIED
Tom Lane wrote: Andrew Chernow [EMAIL PROTECTED] writes: Be careful. From LockFileEx docs: However, the time it takes for the operating system to unlock these locks depends upon available system resources. Therefore, it is recommended that your process explicitly unlock all files it has locked when it terminates. If this is not done, access to these files may be denied if the operating system has not yet unlocked them. ROTFL ... so to translate: If your program crashes, please release locks before crashing. Obviously that wasn't the intent of the above, but I guess it is the net effect. Either way, I don't think it's a huge problem; it just means that PG may not be able to restart for a few seconds until the OS has time to clean up the locks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE DATABASE vs delayed table unlink
The error on createdb happened again this morning. However, this time an abandoned directory was not created. The full error message was: $ createdb -E SQL_ASCII -U flyminebuild -h brian.flymine.org -T production-flyminebuild production-flyminebuild:uniprot createdb: database creation failed: ERROR: could not stat file base/33049747/33269704: No such file or directory However, my colleagues promptly dropped the database that was being copied and restarted the build process, so I can't diagnose anything. Suffice to say that there is no abandoned directory, and the directory 33049747 no longer exists either. I'll try again to get some details next time it happens. Matthew -- $ rm core Segmentation Fault (core dumped) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE DATABASE vs delayed table unlink
On Thu, 9 Oct 2008, Tom Lane wrote: So I'm mystified how Matthew could have seen the expected error and yet had the destination tree (or at least large chunks of it) left behind. Remember I was running 8.3.0, and you mentioned a few changes after that version which would have made sure the destination tree was cleaned up properly. [ thinks for a bit... ] We know there were multiple occurrences. Matthew, is it possible that you had other createdb failures that did *not* report file does not exist? For instance, a createdb interrupted by a fast database shutdown might have left things this way. Well, we didn't have any fast database shutdowns or power failures. I don't think so. Matthew -- Heat is work, and work's a curse. All the heat in the universe, it's going to cool down, because it can't increase, then there'll be no more work, and there'll be perfect peace. -- Michael Flanders -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE DATABASE vs delayed table unlink
Heikki Linnakangas [EMAIL PROTECTED] writes: Another thought is to ignore ENOENT in copydir. On Wed, 8 Oct 2008, Tom Lane wrote: Yeah, I thought about that too, but it seems extremely dangerous ... I agree. If a file randomly goes missing, that's not an error to ignore, even if you think the only way that could happen is safe. I could be wrong - but couldn't other bad things happen too? If you're copying the files before the checkpoint has completed, couldn't the new database end up with some of the recent changes going missing? Or is that prevented by FlushDatabaseBuffers? Matthew -- Isn't Microsoft Works something of a contradiction? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] September CommitFest Closed
Josh Berkus wrote: For the September commitfest, 29 patches were applied (one to pgFoundry) and 18 patches were sent back for more work. More importantly, six *new* reviewers completed reviews of various patches: Abbas Butt, Alex Hunsaker, Markus Wanner, Ibrar Ahmed, Ryan Bradetich and Gianni Colli. Several other new reviewers volunteered, but we ran out of patches to check, and Kenneth Marshall had to quit reviewing because of Hurricane Ike. Yay, reviewers. If nothing else ever came of the Commit Fest approach, if it creates more reviewers, then I think the Commit Fest process would be a success. I think the Commit Fest approach does this by lowering the barrier to entry for becoming a reviewer: it brings more focus to patch review on a regular basis, and it changes patch review from a last-minute dash that only experienced hackers would get involved with into a process with more time before the final deadline, which is friendlier for new reviewers. Good news all around! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] VACUUM Improvements - WIP Patch
Joshua D. Drake wrote: Merlin Moncure wrote: Well, there doesn't seem to be a TODO for partial/restartable vacuums, which were mentioned upthread. This is a really desirable feature for big databases and removes one of the reasons to partition large tables. I would agree that partial vacuums would be very useful. I think everyone agrees that partial vacuums would be useful / *A Good Thing* but it's the implementation that is the issue. I was thinking about Alvaro's recent work to make vacuum deal with TOAST tables separately, which is almost like a partial vacuum since it effectively splits the vacuum work up into multiple independent blocks of work, the limitation obviously being that it can only split the work around TOAST. Is there any way that vacuum could work per relfile, since we already split tables into files that are never greater than 1G? I would think that if vacuum never had more than 1G of work to do at any given moment it would make it much more manageable. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] VACUUM Improvements - WIP Patch
Tom Lane wrote: Matthew T. O'Connor [EMAIL PROTECTED] writes: I think everyone agrees that partial vacuums would be useful / *A Good Thing* but it's the implementation that is the issue. I'm not sure how important it will really be once we have support for dead-space-map-driven vacuum. Is that something we can expect any time soon? I haven't heard much about it really happening for 8.4. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Do we really want to migrate plproxy and citext into PG core distribution?
Tom Lane wrote: Greg Sabino Mullane [EMAIL PROTECTED] writes: Code outside of core, is, in reality, less reviewed, less likely to work well with recent PG versions, and more likely to cause problems. It's also less likely to be found by people, less likely to be used by people, and less likely to be included by distros. Not to say that everything should get shoved into core, of course, but there are strong arguments for both sides. These are all true statements, of course, but ISTM they should be looked on as problems to be solved. Pushing stuff into core instead of solving these problems is not a scalable long-term answer. A few random thoughts... The application that comes to mind first for me when you talk plugins is Firefox. They make it very easy to browse for plugins and to install, update, remove them. Their plug-in system also tries to account for Firefox version and OS platform which we would need to do also. Perhaps one thing that would help PostgreSQL plug-ins is a nice GUI plug-in browser and management application. The logical place to add this IMHO is PGAdmin since it is GUI, already talks to the DB and is cross platform. I'm not saying a GUI should be required to manage plug-ins, a fully CLI option should be made available too. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Concurrent VACUUM and ANALYZE
Jonah H. Harris wrote: On Mon, Jul 21, 2008 at 10:19 PM, Tom Lane [EMAIL PROTECTED] wrote: I don't find this a compelling argument, at least not without proof that the various vacuum-improvement projects already on the radar screen (DSM-driven vacuum, etc) aren't going to fix your problem. Is DSM going to be in 8.4? The last I had heard, DSM+related improvements weren't close to being guaranteed for this release. If it doesn't make it, waiting another year and a half for something easily fixed would be fairly unacceptable. Should I provide a patch in the event that DSM doesn't make it? Can't hurt to submit a patch. Also, could you do something to help mitigate the worst case, something like not updating the stats in pg_class if a vacuum has finished since the current analyze started? Matt -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuuming leaked temp tables (once again)
Tom Lane wrote: We might have to rearrange the logic a bit to make that happen (I'm not sure what order things get tested in), but a log message does seem like a good idea. I'd go for logging anytime an orphaned table is seen, and dropping once it's past the anti-wraparound horizon. Is there an easy way for an admin to clean up the lost temp tables that autovacuum is complaining about? It seems like it could be a long time and a lot of log messages between when they are first orphaned and finally dropped due to anti-wraparound protection. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XIDs and big boxes again ...
Hans-Juergen Schoenig wrote: i suggest to introduce a --with-long-xids flag which would give me 62 / 64 bit XIDs per vacuum on the entire database. this should be fairly easy to implement. i am not too concerned about the size of the tuple header here - if we waste 500 gb of storage here i am totally fine. As you say later in the thread, you are on 8.1. A lot of work has gone into reducing the effect, impact and frequency of XID wraparound and vacuuming since then. In 8.3, transactions that don't actually update a table no longer use a real XID, and with autovacuum you no longer need a database-wide vacuum to solve the XID wraparound problem, so I think the answer is to upgrade to 8.3 and see if you still have this problem. Matthew O'Connor -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
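For background on why wraparound is an issue with 32-bit XIDs at all, the circular comparison can be mimicked in a few lines, in the spirit of PostgreSQL's TransactionIdPrecedes (which compares the signed 32-bit difference). The sketch shows why any two normal XIDs must stay within 2^31 of each other, which is what anti-wraparound vacuuming guarantees.

```python
def xid_precedes(a: int, b: int) -> bool:
    """Circular 32-bit XID comparison: a precedes b iff the signed
    32-bit value of (a - b) is negative. This only gives sane answers
    while the two XIDs are less than 2**31 apart, hence the need for
    anti-wraparound (freeze) vacuums."""
    diff = (a - b) & 0xFFFFFFFF
    return diff >= 2**31  # top bit set => signed difference is negative
```

Widening XIDs to 64 bits, as the original poster suggests, would push that horizon out so far that freezing effectively never triggers, at the cost of a larger tuple header.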
Re: [HACKERS] Posting to hackers and patches lists
Alex Hunsaker wrote: In fact I would argue -patches should go away so we don't have that split. +1. I think the main argument for the split is to keep the large patch emails off the hackers list, but I don't think that limit is so high that it's a problem. People have to gzip their patches to the patches list fairly often anyway. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posting to hackers and patches lists
Alex Hunsaker wrote: A big part of my problem with the split is that if there is a discussion taking place on -hackers I want to be able to reply to the discussion and say "well, here is what I was thinking." Sending it to -patches first, waiting for it to hit the archive so I can link to it in my reply on -hackers, seems pointless and convoluted. But if that's what you want, that's what I'll try to do from now on :) For instance, on the patch of mine Tom reviewed yesterday only -hackers was copied, so I maintained that but also added -patches because I was sending in a patch... I think it will be an ongoing problem though, especially for new people, as they probably won't understand the logical split... Patches are an integral part of the conversation about development; I think trying to split them up is awkward at best. Do people really still think that the potential for larger messages is really a problem? By the way, what is the actual size limit on hackers vs. patches? I would imagine that most patches would already fit in the current hackers limit, especially since you can gzip. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] US VISA CISP PCI comp. needs SHA1
Not sure if I posted in the correct spot.

pg 8.2.6, CentOS 5
Windows-based app.
encrypted pwd = yes
SSL = yes, hostssl with explicit IP w/md5 (no pgcrypto)

We are in the process of VISA CISP PCI compliance for our application. (online cc auth - no stored cc data) [next phase will include stored cc data] We just heard back today that they would like to use SHA1 for pwd auth. Does anyone have any doco that will support md5 vs. SHA1? We also have global customers, so we understand the US vs. non-US export stuff. Any direction is appreciated. Thanks in advance. /matthew wetmore -- Matthew Wetmore, Secom International, Inc, 9610 Bellanca Ave., Los Angeles, CA 90045, 310-641-1290. This e-mail is intended for the addressee shown. It contains information that is confidential and protected from disclosure. Any review, dissemination or use of this transmission or its contents by persons or unauthorized employees of the intended organisations is strictly prohibited. The contents of this email do not necessarily represent the views or policies of Secom International Inc., or its employees. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Visa CISP PCI compliance needs SHA1?
Not sure if I posted in the correct spot, but it seems to be the topic of the day... funny that I hear from Visa on the same day.

pg 8.2.6, CentOS 5
Windows-based app.
encrypted pwd = yes
SSL = yes, hostssl with explicit IP w/md5 (no pgcrypto)

This is just for client/server pwd auth. We are in the process of VISA CISP PCI compliance for our application. (online cc auth - no stored cc data) [next phase will include stored cc data] We just heard back today that they would like to use SHA1, NOT md5, for pwd auth. Does anyone have any doco that will support md5 vs. SHA1? Is pgcrypto in the db (meaning encrypting the md5 hash) still the same as md5 auth? We also have global customers, so we understand the US vs. non-US export stuff. Any direction is appreciated. Thanks in advance. /matthew wetmore -- Matthew Wetmore, Secom International, Inc, 9610 Bellanca Ave., Los Angeles, CA 90045, 310-641-1290. This e-mail is intended for the addressee shown. It contains information that is confidential and protected from disclosure. Any review, dissemination or use of this transmission or its contents by persons or unauthorized employees of the intended organisations is strictly prohibited. The contents of this email do not necessarily represent the views or policies of Secom International Inc., or its employees. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] offtopic, historic, in 8.1, how to find relfrozenxid by table?
In 8.1, how do I find the relfrozenxid of a table? We're running 8.1.9, and have turned off autovacuum for a wee bit too long... thanks to the verbose warnings, we still have a few million transactions left before it locks up, although that's not much time for us. I'd like to focus vacuum on the tables that need it, but 8.1 does not have a relfrozenxid column in pg_class, although it does have datfrozenxid in pg_database. Sorry for skipping the general list, but I'm not subscribed to that, and unsure of the latency there. Thanks, Matt -- Matt Grosso [EMAIL PROTECTED] cell (201)780-9592 http://www.connexuscorp.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
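Since 8.1 only exposes the per-database horizon (pg_database.datfrozenxid), the best one can do cluster-side is a back-of-the-envelope headroom estimate. This hypothetical helper (not a catalog query) shows the arithmetic: age is the circular distance from the frozen XID to the current XID, and comparisons stop being safe near the 2^31 horizon.

```python
XID_HORIZON = 2**31 - 1  # comparison horizon for 32-bit XIDs

def wraparound_headroom(current_xid: int, datfrozenxid: int) -> int:
    """Rough number of transactions left before the ~2-billion XID
    horizon, given a database's datfrozenxid. XID arithmetic is
    circular, so mask the difference to 32 bits."""
    age = (current_xid - datfrozenxid) & 0xFFFFFFFF
    return XID_HORIZON - age
```

In practice the server starts emitting the verbose warnings the message mentions well before the headroom reaches zero, so the real deadline is earlier than this estimate.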
[HACKERS] Built PostgreSQL 8.3 BETA on FreeBSD/SPARC64
Machine: Sun Microsystems Ultra 5 (SPARC IIi). OS: FreeBSD 6.2/SPARC64 Compiler: gcc v.3.4.6 PostgreSQL 8.3 Beta builds on my system with no trouble. Matthew Alton
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: This is an interesting idea, but I think it's attacking the wrong problem. To me, the problem here is that an ANALYZE should not block CREATE INDEX or certain forms of ALTER TABLE. I doubt that that will work; in particular I'm pretty dubious that you can safely make CREATE INDEX and VACUUM run together. Since they'd be unlikely to be using the identical OldestXmin horizon, you'd likely end up with dangling index entries (ie, CREATE INDEX indexes a tuple that the VACUUM removes shortly afterward). I think the main issue is ANALYZE not VACUUM (at least in this thread) since it's DB load times that are in question. Can CREATE INDEX and ANALYZE be made to run concurrently? ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Tom Lane wrote: If you insist on crafting a solution that only fixes this problem for pg_restore's narrow usage, you'll be back revisiting it before beta1 has been out a month. I don't know much about what is involved in crafting these solutions, but it seems we're close to beta and probably don't want to make drastic changes to anything. As such it seems to me that solving the problem for analyze is a nice piece of low-hanging fruit that solves an immediate problem that has been reported. I would think that reducing the locking implications of VACUUM is much more involved, no? Also, I would think that the fix for ANALYZE will be totally different than the fix for VACUUM no? Are you proposing that we solve the VACUUM locking problem before we release 8.3? ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] More logging for autovacuum
Gregory Stark wrote: I'm having trouble following what's going on with autovacuum and I'm finding the existing logging insufficient. In particular that it's only logging vacuum runs *after* the vacuum finishes makes it hard to see what vacuums are running at any given time. Also, I want to see what is making autovacuum decide to forgo vacuuming a table and the log with that information is at DEBUG2. So I would like to suggest adding two options: log_autovacuum_jobs - output every time a vacuum or analyze *starts* log_autovacuum_level - set the log level for the autovacuum process I would also suggest raising the level of the DEBUG2 message indicating why tables were chosen or not. At least to DEBUG1 if not to INFO. Am I missing anything? Are there ways to get this info already that I'm missing? I imagine it would be pretty simple to add these and I'll be happy to do it and send the patch to -patches assuming others (Alvaro? :) agree. I think this sounds good. There was talk a while ago about needing a special log-level setting just for autovacuum, but nobody did the legwork. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Still recommending daily vacuum...
Alvaro Herrera wrote: Matthew T. O'Connor wrote: Well, if a table has 10 rows, and we keep the current threshold of 1000 rows, then this table must have 1002 dead tuples (99% dead tuples, 1002 dead + 10 live) before being vacuumed. This seems wasteful because there are 500 dead tuples on it and only 10 live tuples. So each scan must wade through all the dead tuples. Another small table with 100 tuples will be vacuumed on every iteration as well, even if there are just two dead tuples. So you are right -- maybe dropping it all the way to 0 is too much. But a small value of 10 is reasonable? That will make the 10 tuple table be vacuumed when there are 10 dead tuples (50% of dead tuples), and the 100 tuple table when there are 11 (11% of dead tuples). It decreases quickly to the scale factor (2%, or do we want to decrease it to 1%?) I think it's probably fine. I think that the optimal number for the base_threshold is probably dependent on the width of the row: for a very narrow row, where you might have many on the same page, 20 or 50 might be right, but for a very wide table a smaller number might be optimal; however, I think it probably doesn't matter much anyway. Reducing the default to 10 seems fine, and perhaps even removing it as a tuning knob. I think there are too many autovacuum knobs and it confuses people. Is it too late to possibly remove this GUC altogether? ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
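For reference, the numbers being traded back and forth combine as dead-tuple count = base_threshold + scale_factor * reltuples. A quick sketch reproduces the arithmetic in the message, using base 10 with the 1% scale factor mentioned there, versus the old base of 1000 with a 20% scale factor:

```python
def vacuum_threshold(reltuples: int, base: int = 10, scale: float = 0.01) -> int:
    """Dead tuples needed before autovacuum triggers a VACUUM:
    base_threshold + scale_factor * reltuples. Defaults follow the
    example in the message (base 10, 1% scale factor)."""
    return round(base + scale * reltuples)
```

With these values a 10-row table is vacuumed at 10 dead tuples (about 50% dead) and a 100-row table at 11 (about 11% dead), while the old base of 1000 forces a 10-row table to accumulate 1002 dead tuples (about 99% dead) first, which is the waste the message complains about.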
Re: [HACKERS] Still recommending daily vacuum...
Alvaro Herrera wrote: Jim C. Nasby wrote: FWIW, I normally go with the 8.2 defaults, though I could see dropping vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds could be decreased further, maybe divide by 10. How about pushing thresholds all the way down to 0? As long as it handles small (or zero row) tables ok then yes. The base_threshold in the original contrib autovacuum was just an easy way to not vacuum really small tables too often. If a table has only 10 rows, it's going to get vacuumed every time one row is updated. I guess that's not a big problem with a table that small, but it still seems excessive. If you think this isn't a problem with the current autovacuum, then sure, turn it down to zero, and perhaps we can even get rid of it altogether in another release or two. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately
Michael Paesold wrote: Matthew T. O'Connor schrieb: Do we need a configurable autovacuum naptime at all? I know I put it in the original contrib autovacuum because I had no idea what knobs might be needed. I can't see a good reason to ever have a naptime longer than the default 60 seconds, but I suppose one might want a smaller naptime for a very active system? A PostgreSQL database on my laptop for testing. It should use as few resources as possible while being idle. That would be a scenario for naptime greater than 60 seconds, wouldn't it? Perhaps, but that isn't the use case PostgreSQL is being designed for. If that is what you really need, then you should probably disable autovacuum. Also, a very long naptime means that autovacuum will still wake up at random times to do the work. At least with a short naptime, it will do the work shortly after you update your tables. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] COPYable logs status
Andrew Dunstan wrote: The situation with this patch is that I now have it in a state where I think it could be applied, but there is one blocker, namely that we do not have a way of preventing the interleaving of log messages from different backends, which leads to garbled logs. This is an existing issue about which we have had complaints, but it becomes critical for a facility the whole purpose of which is to provide logs in a format guaranteed to work with our COPY command. Unfortunately, there is no solution in sight for this problem, certainly not one which I think can be devised and implemented simply at this stage of the cycle. The solution we'd like to use, LWLocks, is not workable in this context. In consequence, I don't think we have any option but to shelve this item for the time being. I think this will get shot down, but here goes anyway... How about creating a log-writing process? The postmaster could write to the log files directly until the log-writer is up and running, and then all processes could send their log output through the log-writer. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] COPYable logs status
Tom Lane wrote: Matthew T. O'Connor [EMAIL PROTECTED] writes: How about creating a log-writing-process? Postmaster could write to the log files directly until the log-writer is up and running, then all processes can send their log output through the log-writer. We *have* a log-writing process. The problem is in getting the data to it. By that I assume you mean the bgwriter, I thought that was for WAL data, I didn't think it could or perhaps should be used for normal log file writing, but I also know I'm way outside my comfort area in talking about this, so excuse the noise if this is way off base. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately
Alvaro Herrera wrote: Jim C. Nasby escribió: There *is* reason to allow setting the naptime smaller, though (or at least there was; perhaps Alvaro's recent changes negate this need): clusters that have a large number of databases. I've worked with folks who are in a hosted environment and give each customer their own database; it's not hard to get a couple hundred databases that way. Setting the naptime higher than a second in such an environment would mean it could be hours before a database is checked for vacuuming. Yes, the code in HEAD is different -- each database will be considered separately. So the huge database taking all day to vacuum will not stop the tiny databases from being vacuumed in a timely manner. And the very huge table in that database will not stop the other tables in the database from being vacuumed either. There can be more than one worker in a single database. Ok, but I think the question posed is that in, say, a virtual hosting environment there might be 1,000 databases in the cluster. Am I still going to have to wait a long time for my database to get vacuumed? I don't think this has changed much, no? (If the default naptime is 1 minute, then autovacuum won't look at a given database but once every 1,000 minutes (16.67 hours), assuming that there isn't enough work to keep all the workers busy.) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
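The 16.67-hour figure in the parenthetical is just naptime times the number of databases. A one-line helper makes the arithmetic explicit; this is a worst-case estimate that ignores multiple workers and any adaptive scheduling in HEAD:

```python
def worst_case_revisit_hours(naptime_minutes: float, n_databases: int) -> float:
    """If the launcher visits one database per naptime interval, a
    given database waits roughly naptime * n_databases between visits
    (worst case: one worker, round-robin over all databases)."""
    return naptime_minutes * n_databases / 60

# 1-minute naptime, 1,000 databases => roughly 16.7 hours between visits
```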
Re: [HACKERS] [PATCHES] [BUGS] BUG #3326: Invalid lower bound of autovacuum_cost_limit
Alvaro Herrera wrote: Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: But this is misleading (started postmaster with good value, then edited postgresql.conf and entered -2): 17903 LOG: received SIGHUP, reloading configuration files 17903 LOG: -2 is outside the valid range for parameter autovacuum_vacuum_cost_limit (-1 .. 1000) Note how it still says the range is -1 .. 1000. Can we redefine things to make zero be the disabled value, thus keeping the range of valid values contiguous? That would be another solution ... though it would be different from the valid value for autovacuum_vacuum_cost_delay (on which 0 is a valid value). Also it would be a different value from previous versions. I don't think either of these is a showstopper, so let's go for that if nobody objects. Can you make 0 and -1 both valid disabled values? That way it will be compatible with previous releases.
Re: [HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately
Tom Lane wrote: Andrew Hammond [EMAIL PROTECTED] writes: Hmmm... it seems to me that points new users towards not using autovacuum, which doesn't seem like the best idea. I think it'd be better to say that setting the naptime really high is a Bad Idea. It seems like we should have an upper limit on the GUC variable that's less than INT_MAX ;-). Would an hour be sane? 10 minutes? This is independent of the problem at hand, though, which is that we probably want the launcher to notice postmaster death in less time than autovacuum_naptime, for reasonable values of same. Do we need a configurable autovacuum naptime at all? I know I put it in the original contrib autovacuum because I had no idea what knobs might be needed. I can't see a good reason to ever have a naptime longer than the default 60 seconds, but I suppose one might want a smaller naptime for a very active system?
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Florian G. Pflug wrote: Work done so far: - .) Don't start autovacuum and bgwriter. Do table stats used by the planner get replicated on a PITR slave? I assume so, but if not, you would need autovac to do analyzes.
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Alvaro Herrera wrote: Simon Riggs wrote: On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote: Florian G. Pflug wrote: Work done so far: - .) Don't start autovacuum and bgwriter. Do table stats used by the planner get replicated on a PITR slave? I assume so, but if not, you would need autovac to do analyzes. The replication is an exact block-level replication of the master. We can't write very much at all on the slave. Hmm, something to keep in mind is forcing cache invals when the master causes them (for example relation cache, catalog caches and plan caches). Perhaps if you are a PITR master and you have active readonly slaves then there should be a WAL record to note plan invalidations, etc?
Re: [HACKERS] Autovacuum versus rolled-back transactions
Tom Lane wrote: ITAGAKI Takahiro [EMAIL PROTECTED] writes: Our documentation says | analyze threshold = analyze base threshold | + analyze scale factor * number of tuples | is compared to the total number of tuples inserted, updated, or deleted | since the last ANALYZE. but deleted tuples are not considered in the total number, because the delta of {n_live_tuples + n_dead_tuples} is not changed by DELETE. We add the number of DELETEs into n_live_tuples and subtract it from n_dead_tuples. Yeah, I was concerned about that when I was making the patch, but didn't see any simple fix. A large number of DELETEs (without any inserts or updates) would trigger a VACUUM but not an ANALYZE, which in the worst case would be bad because the stats could have shifted. We could fix this at the cost of carrying another per-table counter in the stats info, but I'm not sure it's worth it. I believe that whenever autovacuum performs a VACUUM it actually performs a VACUUM ANALYZE; at least the old contrib version did, and I think Alvaro copied that.
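The documented ANALYZE test being discussed can be sketched directly. A minimal sketch of the equation from the docs quoted above (the default values here are illustrative, not necessarily the exact defaults of the release under discussion); the bug is that, in the stats as then collected, DELETEs did not move the counters this test sums:

```python
def needs_analyze(reltuples: int, n_ins: int, n_upd: int, n_del: int,
                  base_threshold: int = 50, scale_factor: float = 0.1) -> bool:
    """analyze threshold = base threshold + scale factor * number of tuples,
    compared to tuples inserted + updated + deleted since the last ANALYZE."""
    threshold = base_threshold + scale_factor * reltuples
    return (n_ins + n_upd + n_del) > threshold

# 200 deletes against a 1,000-tuple table should trigger an ANALYZE
# (200 > 50 + 0.1 * 1000 = 150) -- but only if deletes are counted.
print(needs_analyze(1000, 0, 0, 200))  # True
print(needs_analyze(1000, 0, 100, 0))  # False
```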
Re: [HACKERS] Autovacuum versus rolled-back transactions
Tom Lane wrote: This means that a table could easily be full of dead tuples from failed transactions, and yet autovacuum won't do a thing because it doesn't know there are any. Perhaps this explains some of the reports we've heard of tables bloating despite having autovac on. I think this is only a problem for failed inserts, as failed updates will be accounted for correctly by autovac, and as you said, failed deletes really do nothing. So is there a way for rollback to just add the number of rolled-back inserts to the n_tup_del counter? Then we would be ok, no? I think it's fairly obvious how n_live_tup and n_dead_tup ought to change in response to a failed xact, but maybe not so obvious for the other counters. I suggest that the scan/fetch counters (seq_scan, seq_tup_read, idx_scan, idx_tup_fetch) as well as all the block I/O counters should increment the same for committed and failed xacts, since they are meant to count work done regardless of whether the work was in vain. I am much less sure how we want n_tup_ins, n_tup_upd, n_tup_del to act though. Should they be advanced as normal by a failed xact? That's what the code is doing now, and if you think they are counters for work done, it's not so unreasonable. I think autovac only considers n_tup_(upd|ins|del), so while it might be correct to fix those other counters, I don't know that they are must-fix items.
Re: [HACKERS] What X86/X64 OS's do we need coverage for?
Devrim Gündüz wrote: Hi, On Fri, 2007-04-06 at 01:23 -0400, Matthew T. O'Connor wrote: The other thing to consider is that CentOS 5 has Xen built right in, so you should be able to run VMs without VMWare on it. ... if the kernel of the OS has Xen support, there will be no performance penalty (only 2%-3%) (Para-virtualization). Otherwise, there will be full-virtualization, and we should expect a performance loss of about 30% for each guest OS (like Windows). I may be wrong, but I thought that the guest OS kernel only needs special support if the underlying CPU doesn't have virtualization support, which pretty much all the new Intel and AMD chips have. No?
Re: [HACKERS] What X86/X64 OS's do we need coverage for?
Larry Rosenman wrote: I might use that as the base then, since the hardware finishes getting here tomorrow. The other thing to consider is that CentOS 5 has Xen built right in, so you should be able to run VMs without VMWare on it.
Re: [HACKERS] Documentation access problems.
Joshua D. Drake wrote: The big thing for me is a single document, zero clicks, that is searchable. PDF and plain text are the only things that give me that. If you are really zealous you can even use Beagle (which I don't) to preindex the PDF for you for easy searching. Lots of projects publish their HTML docs in two formats: one big HTML file with everything; broken up into many HTML files that link to each other. This would allow you to have one big searchable document.
Re: [HACKERS] Documentation access problems.
Bruce Momjian wrote: Tom Lane wrote: Matthew T. O'Connor matthew@zeut.net writes: Lots of projects publish their HTML docs in two formats: one big HTML file with everything; broken up into many HTML files that link to each other. This would allow you to have one big searchable document. The key word there being big ;-) ... I don't have any problem with making such a version available on the website, but I don't think shipping two versions of the HTML docs in our tarballs is reasonable. I think having the single HTML file version available on our web site is enough. Agreed.
Re: [HACKERS] autovacuum next steps, take 3
My initial reaction is that this looks good to me, but still a few comments below. Alvaro Herrera wrote: Here is a low-level, very detailed description of the implementation of the autovacuum ideas we have so far. launcher's dealing with databases - [ Snip ] launcher and worker interactions [Snip] worker to-do list - When each worker starts, it determines which tables to process in the usual fashion: get pg_autovacuum and pgstat data and compute the equations. The worker then takes a snapshot of what's currently going on in the database, by storing worker PIDs, the corresponding table OID that's being currently worked, and the to-do list for each worker. Does a new worker really care about the PID of other workers or what table they are currently working on? It removes from its to-do list the tables being processed. Finally, it writes the list to disk. Just to be clear, the new worker removes from its to-do list all the tables mentioned in the to-do lists of all the other workers? The table list will be written to a file in PGDATA/vacuum/database-oid/todo.worker-pid The file will consist of table OIDs, in the order in which they are going to be vacuumed. At this point, vacuuming can begin. This all sounds good to me so far. Before processing each table, it scans the WorkerInfos to see if there's a new worker, in which case it reads its to-do list to memory. It's not clear to me why a worker cares that there is a new worker, since the new worker is going to ignore all the tables that are already claimed by all worker to-do lists. Then it again fetches the tables being processed by other workers in the same database, and for each other worker, removes from its own in-memory to-do all those tables mentioned in the other lists that appear earlier than the current table being processed (inclusive). Then it picks the next non-removed table in the list.
All of this must be done with the Autovacuum LWLock grabbed in exclusive mode, so that no other worker can pick the same table (no IO takes place here, because the whole lists were saved in memory at the start.) Again, it's not clear to me what this is gaining us. It seems to me that when a worker starts up and writes out its to-do list, it should just do it; I don't see the value in workers constantly updating their to-do lists. Maybe I'm just missing something; can you enlighten me? other things to consider This proposal doesn't deal with the hot tables stuff at all, but that is very easy to bolt on later: just change the first phase, where the initial to-do list is determined, to exclude cold tables. That way, the vacuuming will be fast. Determining what is a cold table is still an exercise for the reader ... I think we can make this algorithm naturally favor small / hot tables with one small change: having workers remove tables that they just vacuumed from their to-do lists and re-write their to-do lists to disk. Assuming the to-do lists are ordered by size ascending, smaller tables will be made available for inspection by newer workers sooner rather than later.
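The core of the scheme being discussed -- a new worker's to-do list is whatever the threshold equations produce, minus everything already claimed in the other workers' lists -- is a simple set subtraction. A minimal sketch (the function name and list-of-lists shape are mine; in the proposal the other lists would be read back from the PGDATA/vacuum/database-oid/todo.worker-pid files):

```python
def build_todo(candidate_oids, other_workers_todos):
    """candidate_oids: table OIDs this worker computed from pgstats,
    in the order it would vacuum them.  other_workers_todos: the to-do
    lists of the workers already running in this database.  The new
    worker drops every table already claimed elsewhere, preserving its
    own ordering."""
    claimed = set()
    for todo in other_workers_todos:
        claimed.update(todo)
    return [oid for oid in candidate_oids if oid not in claimed]

print(build_todo([101, 102, 103, 104], [[102], [104, 105]]))  # [101, 103]
```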
Re: [HACKERS] autovacuum next steps, take 3
Tom Lane wrote: Matthew T. O'Connor matthew@zeut.net writes: It's not clear to me why a worker cares that there is a new worker, since the new worker is going to ignore all the tables that are already claimed by all worker todo lists. That seems wrong to me, since it means that new workers will ignore tables that are scheduled for processing by an existing worker, no matter how far in the future that schedule extends. As an example, suppose you have half a dozen large tables in need of vacuuming. The first worker in will queue them all up, and subsequent workers will do nothing useful, at least not till the first worker is done with the first table. Having the first worker update its todo list file after each table allows the earlier tables to be exposed for reconsideration, but that's expensive and it does nothing for later tables. Well the big problem that we have is not that large tables are being starved, so this doesn't bother me too much, plus there is only so much IO, so one worker working sequentially through the big tables seems OK to me. I suggest that maybe we don't need exposed TODO lists at all. Rather the workers could have internal TODO lists that are priority-sorted in some way, and expose only their current table OID in shared memory. Then the algorithm for processing each table in your list is 1. Grab the AutovacSchedule LWLock exclusively. 2. Check to see if another worker is currently processing that table; if so drop LWLock and go to next list entry. 3. Recompute whether table needs vacuuming; if not, drop LWLock and go to next entry. (This test covers the case where someone vacuumed the table since you made your list.) 4. Put table OID into shared memory, drop LWLock, then vacuum table. 5. Clear current-table OID from shared memory, then repeat for next list entry. This creates a behavior of whoever gets to it first rather than allowing workers to claim tables that they actually won't be able to service any time soon. 
Right, but you could wind up with as many workers working concurrently as you have tables in a database which doesn't seem like a good idea either. One thing I like about the todo list setup Alvaro had is that new workers will be assigned fewer tables to work on and hence exit sooner. We are going to fire off a new worker every autovac_naptime so availability of new workers isn't going to be a problem.
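Tom's five-step claim protocol above can be sketched with a mutex standing in for the AutovacSchedule LWLock and a shared dict standing in for the per-worker current-table slots in shared memory (all the names here are mine, for illustration only):

```python
import threading

autovac_schedule_lock = threading.Lock()  # stands in for the AutovacSchedule LWLock
current_table = {}                        # worker pid -> table OID being vacuumed

def process_table(pid, oid, still_needs_vacuum, vacuum):
    """Return True if this worker vacuumed the table, False if it skipped
    it (another worker had claimed it, or it no longer needs work)."""
    with autovac_schedule_lock:                   # step 1: grab the lock
        if oid in current_table.values():         # step 2: claimed by someone else?
            return False
        if not still_needs_vacuum(oid):           # step 3: recheck the threshold
            return False
        current_table[pid] = oid                  # step 4: advertise the claim
    try:
        vacuum(oid)                               # lock dropped during the vacuum
    finally:
        with autovac_schedule_lock:               # step 5: clear the claim
            del current_table[pid]
    return True

done = []
process_table(1, 42, lambda o: True, done.append)
print(done)  # [42]
```

The point of step 3 is exactly the case Tom flags: someone may have vacuumed the table since this worker built its list, and pgstat is too laggy to be trusted for that check.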
Re: [HACKERS] autovacuum next steps, take 2
Jim C. Nasby wrote: On Tue, Feb 27, 2007 at 12:00:41AM -0300, Alvaro Herrera wrote: Jim C. Nasby wrote: The advantage to keying this to autovac_naptime is that it means we don't need another GUC, but after I suggested that before I realized that's probably not the best idea. For example, I've seen clusters that are running dozens-hundreds of databases; in that environment you really need to turn naptime way down (to like a second). In that case you wouldn't want to key to naptime. Actually, I've been thinking that it would be a good idea to change the semantics of autovacuum_naptime so that it means the average time to start a worker in any given database. That way, the time between autovac runs is not dependent on the number of databases you have. BTW, another issue that I don't think we can ignore: we actually need to do this on a per-tablespace level, or at least have the ability to disable or somehow limit it. While it's not common, there are users that run a hundred or more databases in a single cluster; it would be ugly if we suddenly had 100 vacuums trying to run on the same set of drives concurrently. I think we all agree that autovacuum needs to become tablespace aware at some point, but I think that is further down the line, we're having enough trouble figuring things out without that additional complication.
Re: [HACKERS] Resumable vacuum proposal and design overview
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Tue, 2007-02-27 at 10:37 -0600, Jim C. Nasby wrote: ... The idea would be to give vacuum a target run time, and it would monitor how much time it had remaining, taking into account how long it should take to scan the indexes based on how long it's been taking to scan the heap. When the amount of time left becomes less than the estimate of the amount of time required to scan the indexes (and clean the heap), you stop the heap scan and start scanning indexes. I do like this idea, but it also seems easy to calculate that bit yourself. Run VACUUM, after X minutes issue stop_vacuum() and see how long it takes to finish. Adjust X until you have it right. One problem with it is that a too-small target would result in vacuum proceeding to scan indexes after having accumulated only a few dead tuples, resulting in increases (potentially enormous ones) in the total work needed to vacuum the table completely. I think it's sufficient to have two cases: abort now, and restart from the last cycle-completion point next time (this would basically just be SIGINT); or set a flag to stop at the next cycle-completion point. It occurs to me that we may be thinking about this the wrong way entirely. Perhaps a more useful answer to the problem of using a defined maintenance window is to allow VACUUM to respond to changes in the vacuum cost delay settings on-the-fly. So when your window closes, you don't abandon your work so far, you just throttle your I/O rate back to whatever's considered acceptable for daytime vacuuming. I thought we already did that? Which BTW was part of my plan on how to deal with a vacuum that is still running after its maintenance window has expired.
Re: [HACKERS] autovacuum next steps, take 2
Alvaro Herrera wrote: Jim C. Nasby wrote: That's why I'm thinking it would be best to keep the maximum size of stuff for the second worker small. It probably also makes sense to tie it to time and not size, since the key factor is that you want it to hit the high-update tables every X number of seconds. If we wanted to get fancy, we could factor in how far over the vacuum threshold a table is, so even if the table is on the larger size, if it's way over the threshold the second vacuum will hit it. Ok, I think we may be actually getting somewhere. Me too. I propose to have two different algorithms for choosing the tables to work on. The worker would behave differently, depending on whether there is one or more workers on the database already or not. The first algorithm is the plain threshold equation stuff we use today. If a worker connects and determines that no other worker is in the database, it uses the plain worker mode. A worker in this mode would examine pgstats, determine what tables to vacuum/analyze, sort them by size (smaller to larger), and go about its work. This kind of worker can take a long time to vacuum the whole database -- we don't impose any time limit or table size limit to what it can do. Right, I like this. The second mode is the hot table worker mode, enabled when the worker detects that there's already a worker in the database. In this mode, the worker is limited to those tables that can be vacuumed in less than autovacuum_naptime, so large tables are not considered. Because of this, it'll generally not compete with the first mode above -- the tables in plain worker were sorted by size, so the small tables were among the first vacuumed by the plain worker. The estimated time to vacuum may be calculated according to autovacuum_vacuum_delay settings, assuming that all pages constitute cache misses. How can you determine what tables can be vacuumed within autovacuum_naptime?
I agree that large tables should be excluded, but I don't know how we can do that calculation based on autovacuum_naptime. So at:
t=0*autovacuum_naptime: worker1 gets started on DBX
t=1*autovacuum_naptime: worker2 gets started on DBX; worker2 determines all tables that need to be vacuumed; worker2 excludes tables that are too big from its to-do list; worker2 gets started working; worker2 exits when it either: a) finishes its entire to-do list, or b) catches up to worker1.
I think the questions are: 1) What is the exact math you are planning on using to determine which tables are too big? 2) Do we want worker2 to exit when it catches worker1, or does the fact that we have excluded tables that are too big mean that we don't have to worry about this?
Re: [HACKERS] autovacuum next steps, take 2
Alvaro Herrera wrote: Matthew T. O'Connor wrote: How can you determine what tables can be vacuumed within autovacuum_naptime? My assumption is that pg_class.relpages * vacuum_cost_page_miss * vacuum_cost_delay = time to vacuum This is of course not the reality, because the delay is not how long it takes to fetch the pages. But it lets us have a value with which we can do something. With the default values, vacuum_cost_delay=10, vacuum_cost_page_miss=10, autovacuum_naptime=60s, we'll consider tables of under 600 pages, 4800 kB (should we include indexes here in the relpages count? My guess is no). I'm not sure how pg_class.relpages is maintained but what happens to a bloated table? For example, a 100 row table that is constantly updated and hasn't been vacuumed in a while (say the admin disabled autovacuum for a while), now that small 100 row table has 1000 pages in it most of which are just bloat, will we miss this table? Perhaps basing this on reltuples would be better? A table over 600 pages does not sound like a good candidate for hot, so this seems more or less reasonable to me. On the other hand, maybe we shouldn't tie this to the vacuum cost delay stuff. I'm not sure it's a good idea to tie this to the vacuum cost delay settings either, so let me ask you this: how is this better than just allowing the admin to set a new GUC variable like autovacuum_hot_table_size_threshold (or something shorter) which we can assign a decent default of say 8MB. Thoughts?
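Alvaro's rough estimate above is easy to check numerically. A sketch of exactly the formula from the message, deliberately keeping its simplification (every page is a cache miss and each miss "costs" one delay; he concedes it is not reality, and it ignores vacuum_cost_limit batching):

```python
def estimated_vacuum_ms(relpages, cost_page_miss=10, cost_delay_ms=10):
    """Rough vacuum-time estimate: relpages * vacuum_cost_page_miss
    * vacuum_cost_delay, as proposed in the message above."""
    return relpages * cost_page_miss * cost_delay_ms

def max_hot_relpages(naptime_secs=60, cost_page_miss=10, cost_delay_ms=10):
    """Largest table a hot-table worker would consider under that estimate."""
    return (naptime_secs * 1000) // (cost_page_miss * cost_delay_ms)

print(max_hot_relpages())           # 600 pages, i.e. 4800 kB at 8 kB/page
print(estimated_vacuum_ms(600))     # 60000 ms = one naptime
```

This reproduces the 600-page / 4800 kB cutoff quoted in the thread for the default settings.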
Re: [HACKERS] autovacuum next steps, take 2
Alvaro Herrera wrote: Matthew T. O'Connor wrote: I'm not sure how pg_class.relpages is maintained but what happens to a bloated table? For example, a 100 row table that is constantly updated and hasn't been vacuumed in a while (say the admin disabled autovacuum for a while), now that small 100 row table has 1000 pages in it most of which are just bloat, will we miss this table? Perhaps basing this on reltuples would be better? Well, this would only happen the first time, until the plain worker processed the table; next time it would be picked up by the hot table worker. But yeah, we can build a better estimate using the same trick the planner uses: estimate tuple density as reltuples/relpages times the actual number of blocks on disk. Well even skipping it the first time isn't good, anything that further delays a hot table from getting vacuumed is bad. Also, I'm not sure it would just be the first time, since plain VACUUM isn't going to reclaim most of the space, just mark it as reusable. This is moot, however, if we use a good metric; I have no idea if what you suggest above would be good enough. A table over 600 pages does not sound like a good candidate for hot, so this seems more or less reasonable to me. On the other hand, maybe we shouldn't tie this to the vacuum cost delay stuff. I'm not sure it's a good idea to tie this to the vacuum cost delay settings either, so let me ask you this: how is this better than just allowing the admin to set a new GUC variable like autovacuum_hot_table_size_threshold (or something shorter) which we can assign a decent default of say 8MB. Yeah, maybe that's better -- it's certainly simpler. Simple is better, at least until proven otherwise.
Re: [HACKERS] autovacuum next steps, take 2
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Matthew T. O'Connor wrote: I'm not sure it's a good idea to tie this to the vacuum cost delay settings either, so let me as you this, how is this better than just allowing the admin to set a new GUC variable like autovacuum_hot_table_size_threshold (or something shorter) which we can assign a decent default of say 8MB. Yeah, maybe that's better -- it's certainly simpler. I'm not liking any of these very much, as they seem critically dependent on impossible-to-tune parameters. I think it'd be better to design this around having the first worker explicitly expose its state (list of tables to process, in order) and having subsequent workers key off that info. The shared memory state could include the OID of the table each worker is currently working on, and we could keep the to-do list in some simple flat file for instance (since we don't care about crash safety). So far we are only talking about one parameter, the hot_table_size_threshold, which I agree would be a guess by an admin, but if we went in this direction, I would also advocate adding a column to the pg_autovacuum table that allows an admin to explicitly define a table as hot or not. Also I think each worker should be mostly independent, the only caveat being that (assuming each worker works in size order) if we catch up to an older worker (get to the table they are currently working on) we exit. Personally I think this is all we need, but others felt the additional threshold was needed. What do you think? Or what do you think might be better? I'm not certain exactly what key off needs to mean; perhaps each worker should make its own to-do list and then discard items that are either in-progress or recently done by another worker when it gets to them. My initial design didn't have any threshold at all, but others felt this would/could result in too many worker working concurrently in the same DB. 
I think an absolute minimum requirement for a sane design is that no two workers ever try to vacuum the same table concurrently, and I don't see where that behavior will emerge from your proposal; whereas it's fairly easy to make it happen if non-first workers pay attention to what other workers are doing. Maybe we never made that clear, I was always working on the assumption that two workers would never try to work on the same table at the same time. BTW, it's probably necessary to treat shared catalogs specially ... Certainly.
Re: [HACKERS] autovacuum next steps, take 2
Jim C. Nasby wrote: On Mon, Feb 26, 2007 at 06:23:22PM -0500, Matthew T. O'Connor wrote: I'm not sure how pg_class.relpages is maintained but what happens to a bloated table? For example, a 100 row table that is constantly updated and hasn't been vacuumed in a while (say the admin disabled autovacuum for a while), now that small 100 row table has 1000 pages in it most of which are just bloat, will we miss this table? Perhaps basing this on reltuples would be better? The entire point of this is to ensure that the second daemon will only vacuum tables that it can finish very quickly. If you let a table bloat so it's too big, then you just can't vacuum it very frequently without risking all your other hot tables bloating because they're no longer getting vacuumed. The reality is that you can actually vacuum a pretty good-sized table in 60 seconds with typical cost-delay settings (ie: defaults except cost_delay set to 10). That means you can do 9 pages ~100 times a second, or 54k pages a minute. Even with a vacuum_cost_delay of 20, that's still 27k pages per minute. At the risk of sounding like a broken record, I still think the size limit threshold is unnecessary. Since all workers will be working on tables in size order, younger workers will typically catch older workers fairly quickly since the tables will be either small, or recently vacuumed and not need work. And since younger workers exit when they catch up to an older worker, there is some inherent stability in the number of workers. Here is a worst case example: A DB with 6 tables all of which are highly active and will need to be vacuumed constantly. While this is totally hypothetical, it is how I envision things working (without the threshold).
table1: 10 rows
table2: 100 rows
table3: 1,000 rows
table4: 10,000 rows
table5: 100,000 rows
table6: 1,000,000 rows
time=0*naptime: No workers in the DB
time=1*naptime: worker1 starts on table1
time=2*naptime: worker1 has finished table1, table2 and table3; it's now working on table4; worker2 starts on table1.
time=3*naptime: worker1 is on table5; worker2 is working on table4; worker3 starts on table1.
time=4*naptime: worker1 is still on table5; worker2 has caught up to worker1 and exits; worker3 also catches up to worker1 since tables 2-4 didn't require vacuum at this time, so it exits; worker4 starts on table1.
time=5*naptime: worker1 is working on table6; worker4 is up to table4; worker5 starts on table1.
time=6*naptime: worker1 is working on table6; worker4 catches up to worker1 and exits; worker5 finds no additional work to be done and exits; worker6 starts at table1.
time=7*naptime: worker1 still working on table6; worker6 is up to table4; worker7 starts at table1.
time=8*naptime: worker1 still working on table6; worker6 still working on table4; worker7 working on table3; worker8 starting on table1.
time=9*naptime: worker1 still working on table6; worker6 working on table5; worker7 catches worker6 and exits; worker8 finds nothing more to do and exits; worker9 starts on table1.
time=10*naptime: worker1 still working on table6; worker9 working on table4; worker10 starts on table1.
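Jim's page-rate figures follow from the cost-based delay mechanics: vacuum sleeps for vacuum_cost_delay milliseconds once per cost cycle, so there are 1000/cost_delay cycles per second. A sketch of that arithmetic (the 9-pages-per-cycle figure is his, for the defaults he cites; it depends on the hit/miss/dirty cost mix, which is not spelled out in the message):

```python
def pages_per_minute(pages_per_cost_cycle=9, cost_delay_ms=10):
    """Throughput of a cost-limited vacuum: one sleep of cost_delay_ms
    per cost cycle, pages_per_cost_cycle pages processed per cycle."""
    cycles_per_second = 1000 / cost_delay_ms
    return pages_per_cost_cycle * cycles_per_second * 60

print(pages_per_minute())                  # 54000.0 -- the "54k pages a minute"
print(pages_per_minute(cost_delay_ms=20))  # 27000.0 -- the "27k pages per minute"
```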
Re: [HACKERS] autovacuum next steps, take 2
Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: The real problem is trying to set that up in such a fashion that keeps hot tables frequently vacuumed; Are we assuming that no single worker instance will vacuum a given table more than once? (That's not a necessary assumption, certainly, but without it there are so many degrees of freedom that I'm not sure how it should act.) Given that assumption, the maximum vacuuming rate for any table is once per autovacuum_naptime, and most of the magic lies in the launcher's algorithm for deciding which databases to launch workers into. Yes, I have been working under the assumption that a worker goes through the list of tables once and exits, and yes the maximum vacuuming rate for any table would be once per autovacuum_naptime. We can lower the default if necessary; as far as I'm concerned it's (or should be) fairly cheap to fire off a worker and have it find that there isn't anything to do and exit. I'm inclined to propose an even simpler algorithm in which every worker acts alike; its behavior is 1. On startup, generate a to-do list of tables to process, sorted in priority order. 2. For each table in the list, if the table is still around and has not been vacuumed by someone else since you started (including the case of a vacuum-in-progress), then vacuum it. That is what I'm proposing except for one difference: when you catch up to an older worker, exit. This has the benefit of reducing the number of workers concurrently working on big tables, which I think is a good thing. Detecting already vacuumed since you started is a bit tricky; you can't really rely on the stats collector since its info isn't very up-to-date. That's why I was thinking of exposing the to-do lists explicitly; comparing those with an advertised current-table would allow accurate determination of what had just gotten done.
Sounds good, but I have very little insight into how we would implement already vacuumed since you started or have I caught up to another worker. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] autovacuum next steps, take 2
Tom Lane wrote:
> BTW, to what extent might this whole problem be simplified if we adopt
> chunk-at-a-time vacuuming (compare current discussion with Galy Lee)?
> If the unit of work has a reasonable upper bound regardless of table
> size, maybe the problem of big tables starving small ones goes away.

So if we adopted chunk-at-a-time, then perhaps each worker processes the list of tables in OID order (or some unique and stable order) and does one chunk per table that needs vacuuming. This way an equal amount of bandwidth is given to all tables. That does sound simpler. Is chunk-at-a-time a realistic option for 8.3?

Matt

---(end of broadcast)---
TIP 6: explain analyze is your friend
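The round-robin chunk idea in this message can be expressed in a few lines. The sketch below is an illustrative model only (the CHUNK constant and the pages-left bookkeeping are invented, not PostgreSQL's): each pass does at most one bounded chunk per table that needs vacuuming, walking tables in OID order, so no table can monopolize the worker.

```python
CHUNK = 100  # max heap pages processed per table per pass (arbitrary)

def one_pass(pages_left):
    """pages_left: {oid: dirty pages still to scan}. Mutates it in place
    and returns {oid: pages processed this pass}."""
    done = {}
    for oid in sorted(pages_left):     # OID order: unique and stable
        if pages_left[oid] == 0:
            continue                   # this table doesn't need vacuum
        work = min(CHUNK, pages_left[oid])
        pages_left[oid] -= work        # one chunk, then move to next table
        done[oid] = work
    return done
```

Repeated passes give every needy table the same per-pass bandwidth; a big table simply shows up in more passes rather than blocking the small ones.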
Re: [HACKERS] autovacuum next steps, take 2
Tom Lane wrote:
> Matthew T. O'Connor matthew@zeut.net writes:
>> Tom Lane wrote:
>>> I'm inclined to propose an even simpler algorithm in which every
>>> worker acts alike;
>> That is what I'm proposing except for one difference, when you catch
>> up to an older worker, exit.
>
> No, that's a bad idea, because it means that any large table starves
> even-larger tables.

True, but the assumption I'm making is that there is a finite amount of bandwidth available, and more concurrent activity will have a net negative effect on the time it takes to vacuum all tables. I'm willing to pay that price to prevent small hot tables from getting starved, but less willing to pay the same price for large tables, where the percentage of bloat will be much smaller.

> (Note: in all this I assume we're all using "size" as a shorthand for
> some sort of priority metric that considers number of dirty tuples and
> not only size. We don't want every worker insisting on passing over
> every small read-only table every time, for instance.)

I was using size to mean reltuples. The whole concept of sorting by size was designed to ensure that smaller (more susceptible to bloat) tables got priority. It might be useful for workers to sort their to-do lists by some other metric, but I don't have a clear vision of what that might be.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
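For the "some sort of priority metric" mentioned here, one plausible sketch (field layout and numbers invented for illustration, not anyone's concrete proposal) ranks tables by dead-tuple fraction rather than raw size, so a small hot table sorts ahead of a big mostly-clean one and read-only tables drop out of the to-do list entirely:

```python
def todo_list(tables):
    """tables: list of (name, reltuples, dead_tuples) triples.
    Returns the worklist, most bloated (by fraction) first."""
    needs_work = [t for t in tables if t[2] > 0]   # skip clean tables
    return sorted(needs_work,
                  key=lambda t: t[2] / t[1],       # dead-tuple fraction
                  reverse=True)
```

With this key, a 1,000-row table carrying 500 dead tuples outranks a million-row table carrying 1,000, which is the starvation-avoiding behavior being argued for.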
Re: [HACKERS] autovacuum next steps, take 2
Tom Lane wrote:
> Matthew T. O'Connor matthew@zeut.net writes:
>> That does sound simpler. Is chunk-at-a-time a realistic option for 8.3?
>
> It seems fairly trivial to me to have a scheme where you do one
> fill-workmem-and-scan-indexes cycle per invocation, and store the
> next-heap-page-to-scan in some handy place (new pg_class column updated
> along with relpages/reltuples, likely). Galy is off in left field with
> some far more complex ideas :-( but I don't see that there's all that
> much needed to support this behavior ... especially if we don't expose
> it to the SQL level but only support it for autovac's use. Then we're
> not making any big commitment to support the behavior forever.

Well, if we can make it happen soon, it might be the best thing for autovacuum.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] autovacuum next steps, take 2
Jim C. Nasby wrote:
> On Mon, Feb 26, 2007 at 10:18:36PM -0500, Matthew T. O'Connor wrote:
>> Here is a worst case example: A DB with 6 tables, all of which are
>> highly active and will need to be vacuumed constantly. While this is
>> totally hypothetical, it is how I envision things working (without the
>> threshold).
>
> I fail to see how a simple 6 table case is 'worst case'. It's common to
> see hundreds of tables, and I've run across more than one database with
> thousands of tables (think partitioning). In cases like those it's
> certainly possible, perhaps even likely, that you would get many daemons
> running in the database at one time just from different tables suddenly
> needing vacuuming and appearing at a higher point in the list than other
> tables. With 100 ~1G tables getting updates it certainly wouldn't be
> hard to end up with 10 of those being vacuumed all at the same time.

Yes, 6 tables is small; the worst-case part of the example was that all the tables would need to be vacuumed constantly. Most databases only have a few hot tables. Most tables only need to be vacuumed every once in a while.

> I do like the idea since it should be easier to tune, but I think we
> still need some limit on it. Perhaps as a first-pass we could just have
> a hard limit and log a message and/or set a flag any time we hit it.
> That would hopefully allow us to get information about how big a problem
> it really is. We could go one step further and say that the last daemon
> that can start in a database will only vacuum tables that can be done
> quickly; that's essentially what we've been talking about, except the
> limit we've been discussing would be hard-coded at 2.

I'm confused, what limit would be set at 2? The number of concurrent workers? I've never said that.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
Re: [HACKERS] autovacuum next steps, take 2
Tom Lane wrote:
> Jim C. Nasby [EMAIL PROTECTED] writes:
>> The proposal to save enough state to be able to resume a vacuum at
>> pretty much any point in its cycle might work; we'd have to benchmark
>> it. With the default maintenance_work_mem of 128M it would mean
>> writing out 64M of state every minute on average, which is likely to
>> take several seconds to fsync (though, maybe we wouldn't need to fsync
>> it...)
>
> Which is exactly why we needn't bother benchmarking it. Even if it
> weren't complex and unsafe, it will be a net loss when you consider the
> fact that it adds I/O instead of removing it.

I'm not sure what you are saying here: are you now saying that partial vacuum won't work for autovac? Or are you saying that saving state as Jim is describing above won't work?

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] autovacuum next steps, take 2
Tom Lane wrote:
> Matthew T. O'Connor matthew@zeut.net writes:
>> I'm not sure what you are saying here: are you now saying that partial
>> vacuum won't work for autovac? Or are you saying that saving state as
>> Jim is describing above won't work?
>
> I'm saying that I don't like the idea of trying to stop on a dime by
> saving the current contents of vacuum's dead-TID array to disk with the
> idea that we can trust those values 100% later. Saving the array is
> expensive both in runtime and code complexity, and I don't believe we
> can trust it later --- at least not without even more
> expensive-and-complex measures, such as WAL-logging every such save :-(
> I'm for stopping only after completing an index-cleaning pass, at the
> point where we empty the dead-TID array anyway. If you really have to
> have stop on a dime, just "kill -INT" the process, accepting that you
> will have to redo your heap scan since the last restart point.

OK, so if I understand correctly, a vacuum of a table with 10 indexes on it can be interrupted 10 times, once after each index-cleaning pass? That might have some value, especially breaking up the work required to vacuum a large table. Or am I still not getting it?

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
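Reading Tom's description literally, the safe stop points come one per fill-workmem-and-scan-indexes cycle (each such cycle cleans all of the table's indexes and then empties the dead-TID array), so their count depends on dead tuples versus maintenance_work_mem rather than on the table having 10 indexes. A back-of-envelope sketch; the 6-bytes-per-TID figure is the usual rule of thumb and is treated here as an assumption:

```python
def safe_stop_points(dead_tuples, maintenance_work_mem_bytes,
                     bytes_per_tid=6):
    """Roughly how many times a single-table vacuum could pause, if it may
    pause only when the dead-TID array has just been emptied."""
    capacity = maintenance_work_mem_bytes // bytes_per_tid
    # one index-cleaning pass per array fill; round up for the final,
    # partial fill at the end of the heap scan
    return -(-dead_tuples // capacity)
```

With the default 128MB of maintenance_work_mem the array holds on the order of 22 million TIDs, so a table with 50 million dead tuples offers about three pause points, and a lightly bloated table offers exactly one, regardless of its index count.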
Re: [HACKERS] [Monotone-devel] Re: SCMS question
On Sat, Feb 24, 2007 at 10:27:38PM -0500 I heard the voice of Andrew Dunstan, and lo! it spake thus:
> This decision really belongs to the handful of people who do most of
> the maintenance and live with most of any CVS pain that exists: such as
> Tom, Bruce, Peter, Neil, Alvaro. Other people have a right to voice an
> opinion, but nobody should be pushing on it.

One thing that the DVCS crowd pushes is that that's _not_ the whole story. With CVS (or other centralized systems), the VCS is a development tool for the few core people, and a glorified FTP/snapshotting system for everyone else. With a DVCS, _everybody_ gets a development tool out of it.

ObBias: After much resistance, I drank the distributed Kool-Aid. My poison of choice is bzr, which is very probably not ready performance-wise for Pg. So, I also look forward to a switch happening not now, but in a year or two, when the performance failings are historical and bzr can be chosen 8-}

--
Matthew Fuller (MF4839) | [EMAIL PROTECTED]
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] [Monotone-devel] Re: SCMS question
On Sun, Feb 25, 2007 at 06:28:20PM -0500 I heard the voice of Andrew Dunstan, and lo! it spake thus:
> I don't really drink this koolaid, at least not to the extent of
> disavowing what I said above.

Oh, don't take my message as "You're wrong, you're not taking into account [...]". It was meant more as a "This is a convenient place to make [...] explicit".

It seems that there are really 3 sequential questions here:

1) Do we switch VCS's? The averaged answer to this is pretty much "Probably, but not right now, and not in the very near future". Given that, the rest of the discussion is probably somewhat pointless; at the least it should be carried out with this answer kept firmly in mind.

2) Do we go the DVCS route? and only after THAT is resolved do we go on to:

3) Which VCS? The feature/capability lists of the various DVCS's contain a mix of those features which are inherent in (or at least pretty much universal among) DVCS's as a class, and those which are more particular to the given system. But in a discussion of which VCS to (hypothetically) use, you really want to separate them out so you can know when you're arguing for/against $SYSTEM, and when you're arguing for/against $CLASS_OF_SYSTEMS.

--
Matthew Fuller (MF4839) | [EMAIL PROTECTED]
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] SCMS question
On Sun, Feb 25, 2007 at 06:06:57PM -0500 I heard the voice of Neil Conway, and lo! it spake thus:
> The ability to do history-sensitive merges actually results in a
> significant reduction in the need for manual conflict resolution.

I would say that a far greater contributor in practice would simply be frequency. If you diverge on your significant feature for 6 months, then try to merge in upstream changes from the main dev, you will be in hell no matter what merge algorithm you use. If you merge in upstream changes every few days, however, you will have many fewer and much simpler conflicts to deal with.

A VCS that makes frequent merges easy results in easier conflict handling, not by some magical auto-resolution, but just by letting you do it in ongoing regular and small bites.

--
Matthew Fuller (MF4839) | [EMAIL PROTECTED]
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] autovacuum next steps, take 2
Jim C. Nasby wrote:
> On Wed, Feb 21, 2007 at 05:40:53PM -0500, Matthew T. O'Connor wrote:
>> My Proposal: If we require admins to identify hot tables, then:
>> 1) Launcher fires off a worker1 into database X.
>> 2) worker1 deals with hot tables first, then regular tables.
>> 3) Launcher continues to launch workers to DB X every autovac naptime.
>> 4) worker2 (or 3 or 4 etc...) sees it is alone in DB X; if so it acts
>> as worker1 did above. If worker1 is still working in DB X then worker2
>> looks for hot tables that are being starved because worker1 got busy.
>> If worker2 finds no hot tables that need work, then worker2 exits.
>
> Rather than requiring people to manually identify hot tables, what if
> we just prioritize based on table size? So if a second autovac process
> hits a specific database, it would find the smallest table in need of
> vacuuming that it should be able to complete before the next naptime
> and vacuum that. It could even continue picking tables until it can't
> find one that it could finish within the naptime. Granted, it would
> have to make some assumptions about how many pages it would dirty.
> ISTM that's a lot easier than forcing admins to mark specific tables.

So the heuristic would be:
* Launcher fires off workers into a database at a given interval (perhaps configurable?)
* Each worker works on tables in size order.
* If a worker ever catches up to an older worker, then the younger worker exits.

This sounds simple and workable to me. Perhaps we can later modify this to include some max_workers variable, so that a worker would only exit if it catches an older worker and there are max_workers currently active. Thoughts?

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] autovacuum next steps, take 2
Jim C. Nasby wrote:
> On Thu, Feb 22, 2007 at 09:32:57AM -0500, Matthew T. O'Connor wrote:
>> So the heuristic would be:
>> * Launcher fires off workers into a database at a given interval
>> (perhaps configurable?)
>> * Each worker works on tables in size order.
>> * If a worker ever catches up to an older worker, then the younger
>> worker exits.
>> This sounds simple and workable to me, perhaps we can later modify
>> this to include some max_workers variable so that a worker would only
>> exit if it catches an older worker and there are max_workers currently
>> active.
>
> That would likely result in a number of workers running in one
> database, unless you limited how many workers per database. And if you
> did that, you wouldn't be addressing the frequently updated table
> problem. A second vacuum in a database *must* exit after a fairly short
> time so that we can go back in and vacuum the important tables again
> (well, or the 2nd vacuum has to periodically re-evaluate what tables
> need to be vacuumed).

I'm not sure this is a great idea, but I don't see how this would result in large numbers of workers working in one database. If workers work on tables in size order, and exit as soon as they catch up to an older worker, I don't see the problem. Newer workers are going to catch up to older workers pretty quickly, since small tables will vacuum fairly quickly.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
Re: [HACKERS] autovacuum next steps, take 2
Alvaro Herrera wrote:
> Ok, scratch that :-) Another round of braindumping below.

I still think this is a solution in search of a problem. The main problem we have right now is that hot tables can be starved from vacuum. Most of this proposal doesn't touch that. I would like to see that problem solved first, then we can talk about adding multiple workers per database or per tablespace etc...

> (This idea can be complemented by having another GUC var,
> autovacuum_hot_workers, which allows the DBA to have more than one
> worker on hot tables (just for the case where there are too many hot
> tables). This may be overkill.)

I think this is more along the lines of what we need first.

> Ron Mayer expressed the thought that we're complicating needlessly the
> UI for vacuum_delay, naptime, etc. He proposes that instead of having
> cost_delay etc, we have a mbytes_per_second parameter of some sort.
> This strikes me a good idea, but I think we could make that after this
> proposal is implemented. So this take 2 could be implemented, and then
> we could switch the cost_delay stuff to using a MB/s kind of
> measurement somehow (he says waving his hands wildly).

Agree this is probably a good idea in the long run, but I agree this is lower on the priority list and should come next.

> Greg Stark and Matthew O'Connor say that we're misdirected in having
> more than one worker per tablespace. I say we're not :-) If we
> consider Ron Mayer's idea of measuring MB/s, but we do it per
> tablespace, then we would inflict the correct amount of vacuum pain to
> each tablespace, sleeping as appropriate. I think this would require
> workers of different databases to communicate what tablespaces they
> are using, so that all of them can utilize the correct amount of
> bandwidth.

I agree that in the long run it might be better to have multiple workers with a MB/s throttle and tablespace awareness, but we don't have any of that infrastructure right now. I think the piece of low-hanging fruit that your launcher concept can solve is the hot table starvation.

My Proposal: If we require admins to identify hot tables, then:
1) Launcher fires off a worker1 into database X.
2) worker1 deals with hot tables first, then regular tables.
3) Launcher continues to launch workers to DB X every autovac naptime.
4) worker2 (or 3 or 4 etc...) sees it is alone in DB X; if so it acts as worker1 did above. If worker1 is still working in DB X, then worker2 looks for hot tables that are being starved because worker1 got busy. If worker2 finds no hot tables that need work, then worker2 exits.

This seems a very simple solution (given your launcher work) that can solve the starvation problem. Thoughts?

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
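The four-step proposal above reduces to a small per-worker decision rule. This sketch is purely illustrative (all names are invented, and "claimed" stands in for whatever mechanism would advertise which hot tables a busy worker has already handled):

```python
def worker_plan(hot, regular, alone, already_claimed):
    """Return the list of tables this worker should vacuum, in order.

    hot/regular: table names in their respective priority order.
    alone: True if no other worker is active in this database.
    already_claimed: hot tables an older worker has handled or claimed."""
    if alone:
        return hot + regular        # worker1: everything, hot tables first
    # a later worker only rescues hot tables the busy worker hasn't
    # reached; if none are starved, it has nothing to do and exits
    return [t for t in hot if t not in already_claimed]
```

So a lone worker sweeps the whole database hot-first, while each follow-on worker either rescues starved hot tables or returns an empty plan and exits immediately, which is exactly the anti-starvation behavior being proposed.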
Re: [HACKERS] GiST Comparing IndexTuples/Datums
Good news: I think we've got GiST working (somewhat, anyway), as we found gistKeyIsEQ(giststate, 0, datum, currdatum) in gistutil.c does the trick of comparing two datums. I swear most of our trouble is just finding our way around the postgres codebase, but we're getting there little by little. We've gone back to revisit hash to see if we can figure it out now that we understand a little bit about GiST, but we can't find an equivalent function in hash for the KeyIsEQ(). So two questions really. The first is if such a function exists for hash. The second is that nbtree and Neil Conway's work a few years ago (http://archives.postgresql.org/pgsql-patches/2003-09/msg00252.php) use the scan and scankey stuff, but we're having trouble understanding how these work. Is there some documentation on using these correctly (outside of just looking at nbtree code)?

Thanks so much for the help folks!
-Matt

On 2/13/07, Teodor Sigaev [EMAIL PROTECTED] wrote:
>> indexes, then it must use operator number so-and-so for equality. But
>> there are lots of GiST opclasses that don't include equality at all;
>> we can't break that case.
>
> There is a GiST support function for equality of keys; in btree_gist
> it's named as gbt_*_same. The equality function has support number 7
> and is used for stored keys. But the real issue in a unique GiST index
> is "unique" :). First, the algorithm of insertion doesn't compare
> indexed keys on a leaf page at all. Values on the same page are
> compared only when the page is splitting (picksplit support method).
> Second, the GiST implementation supports only unordered trees
> (btree_gist is some kind of emulation) and it cannot guarantee that
> equal keys will be close in the index. That's related to the picksplit
> and gistpenalty method problem/optimization and the data set.
>
> --
> Teodor Sigaev E-mail: [EMAIL PROTECTED]
> WWW: http://www.sigaev.ru/
Re: [HACKERS] autovacuum next steps
Alvaro Herrera wrote:
> After staring at my previous notes for autovac scheduling, it has
> become clear that the basics of it are not really going to work as
> specified. So here is a more realistic plan:
> [Snip Detailed Description]
> How does this sound?

On first blush, I'm not sure I like this, as it doesn't directly attack the table starvation problem, and I think it could be a net loss of speed. VACUUM is I/O bound; as such, just sending multiple vacuum commands at a DB isn't going to make things faster, you are now going to have multiple processes reading from multiple tables at the same time. I think in general this is a bad thing (unless we someday account for I/O made available from multiple tablespaces). In general the only time it's a good idea to have multiple vacuums running at the same time is when a big table is starving a small hot table and causing bloat.

I think we can extend the current autovacuum stats to add one more column that specifies "is hot" or something to that effect. Then when the AV launcher sends a worker to a DB, it will first look for tables marked as hot and work on them. While working on hot tables, the launcher need not send any additional workers to this database. If the launcher notices that a worker is working on regular tables, it can send another worker which will look for hot tables to work on; if the worker doesn't find any hot tables that need work, then it exits, leaving the original worker to continue plodding along.

Thoughts?

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Re: [HACKERS] autovacuum next steps
Alvaro Herrera wrote:
> Matthew T. O'Connor wrote:
>> On first blush, I'm not sure I like this, as it doesn't directly
>> attack the table starvation problem, and I think it could be a net
>> loss of speed. VACUUM is I/O bound; as such, just sending multiple
>> vacuum commands at a DB isn't going to make things faster, you are now
>> going to have multiple processes reading from multiple tables at the
>> same time. I think in general this is a bad thing (unless we someday
>> account for I/O made available from multiple tablespaces).
>
> Yeah, I understand that. However, I think that can be remedied by
> using a reasonable autovacuum_vacuum_cost_delay setting, so that each
> worker uses less than the total I/O available. The main point of the
> proposal is to allow multiple workers on a DB while also allowing
> multiple databases to be processed in parallel.

So you are telling people to choose an autovacuum_delay so high that they need to run multiple autovacuums at once to keep up? I'm probably being too dramatic, but it seems inconsistent.

>> I think we can extend the current autovacuum stats to add one more
>> column that specifies "is hot" or something to that effect. [...]
>
> How would you define what's a hot table?

I wasn't clear: I would have the admin specify it, and we can store it as an additional column in the pg_autovacuum_settings table. Or perhaps a table is hot if it is below some size threshold and autovacuum sees that it needs to be vacuumed every time it checks it, 10 times in a row or something like that.
---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] GiST Comparing IndexTuples/Datums
Hey folks: I posted this to the pgsql-novice mailing list but was told that it'd probably be better to repost here. I've been working with a group trying to implement UNIQUE index functionality in GiST (we started with hash, and have branched out to try and understand some of the other indexing structures as well). We installed the btree_gist contrib module and have gist indexes working fine and understanding when it's supposed to be unique (regardless of which contrib module is installed).

We currently can walk over all the IndexTuples in a page, and we were hoping to compare the IndexTuple passed into the gistdoinsert() function (the new itup being added to the index) to each of the IndexTuples in said page. The idea we've been playing with right now is using 'datum = index_getattr(itup, 1, RelationGetDescr(r), isnull);' as it's used in hashinsert.c, but we can't figure out how to compare the two datums then. The actual data type of the value being inserted would be different depending on the type of column you created the index on. Since you provide an opclass when creating a gist index, are we supposed to use one of the user-defined functions to compare items? (One of them is 'same', so in btree_gist the function is gbt_text_same() for indexes on char and text columns.) Is there a general way to access these functions without knowing specifically what contrib module is specified, something like OpClassFunction7()?

Thanks so much for humoring me, and I apologize as I get myself familiar with PostgreSQL and database concepts in general.
-Matt
Re: [HACKERS] autovacuum process handling
Alvaro Herrera wrote:
> This is how I think autovacuum should change with an eye towards being
> able to run multiple vacuums simultaneously:
> [snip details]
> Does this raise some red flags? It seems straightforward enough to me;
> I'll submit a patch implementing this, so that scheduling will continue
> to be as it is today. Thus the scheduling discussions are being
> deferred until they can be actually useful and implementable.

I can't really speak to the PostgreSQL signaling innards, but this sounds logical to me. I think having the worker processes be children of the postmaster and having them be single-minded (or single-tasked) also makes a lot of sense.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [GENERAL] Autovacuum Improvements
Alvaro Herrera wrote:
> I'd like to hear other people's opinions on Darcy Buskermolen's
> proposal to have a log table, on which we'd register what we ran, at
> what time, how long it lasted, how many tuples it cleaned, etc. I feel
> having it on the regular text log is useful but it's not good enough.
> Keep in mind that in the future we may want to peek at that collected
> information to be able to take better scheduling decisions (or at
> least inform the DBA that he sucks).

I'm not familiar with his proposal, but I'm not sure what I think of logging vacuum (and perhaps analyze) commands to a table. We have never logged anything to tables inside PG, and I would be worried about this eating a lot of space in some situations. I think most people would just be happy if we could get autovacuum to log its actions at a much higher log level. I think that "autovacuum vacuumed table x" is important and shouldn't be all the way down at the debug level. The other (more involved) solution to this problem that was proposed was to create a separate set of logging control params for autovacuum, so that you can turn it up or down independent of the general server logging.

> Now, I'd like this to be a VACUUM thing, not autovacuum. That means
> that manually-run vacuums would be logged as well.

+1

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
Re: [HACKERS] Autovacuum improvements
Alvaro Herrera wrote:
> Matthew T. O'Connor wrote:
>> This still seems ambiguous to me; how would I handle a maintenance
>> window of "weekends from Friday at 8PM through Monday morning at 6AM"?
>> My guess from what you said is:
>>
>>  mon  dom  dow  starttime  endtime
>>  null null 6    20:00      null
>>  null null 1    null       06:00
>>
>> So how do we know to vacuum on Saturday or Sunday? I think clearly
>> defined intervals with explicit start and stop times is cleaner.
>>
>>  mon  dom  dow  start  end
>>  null null 5    20:00  23:59:59
>>  null null 6    00:00  23:59:59
>>  null null 7    00:00  23:59:59
>>  null null 1    00:00  06:00
>>
>> (1 = monday, 5 = friday)

So it takes 4 lines to handle one logical interval; I don't really like that. I know that your concept of interval groups will help mask this, but still.

> Now I'm starting to wonder what will happen between 23:59:59 of day X
> and 00:00:00 of day (X+1) ... Maybe what we should do is not specify an
> end time, but a duration as an interval:
>
>  month     int
>  dom       int
>  dow       int
>  start     time
>  duration  interval
>
> That way you can specify the above as
>
>  mon  dom  dow  start  duration
>  null null 5    20:00  (4 hours + 2 days + 6 hours)
>
> Now, if a DST boundary happens to fall in that interval you'll be an
> hour short, or it'll last an hour too long :-)

I certainly like this better than the first proposal, but I still don't see how it's better than a full set of columns for start and end times. Can you tell me why you are trying to avoid that design?

>> Hmm... this seems like "queue" is nearly a synonym for "group". Can't
>> we just add a num_workers property to table groups? That seems to
>> accomplish the same thing. And yes, a GUC variable to limit the total
>> number of concurrent autovacuums is probably a good idea.
>
> queue = group of groups. But I'm not sure about this at all, which is
> why I took it away from the proposal.

I think we can live without the groups of groups, at least for now.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
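For what it's worth, the start-plus-duration representation being debated can be checked with plain timestamp arithmetic, which also sidesteps the 23:59:59/00:00:00 gap (the end is simply exclusive). The DST caveat from the mail still applies if this is done in local wall-clock time. A sketch, with dates chosen so the window runs Friday 20:00 through Monday 06:00:

```python
from datetime import datetime, timedelta

def in_window(now, start, duration):
    """True if 'now' falls inside [start, start + duration)."""
    return start <= now < start + duration

# Friday 2007-03-02 20:00 plus "4 hours + 2 days + 6 hours" ends
# exactly at Monday 2007-03-05 06:00 (exclusive).
start = datetime(2007, 3, 2, 20, 0)
dur = timedelta(hours=4) + timedelta(days=2) + timedelta(hours=6)
```

The end-exclusive comparison means Monday 06:00:00 itself is outside the window, so an adjacent window starting at 06:00 neither overlaps nor leaves a one-second hole.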
Re: [HACKERS] Autovacuum improvements
First, thanks for working on this. I hope to be helpful with the design discussion and possibly some coding if I can find the time. My initial reaction to this proposal is that it seems overly complex; however, I don't see a more elegant solution. I'm a bit concerned that most users won't figure out all the knobs.

Alvaro Herrera wrote:
> I've been thinking how to improve autovacuum so that we can convince
> more people that it can be enabled by default.

I would like to see it enabled by default too; however, the reason it isn't already enabled by default is that it caused failures in the regression test when we tried to turn it on during the 8.2 dev cycle, and it was too close to beta to fix everything. All this new machinery is great, but it doesn't address that problem.

> Here are my thoughts. There are two areas of improvement: 1.
> scheduling, and 2. process handling, i.e., how to have multiple vacuum
> processes running at any time.

Fair enough, but I would say the two biggest areas for improvement are scheduling and preventing hot tables from becoming vacuum starved (essentially what you said, but with a different emphasis).

[snip]

> Process Handling
>
> My idea here is to morph the current autovacuum daemon from an agent
> that itself runs a vacuum command, into something that launches other
> processes to run those vacuum commands. I'll call this the autovacuum
> launcher process, or the launcher for short. The idea here is that the
> launcher can take care of the scheduling while the worker processes do
> their work. If the launcher then determines that a particular instant
> there should be two vacuums running, then it simply starts two worker
> processes.

How about calling it the autovacuum_master process?

[snip autovacuum launcher process description]

That all sounds reasonable to me. I think the harder part is what you are getting at below (how to get the launcher to figure out what to vacuum when).

> Scheduling
> ==========
>
> We introduce the following concepts:
>
> 1. table groups.
> We'll have a system catalog for storing OID and group name, and
> another catalog for membership, linking relid to group OID.
>
> pg_av_tablegroup
>   tgrname  name
>
> pg_av_tgroupmembers
>   groupid  oid
>   relid    oid
>
> 2. interval groups. We'll have a catalog for storing igroup name and
> OID, and another catalog for membership. We identify an interval by:
> - month of year
> - day of month
> - day of week
> - start time of day
> - end time of day
> This is modelled after crontabs.
>
> pg_av_intervalgroup
>   igrname  name
>
> pg_av_igroupmembers
>   groupid    oid
>   month      int
>   dom        int
>   dow        int
>   starttime  timetz
>   endtime    timetz

This seems to assume that the start and end time for an interval will be on the same day; you probably need to specify a start month, dom, dow and time, and an end month, dom, dow and time. Since this is modeled after cron, do we allow wild-cards, or any of the other cron tricks like */20 or 1-3,5,7,9-11? Also, your notation above is ambiguous; it took me a while to realize that pg_av_igroupmembers.groupid wasn't referencing the id from pg_av_tablegroup.

> Additionally, we'll have another catalog on which we'll store table
> groups to interval groups relationships. On that catalog we'll also
> store those autovacuum settings that we want to be able to override:
> whether to disable it for this interval group, or the values for the
> vacuum/analyze equations.
>
> pg_av_schedule
>   tgroup            oid
>   igroup            oid
>   enabled           bool
>   queue             int
>   vac_base_thresh   int
>   vac_scale_factor  float
>   anl_base_thresh   int
>   anl_scale_factor  float
>   vac_cost_delay    int
>   vac_cost_limit    int
>   freeze_min_age    int
>   freeze_max_age    int

What is queue for?

> So the scheduler, at startup, loads the whole schedule in memory, and
> then wakes up at reasonable intervals and checks whether these
> equations hold for some of the tables it's monitoring. If they do,
> then launch a new worker process to do the job.
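The "vacuum/analyze equations" these columns feed are presumably the standard autovacuum threshold tests; as a sketch (the defaults shown are illustrative 8.2-era values, and a per-schedule row would simply override the two knobs):

```python
def needs_vacuum(dead_tuples, reltuples,
                 vac_base_thresh=500, vac_scale_factor=0.2):
    """Standard autovacuum-style trigger: vacuum once dead tuples exceed
    a flat base threshold plus a fraction of the table's size."""
    return dead_tuples > vac_base_thresh + vac_scale_factor * reltuples
```

The base threshold keeps tiny tables from being vacuumed on every check, while the scale factor lets the allowed bloat grow with table size; an analogous pair (anl_base_thresh, anl_scale_factor) would gate analyze.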
> We need a mechanism for having the scheduler rescan the schedule when
> a user modifies the catalog -- maybe having a trigger that sends a
> signal to the process is good enough (implementation detail: the
> signal must be routed via the postmaster, since the backend cannot
> hope to know the scheduler's PID. This is easy enough to do.)

This all looks reasonable, if not a bit complex. Question: what happens to the current pg_autovacuum relation? Also, what about system defaults? Will we have a hard-coded default interval of "always on", and one default table group that contains all the tables, with one default entry in pg_av_schedule? I think we need more discussion on scheduling; we need to make sure this solves the vacuum starvation problem. Does the launcher process consider each row in pg_av_schedule that