Re: [HACKERS] Getting the OID inside the executor

2011-09-01 Thread Fazool
Thanks for the reply, Tom.

As far as I know, the TID of a row can change over time (e.g. when the
row is updated). Isn't that so? If so, it won't solve my problem.
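
For instance (a quick psql sketch with a throwaway table):

CREATE TABLE t (v int);
INSERT INTO t VALUES (1);
SELECT ctid FROM t;   -- e.g. (0,1)
UPDATE t SET v = 2;
SELECT ctid FROM t;   -- e.g. (0,2): the update moved the tuple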

It says here (http://www.postgresql.org/docs/9.0/static/ddl-system-columns.html)
that
"The OID, or even better a user-defined serial number, should be used
to identify logical rows.", so I thought I can use OID. You are right
as OIDs are 32-bit ints, so they will wrap-around at some point.
The reason why I don't want to use a SERIAL user-column is that I want
the process to be transparent to the user.

What I want to accomplish is the following. I want to trace all
accesses (read/update/insert) to rows. If a row was accessed twice, it
should appear twice in the trace. Later (offline), I want to analyze
the trace to find some patterns, and after deducing some results from
the patterns, I should be able to access the original rows in the
database. For example, if my pattern analysis says that row x is
"best", I should be able to read x from the DB. Hence, a tuple should
be uniquely identifiable (have the same ID) in the DB and in the trace.
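
For illustration, the user-visible workaround I'd like to avoid would look
roughly like this (table and column names are hypothetical):

CREATE TABLE my_table (
    row_id bigserial PRIMARY KEY,  -- stable logical row ID, survives updates
    name   text
);

CREATE TABLE access_trace (
    row_id      bigint,            -- identifies the logical row in my_table
    accessed_at timestamptz DEFAULT now()
);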

What would you suggest for such a scenario?

Thanks for the help.


On Thu, Sep 1, 2011 at 10:03 PM, Tom Lane  wrote:
> Fazool  writes:
>> I am implementing some functionality into PostgreSQL, where I want to
>> track which row was accessed by a user query. I am implementing the
>> functionality inside Postgres, so that there are no changes required
>> on client side (e.g. re-writing queries).
>> Rows are identified by OIDs, and I have set default_with_oids=true, so
>> that all tables are created with OIDs.
>
> It's a serious, serious error to suppose that OIDs are adequate
> identifiers for rows in user tables.
>
> We use OIDs to identify rows in system catalogs.  We can get away with
> that, more or less, because (a) system catalogs are not likely to
> contain billions of rows, and (b) we place a unique index on OID on
> every system catalog that has OIDs.  Neither of these statements is
> very tenable for user tables.
>
> You haven't said anything about what it is you actually need to
> accomplish here, but can you use TIDs as row identifiers?  What's
> the required lifetime of the identifiers?
>
>                        regards, tom lane
>



Re: [HACKERS] Getting the OID inside the executor

2011-09-01 Thread Tom Lane
Fazool  writes:
> I am implementing some functionality into PostgreSQL, where I want to
> track which row was accessed by a user query. I am implementing the
> functionality inside Postgres, so that there are no changes required
> on client side (e.g. re-writing queries).
> Rows are identified by OIDs, and I have set default_with_oids=true, so
> that all tables are created with OIDs.

It's a serious, serious error to suppose that OIDs are adequate
identifiers for rows in user tables.

We use OIDs to identify rows in system catalogs.  We can get away with
that, more or less, because (a) system catalogs are not likely to
contain billions of rows, and (b) we place a unique index on OID on
every system catalog that has OIDs.  Neither of these statements is
very tenable for user tables.

You haven't said anything about what it is you actually need to
accomplish here, but can you use TIDs as row identifiers?  What's
the required lifetime of the identifiers?

regards, tom lane



[HACKERS] Getting the OID inside the executor

2011-09-01 Thread Fazool
Hello everyone,

I am implementing some functionality into PostgreSQL, where I want to
track which row was accessed by a user query. I am implementing the
functionality inside Postgres, so that there are no changes required
on client side (e.g. re-writing queries).
Rows are identified by OIDs, and I have set default_with_oids=true, so
that all tables are created with OIDs.

Now, when I run a 'select * from my_table', I can easily get the OID
inside the executor because I get a 'HeapTuple' from ExecScan(). I call
HeapTupleGetOid() on the heap tuple and I have the OID.

The problem is that when I run a query with a projection, e.g.,
'select name from my_table', then the scanner returns a null
HeapTuple. How can I get the OID in such a case?

What would be the best way of doing this? Should I modify the planner
to add 'get oid' as a junk filter? Would it affect something else?

Any comments will be highly appreciated.

Thanks!



Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-09-01 Thread Greg Sabino Mullane



> It's off topic. But I think the custom format would require major mangling 
> to be able to handle a complete cluster. This isn't just a simple matter 
> of programming, IMNSHO.

Oh, I meant just having it create separate custom-format files for each
database, as shell scripts all over the world have been doing for years;
it would be nice if it were simply built in.

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201109012139
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [HACKERS] strange row number estimates in pg9.1rc1

2011-09-01 Thread Bruce Momjian
Tom Lane wrote:
> I wrote:
> > "Sergey E. Koposov"  writes:
> >> I'm seeing something weird which looks like a bug in 9.1rc1 after the 
> >> upgrade 8.4->9.0->9.1 done using pg_upgrade.
> 
> > Hm, I wonder what pg_upgrade left relpages/reltuples set to ...
> 
> Sure enough, that's the problem.  pg_upgrade leaves relpages/reltuples
> set to zero, but it also imports the visibility map pages from the old
> cluster.  If the old visibility map shows the table as all-visible,
> then this happens when you try to VACUUM ANALYZE the table:
> 
> 1. VACUUM doesn't process any pages, so it has no tuple density
> estimate.  It leaves reltuples set to zero, but it does set relpages.
> 
> 2. ANALYZE scans some part of the table.  It gets a tuple density
> estimate for those pages ... but if that's only a small fraction of
> the table, it believes the zero estimate of tuple density elsewhere.
> So you get only a small update of reltuples.
> 
> (The above behavior is new as of commit
> b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8, BTW.)
> 
> Basically, step 1 is buggy here: if we aren't making an update to
> reltuples, we shouldn't set relpages either.  Setting it nonzero
> changes the implied tuple density from "unknown" to "known zero",
> which is wrong.
> 
> I'll go fix that, but I think it might be a good idea for pg_upgrade
> to think about preserving the relpages/reltuples columns ...

pg_upgrade currently only restores some oids and frozenxids.  We would
need to modify pg_dump --binary-upgrade mode to restore those values ---
it isn't hard to do.
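
For the archives, the bogus state Tom describes is easy to spot after an
upgrade; a hypothetical example:

-- after pg_upgrade and VACUUM ANALYZE, the implied density is "known zero":
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'my_table';
-- relpages > 0 together with reltuples near 0 is the bogus tuple density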

> PS: right now, you cannot reproduce this in a 9.0 -> HEAD upgrade,
> because of this patch:
> 
> commit 00a7c9014a8fbb7388a807daeba3e0a85b49a747
> Author: Bruce Momjian 
> Date:   Fri Aug 19 11:20:30 2011 -0400
> 
> In pg_upgrade, don't copy visibility map files from clusters that did not
> have crash-safe visibility maps to clusters that expect crash-safety.
> 
> Request from Robert Haas.
> 
> I did reproduce it in a 9.0->9.1 test.

Right, that is expected.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-01 Thread Tomas Vondra
I've prepared a significantly simplified version of the patch. The two
main changes are

(a) it does not update the pg_stat_bgwriter anymore, it just prints an
info to the server log

(b) a new GUC is not required, it's driven by the log_checkpoints

This version will log at least 10 'checkpoint status' lines (at 10%, 20%,
30%, ...), plus one whenever 5 seconds have elapsed since the last log
message. The time is not checked for each buffer but once every 128 buffers.

So if the checkpoint is very slow you'll get a message every 5 seconds;
if it's fast you'll get 10 messages.
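
With log_checkpoints enabled you would then see lines shaped like this
(numbers are illustrative, following the elog format in the patch below):

LOG:  checkpoint status: wrote 4096 buffers of 32768 (12.5%) in 5.2 s; average 6.2 MB/s (4096 buffers, 5.200 s), current 6.2 MB/s (4096 buffers, 5.200 s), remaining 36.3 s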

Tomas

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
new file mode 100644
index 4c7cfb0..e60d304
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*** BufferSync(int flags)
*** 1175,1180 
--- 1175,1193 
  int num_to_write;
  int num_written;
  int mask = BM_DIRTY;
+
+ int num_since_update;
+
+ long  curr_secs,
+       total_secs;
+ int   curr_usecs,
+       total_usecs;
+ float curr_time,
+       total_time;
+
+ TimestampTz startTimestamp, lastTimestamp;
+
+ int log_interval, check_interval;

  /* Make sure we can handle the pin inside SyncOneBuffer */
  ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
*** BufferSync(int flags)
*** 1238,1243 
--- 1251,1265 
  buf_id = StrategySyncStart(NULL, NULL);
  num_to_scan = NBuffers;
  num_written = 0;
+ num_since_update = 0;
+
+ startTimestamp = GetCurrentTimestamp();
+ lastTimestamp = startTimestamp;
+
+ /* check the progress every 128 buffers or 10% */
+ log_interval = (int) round(num_to_write / 10);
+ check_interval = ((log_interval < 128) && (log_interval > 0)) ? log_interval : 128;
+
  while (num_to_scan-- > 0)
  {
      volatile BufferDesc *bufHdr = &BufferDescriptors[buf_id];
*** BufferSync(int flags)
*** 1261,1266 
--- 1283,1334 
      TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
      BgWriterStats.m_buf_written_checkpoints++;
      num_written++;
+     num_since_update++;
+
+     /*
+      * Every time we write enough buffers (checkpoint_update_limit),
+      * we log a checkpoint status message and update the bgwriter
+      * stats (so that the pg_stat_bgwriter table may be updated).
+      *
+      * The log message contains info about the total number of buffers
+      * to write, how many buffers are already written, the average and
+      * current write speed, and an estimate of the remaining time.
+      */
+     if ((log_checkpoints) && (num_written % check_interval == 0)) {
+
+         TimestampDifference(lastTimestamp,
+                             GetCurrentTimestamp(),
+                             &curr_secs, &curr_usecs);
+
+         /* if at least 5 seconds elapsed since the last log, log */
+         if ((curr_secs >= 5) || (num_since_update >= log_interval)) {
+
+             TimestampDifference(startTimestamp,
+                                 GetCurrentTimestamp(),
+                                 &total_secs, &total_usecs);
+
+             curr_time = curr_secs + (float) curr_usecs / 1000000;
+             total_time = total_secs + (float) total_usecs / 1000000;
+
+             elog(LOG, "checkpoint status: wrote %d buffers of %d (%.1f%%) in %.1f s; "
+                  "average %.1f MB/s (%d buffers, %ld.%03d s), "
+                  "current %.1f MB/s (%d buffers, %ld.%03d s), "
+                  "remaining %.1f s",
+                  num_written, num_to_write, ((float) num_written * 100 / num_to_write)
Re: [HACKERS] sha1, sha2 functions into core?

2011-09-01 Thread Daniel Farina
On Wed, Aug 31, 2011 at 11:12 AM, Ross J. Reedstrom  wrote:
> Hmm, this thread seems to have petered out without a conclusion. Just
> wanted to comment that there _are_ non-password storage uses for these
> digests: I use them in a context of storing large files in a bytea
> column, as a means to doing data deduplication, and avoiding pushing
> files from clients to server and back.

Yes, agreed: there is no decent content-addressing type in PostgreSQL,
so one rolls one's own using shas and joins; I've seen this more than
once. It's a useful way to get a non-bloated index on a series of
values (larger than a sha1) where one only cares about the equality
operator (hash indexes, as unattractive as they were before in
PostgreSQL's implementation, are even less so now with streaming
replication).
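
For concreteness, the roll-your-own pattern is something like this (a
hypothetical sketch, assuming pgcrypto is available):

CREATE EXTENSION pgcrypto;

CREATE TABLE blobs (
    sha1    bytea PRIMARY KEY,  -- compact index over arbitrarily large content
    content bytea NOT NULL
);

-- address content by its digest rather than by the payload itself:
INSERT INTO blobs
SELECT digest(payload, 'sha1'), payload
FROM (VALUES ('some large payload'::bytea)) AS v(payload);

SELECT content FROM blobs
WHERE sha1 = digest('some large payload', 'sha1');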

When that content to be addressed can be submitted from another
source, anything with md5 is correctly met with suspicion. We have
gone to the trouble of using pgcrypto to get sha1 access, but I know
of other applications that would have preferred to use sha but settle
for md5 simply because it's known to be bundled in core everywhere.

CREATE EXTENSION -- particularly if there is *any* way (is there? even
with ugliness like utility statement hooks) to configure it on the
provider end to not require superuser for common extensions like
'pgcrypto' -- could ablate this issue and one could get off the hash
"treadmill", including md5 -- but I think that would be a mistake.
Applications need a high quality digest to enable any kind of
principled content addressing use case, and I think making that any
harder than a builtin is going to negatively impact the state of
things at large.  As a compromise, I'd also be happy with making
CREATE EXTENSION so trivial that everyone who has that use case can
get pgcrypto on any hosting provider.

-- 
fdr



Re: [HACKERS] pg_upgrade automatic testing

2011-09-01 Thread Bruce Momjian
Peter Eisentraut wrote:
> On tis, 2011-08-30 at 16:25 -0400, Tom Lane wrote:
> > So I think that as given, this script is only useful for testing
> > pg_upgrade of $currentversion to $currentversion.  Which is surely
> > better than no test at all, but it would not for example have caught
> > the 8.3 incompatibility that was just reported.
> 
> Well, the goal was always current to current version.  Cross-version
> testing is obviously important, but will be quite a bit harder.
> 
> > How can we improve things here?  I've toyed with the idea of
> > installing pg_regress.so so that we can refer to it relative to
> > $libdir, but that might be a bit invasive, especially if we were to
> > try to back-patch it as far as 8.3. 
> 
> Aside from hesitations to backpatch those sorts of changes, it would
> effectively prevent us from ever removing anything from the C libraries
> used in the regression tests, because we need to keep the symbols around
> so that the schema dump can load successfully into the new instance.
> 
> I think a solution would have to be one of:
> 
> 1) pg_upgrade needs a mode to cope with these situations.  It can tell
> the user, I upgraded your installation, but some dynamic modules appear
> to be missing, you need to sort that out before you can put this back
> into use.
> 
> 2) Design a different test schema to load into the database before
> running pg_upgrade.  This would then be a one-line change in the script.

Here are the scripts I use for testing:

http://momjian.us/expire/pg_upgrade_test.tgz

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade automatic testing

2011-09-01 Thread Bruce Momjian
Tom Lane wrote:
> Peter Eisentraut  writes:
> > +# contrib/pg_upgrade/test.sh
> > +#
> > +# Test driver for pg_upgrade.  Initializes a new database cluster,
> > +# runs the regression tests (to put in some data), runs pg_dumpall,
> > +# runs pg_upgrade, runs pg_dumpall again, compares the dumps.
> 
> Hm .. my experience is that that doesn't work at all, because the
> regression tests set up assorted C functions whose implementations are
> in pg_regress.so, and it creates them with absolute path references
> to pg_regress.so.  When you try to load that into another installation
> that's a different version of PG, it quite properly fails.  So I think
> that as given, this script is only useful for testing pg_upgrade of
> $currentversion to $currentversion.  Which is surely better than no test

Reminder --- you can't use pg_upgrade to go from the same catalog
version to the same catalog version because the catalog version is
embedded in the tablespace directory name.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] postgresql.conf archive_command example

2011-09-01 Thread Robert Haas
On Thu, Sep 1, 2011 at 3:05 PM, Josh Berkus  wrote:
> +1
>
> I'm also wondering if providing some shell script examples of a
> fault-tolerant script to handle archiving would be useful.

I think it would.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgresql.conf archive_command example

2011-09-01 Thread Josh Berkus

> Seems like it would be better to fix archive_mode so that it can be
> changed without a restart.

+1

I'm also wondering if providing some shell script examples of a
fault-tolerant script to handle archiving would be useful.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] "stored procedures"

2011-09-01 Thread Christopher Browne
On Thu, Sep 1, 2011 at 1:18 PM, Josh Berkus  wrote:
> On 8/31/11 12:15 PM, Merlin Moncure wrote:
>> An out of process, autonomous transaction type implementation should
>> probably not sit under stored procedures for a number of reasons --
>> mainly that it's going to expose too many implementation details to
>> the user.  For example, does a SP heavy app have 2*N running
>> processes?  Or do we slot them into a defined number of backends for
>> that purpose? Yuck & yuck.  I like the AT feature, and kludge it
>> frequently via dblink, but it's a solution for a different set of
>> problems.
>
> I think that transaction control without parallelism would be the 80%
> solution.  That is, an SP has transaction control, but those
> transactions are strictly serial, and cannot be run in parallel.  For
> example, if you were writing an SP in PL/pgSQL, each "BEGIN ... END"
> block would be an explicit transaction, and standalone-only statements
> would be allowed between BEGIN ... END blocks, or possibly in their own
> special block type (I prefer the latter).
>
> One issue we'd need to deal with is exception control around
> single-statement transactions and non-transactional statements (VACUUM,
> CREATE INDEX CONCURRENTLY, CHECKPOINT, etc.).  In some cases, the user
> is going to want to catch exceptions and abort the SP, and in other
> cases ignore them, so both need to be possible.

Yep, "+1" on that.

Leaving out parallelism, and having the mechanism operate under the
auspices of a single connection, makes a fine start, and perhaps is
enough even in the longer run.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: [HACKERS] Why buildfarm member anchovy is failing on 8.2 and 8.3 branches

2011-09-01 Thread Tom Lane
I wrote:
> What *does* seem feasible is to back-port just the single change we
> actually need, by copying the two relevant macros into one of our
> config/ source files for the configure script.  I've tested that in
> 8.3 and it seems to work --- at least, the generated configure script
> changes in the expected way.  This also seems like a reasonably sane
> thing to back-port to 8.2.  So I'll go ahead and commit those things
> and see if anchovy likes it.

So the upshot is that that fixed the 8.3 build, but anchovy is still
failing on 8.2, with some different errors:

/usr/bin/ld.gold: /tmp/ccn7RPJJ.ltrans0.ltrans.o: in function base_yyparse:y.tab.c:12777: error: undefined reference to 'filtered_base_yylex'
/usr/bin/ld.gold: /tmp/ccn7RPJJ.ltrans0.ltrans.o: in function base_yyparse:gram.y:494: error: undefined reference to 'parsetree'
/usr/bin/ld.gold: /tmp/ccn7RPJJ.ltrans7.ltrans.o: in function parseTypeString:parse_type.c:445: error: undefined reference to 'raw_parser'
/usr/bin/ld.gold: /tmp/ccn7RPJJ.ltrans19.ltrans.o: in function simplify_function.128434.2836:postgres.c:544: error: undefined reference to 'raw_parser'
/usr/bin/ld.gold: /tmp/ccn7RPJJ.ltrans19.ltrans.o: in function pg_parse_and_rewrite:postgres.c:544: error: undefined reference to 'raw_parser'
/usr/bin/ld.gold: /tmp/ccn7RPJJ.ltrans19.ltrans.o: in function fmgr_sql_validator:postgres.c:544: error: undefined reference to 'raw_parser'
collect2: ld returned 1 exit status

I went so far as to install Arch Linux here, but I cannot duplicate
the above.  (Although I wonder whether my machine is really doing
link-time optimization, since it doesn't generate any compiler warning
messages during the link step, as anchovy is doing.)

But these errors seem like they should be impossible anyway,
since there is nothing platform-specific about our uses of any of the
mentioned functions.  I wonder if there is something messed up with
anchovy's copy of REL8_2_STABLE.  Marti, could I trouble you to
blow away and recreate that machine's 8.2 checkout, as well as any
compiler cache directories?

regards, tom lane



Re: [HACKERS] "stored procedures"

2011-09-01 Thread Josh Berkus
On 8/31/11 12:15 PM, Merlin Moncure wrote:
> An out of process, autonomous transaction type implementation should
> probably not sit under stored procedures for a number of reasons --
> mainly that it's going to expose too many implementation details to
> the user.  For example, does a SP heavy app have 2*N running
> processes?  Or do we slot them into a defined number of backends for
> that purpose? Yuck & yuck.  I like the AT feature, and kludge it
> frequently via dblink, but it's a solution for a different set of
> problems.

I think that transaction control without parallelism would be the 80%
solution.  That is, an SP has transaction control, but those
transactions are strictly serial, and cannot be run in parallel.  For
example, if you were writing an SP in PL/pgSQL, each "BEGIN ... END"
block would be an explicit transaction, and standalone-only statements
would be allowed between BEGIN ... END blocks, or possibly in their own
special block type (I prefer the latter).
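
To illustrate, in purely hypothetical syntax (nothing like this exists
today; names are made up), such an SP might look like:

CREATE PROCEDURE reload_stats() AS $$
BEGIN
    DELETE FROM stats_snapshot;                        -- transaction 1
END;
BEGIN
    INSERT INTO stats_snapshot SELECT * FROM stats;    -- transaction 2
END;
$$ LANGUAGE plpgsql;

Each top-level block would commit before the next one starts, with no
enclosing transaction around the whole procedure.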

One issue we'd need to deal with is exception control around
single-statement transactions and non-transactional statements (VACUUM,
CREATE INDEX CONCURRENTLY, CHECKPOINT, etc.).  In some cases, the user
is going to want to catch exceptions and abort the SP, and in other
cases ignore them, so both need to be possible.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Informix FDW - anybody working on this?

2011-09-01 Thread Robert Treat
On Wed, Aug 31, 2011 at 8:13 AM, Bernd Helmle  wrote:
> Out of curiosity,
>
> is anybody working on $subject? I'm currently planning to work on such a
> driver,
> but given the current stream of new drivers I want to make sure not to
> duplicate any efforts...
>

The most complete list I've seen of FDWs is on the wiki:
http://wiki.postgresql.org/wiki/Foreign_data_wrappers

Note there is an ODBC FDW, which might work, but if you start working
on an Informix specific one, please add it there. (I say this, knowing
that one of my co-workers has the outlines of a riak fdw he hasn't
listed yet... guess I should go pester him).


Robert Treat
conjecture: xzilla.net
consulting: omniti.com



[HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-01 Thread Hannu Krosing
Hello hackers,

I have the following problem to solve and would like to get advice on
the best way to do it.

The problem:

When growing a pl/proxy based database cluster, one of the main
operations is splitting a partition. The standard flow is as follows:

1) make a copy of the partition's table(s) to another database
2) reconfigure pl/proxy to use 2 partitions instead of one

The easy part is making a copy of all or half of the table to another
database. The hard part is fast deletion (I mean milliseconds, comparable
to TRUNCATE) of the data that should not be in a partition (so that RUN
ON ALL functions will continue to return the right results).

It would be relatively easy if we still had RULES for SELECT available
for plain tables, but even then the eventual cleanup would usually mean
at least 3 passes of disk writes (set xmax, write the deleted flag,
vacuum and remove), as sketched below.
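
For contrast, the conventional slow path after copying half the rows to the
new partition is something like this (hashtext() and the table are just
illustrative):

DELETE FROM events
WHERE hashtext(user_id::text) & 1 = 1;  -- rows now owned by the new partition
VACUUM events;                          -- and further passes over the heap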

What I would like to have is the possibility of additional visibility
checks, which would run some simple C function over the tuple data
(usually hash(fieldval) plus some ANDs and ORs) and return visibility
("is in this partition") as a result. It would be best if this ran at
so low a level that vacuum would also use it and could clean up the
foreign partition data in one pass, without doing the delete dance
first.

So finally the QUESTION:

where in the code would be the best place to check for this, so that

1) both regular queries and VACUUM see it
2) the tuple data (and not only system fields or just xmin/xmax) would
be available for the function to use


-- 
---
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/




Re: [HACKERS] Removal of useless include references

2011-09-01 Thread Bruce Momjian
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Excerpts from Bruce Momjian's message of jue sep 01 11:04:33 -0300 2011:
> >> I have re-run the script and applied the result, again tested on BSD and
> >> Linux.   I will monitor the buildfarm for possible failures.
> 
> > I think anything of this sort should be tested on Windows too.
> 
> Well, if Windows is broken we'll find out soon enough from the
> buildfarm.  My recollection from the last go-round is that the pain
> points were in non-default #define options that Bruce hadn't tested
> and that no buildfarm critter exercised either, such as LOCK_DEBUG.

Ah, but this time I only removed includes for files I could compile with
all #if markers removed.  It only got 5.8k diff lines out of a possible
6.8k lines, but this seems like an acceptable cost for greater
reliability.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Removal of useless include references

2011-09-01 Thread Bruce Momjian
Alvaro Herrera wrote:
> Excerpts from Bruce Momjian's message of jue sep 01 11:04:33 -0300 2011:
> > Bruce Momjian wrote:
> 
> > > OK, try #2.  I already had code that removed #if/#else/#endif code in
> > > *.h files for better testing, so I extended that to all *.c files.  This
> > > reduces the size of the diff from 6.6k lines to 4.7k lines but it makes
> > > it much less likely that there will be problems from running
> > > pgrminclude.
> > > 
> > > The current patch is here:
> > > 
> > > http://momjian.us/expire/pgrminclude.diff
> > > 
> > > I tested the patch on BSD and Linux.
> > 
> > I have re-run the script and applied the result, again tested on BSD and
> > Linux.   I will monitor the buildfarm for possible failures.
> 
> I think anything of this sort should be tested on Windows too.

Agreed, but we have so many Windows configurations I figured I would let
the buildfarm test them, no?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Removal of useless include references

2011-09-01 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Bruce Momjian's message of jue sep 01 11:04:33 -0300 2011:
>> I have re-run the script and applied the result, again tested on BSD and
>> Linux.   I will monitor the buildfarm for possible failures.

> I think anything of this sort should be tested on Windows too.

Well, if Windows is broken we'll find out soon enough from the
buildfarm.  My recollection from the last go-round is that the pain
points were in non-default #define options that Bruce hadn't tested
and that no buildfarm critter exercised either, such as LOCK_DEBUG.

regards, tom lane



Re: [HACKERS] Removal of useless include references

2011-09-01 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of jue sep 01 11:04:33 -0300 2011:
> Bruce Momjian wrote:

> > OK, try #2.  I already had code that removed #if/#else/#endif code in
> > *.h files for better testing, so I extended that to all *.c files.  This
> > reduces the size of the diff from 6.6k lines to 4.7k lines but it makes
> > it much less likely that there will be problems from running
> > pgrminclude.
> > 
> > The current patch is here:
> > 
> > http://momjian.us/expire/pgrminclude.diff
> > 
> > I tested the patch on BSD and Linux.
> 
> I have re-run the script and applied the result, again tested on BSD and
> Linux.   I will monitor the buildfarm for possible failures.

I think anything of this sort should be tested on Windows too.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Removal of useless include references

2011-09-01 Thread Bruce Momjian
Bruce Momjian wrote:
> Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > It has been years since I ran src/tools/pginclude/pgrminclude to remove
> > > > unnecessary include files.  (I have already fixed things so include
> > > > files can be compiled on their own.)
> > > 
> > > > The attached patch removes unneeded include references, and marks some
> > > > includes as needing to be skipped by pgrminclude.
> > > 
> > > > I am sure applying this patch will break builds on some platforms and
> > > > some option combinations so I will monitor the buildfarm when I apply it
> > > > and make adjustments.
> > > 
> > > The last time you did this was in July 2006.  It took us two weeks to
> > > mostly recover, but we were still dealing with some fallout in December,
> > > cf
> > > http://archives.postgresql.org/pgsql-hackers/2006-12/msg00491.php
> > > 
> > > We had the buildfarm then, had had it for a couple years.  The notion
> > > that watching the buildfarm is enough is fully disproven by history.
> > > 
> > > Unless you have a better test plan than last time (which this isn't),
> > > I don't think this should be done at all.  The benefits are microscopic
> > > and the pain real.
> > 
> > I don't have a better plan.  There are #ifdef code blocks that often
> > don't get processed and therefore this can't be done better.  I will
> > abandon the idea.
> 
> OK, try #2.  I already had code that removed #if/#else/#endif code in
> *.h files for better testing, so I extended that to all *.c files.  This
> reduces the size of the diff from 6.6k lines to 4.7k lines but it makes
> it much less likely that there will be problems from running
> pgrminclude.
> 
> The current patch is here:
> 
>   http://momjian.us/expire/pgrminclude.diff
> 
> I tested the patch on BSD and Linux.

I have re-run the script and applied the result, again tested on BSD and
Linux.   I will monitor the buildfarm for possible failures.

This is not something we are going to do regularly, but probably every
five years like this time.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] mb_regress.sh gripes

2011-09-01 Thread Robert Haas
On Thu, Aug 18, 2011 at 6:19 PM, Josh Kupershmidt  wrote:
> A few gripes about mb_regress.sh:
>  1. No exit code is specified, so even if there are differences
> between results/ and expected/ the script will still return 0.
>
>  2. The 'dropdb' command is used to wipe out the "utf8" database
> before the run. This generates an error message like:
>  dropdb: database removal failed: ERROR:  database "utf8" does not exist
>
> the first time you run the script. IMO it would be less startling to
> just print a NOTICE here.
>
>  3. No error checking for whether createdb succeeds.
>
> The attached patch fixes these problems.

Committed, with some changes.  I used the new --if-exists option for
dropdb rather than doing it as you had it here; I assume this may have
been the motivation for that patch.  I also just made the exit code 1
no matter how many failures there were.  That seems more normal, and I
wasn't altogether certain that $((expr)) is completely portable.  I
also set the execute bit on the script.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-09-01 Thread Kohei Kaigai
> On Fri, Aug 26, 2011 at 5:32 AM, Kohei KaiGai  wrote:
> > Yes. It also caches an expected security label when a client being
> > labeled as "scontext" tries to execute a procedure being labeled as
> > "tcontext", to reduce the number of system call invocations on fmgr_hook
> > and needs_fmgr_hook.
> > If the expected security label is not the same as "scontext", it means
> > the procedure performs as a trusted procedure that switches the security
> > label of the client during its execution, like a security invoker
> > function.
> > A pair of security labels is the only factor determining whether the
> > procedure is a trusted procedure or not. Thus, it is suitable to
> > cache in the userspace avc.
> 
> I've committed this, but I still think it would be helpful to revise
> that comment.  The term "boosted up" is not very precise or
> informative.  Could you submit a separate, comment-only patch to
> improve this?
> 
OK, please wait for a few days.

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei 



Re: [HACKERS] dblink make fails under postgresql 8.4.4 on mac osx 10.4.11

2011-09-01 Thread Robert Haas
On Wed, Aug 31, 2011 at 9:45 AM, Gary Merkel  wrote:
> Having trouble installing dblink under PostgreSQL 8.4.4 on MAC OS X 10.4.11
>
> Running make gives the following error:
>
> sed 's,MODULE_PATHNAME,$libdir/dblink,g' dblink.sql.in >dblink.sql
>
> gcc -no-cpp-precomp -O2 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv
> -I../../src/interfaces/libpq -I. -I../../src/include   -c -o dblink.o
> dblink.c
> dblink.c: In function 'get_pkey_attnames':
> dblink.c:1698: error: 'SnapshotNow' undeclared (first use in this function)
> dblink.c:1698: error: (Each undeclared identifier is reported only once
> dblink.c:1698: error: for each function it appears in.)
> make: *** [dblink.o] Error 1
>
> Does anyone know a fix to this?

This is hard to understand, because SnapshotNow is defined in
utils/tqual.h, and dblink.c includes that file.

Also, I checked out the official sources at tag REL8_4_4 and it looks
to me like there's no reference to SnapshotNow on line 1698.  That
line for me is blank, and the only reference I see to SnapshotNow is
on line 1715.

So I suspect that what you've got is a modified version of the sources
that is somehow messed up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-09-01 Thread Robert Haas
On Fri, Aug 26, 2011 at 5:32 AM, Kohei KaiGai  wrote:
> Yes. It also caches an expected security label when a client being
> labeled as "scontext" tries to execute a procedure being labeled as
> "tcontext", to reduce the number of system call invocations on fmgr_hook
> and needs_fmgr_hook.
> If the expected security label is not the same as "scontext", it means
> the procedure performs as a trusted procedure that switches the security
> label of the client during its execution, like a security invoker
> function.
> A pair of security labels is the only factor determining whether the
> procedure is a trusted procedure or not. Thus, it is suitable to
> cache in the userspace avc.

I've committed this, but I still think it would be helpful to revise
that comment.  The term "boosted up" is not very precise or
informative.  Could you submit a separate, comment-only patch to
improve this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-01 Thread Alexander Korotkov
Hi!

I expect some problems in the support of comparison operators for text,
because locale-aware string comparison can behave unexpectedly.
Let's see an example: create a table of words and add an extra leading
space to some of them.

test=# create table dict(id serial, word text);
NOTICE:  CREATE TABLE will create implicit sequence "dict_id_seq" for serial
column "dict.id"
CREATE TABLE
test=# \copy dict(word) from '/usr/share/dict/american-english';
test=# update dict set word = ' '||word where id%2=0;
UPDATE 49284

I use Ubuntu 11.04 with the ru_RU.utf8 locale. So the comparison
operators ignore leading spaces.

test=# select * from dict where word between 'cart' and 'cary';
  id   |  word
---+
  3029 | Carter
  3031 | Cartesian
  3033 | Carthage's
  3035 | Cartier
  3037 | Cartwright
  3039 | Caruso
  3041 | Carver
 28419 | cart
 28421 | carted
 28423 | cartel's
 28425 | cartilage
 28427 | cartilages
 28429 | carting
 28431 | cartographer's
 28433 | cartography
 28435 | carton
 28437 | cartons
 28439 | cartoon's
 28441 | cartooning
 28443 | cartoonist's
 28445 | cartoons
 28447 | cartridge's
 28449 | carts
 28451 | cartwheel's
 28453 | cartwheeling
 28455 | carve
 28457 | carver
 28459 | carvers
 28461 | carving
 28463 | carvings
  3030 |  Carter's
  3032 |  Carthage
  3034 |  Carthaginian
  3036 |  Cartier's
  3038 |  Cartwright's
  3040 |  Caruso's
  3042 |  Carver's
 28420 |  cart's
 28422 |  cartel
 28424 |  cartels
 28426 |  cartilage's
 28428 |  cartilaginous
 28430 |  cartographer
 28432 |  cartographers
 28434 |  cartography's
 28436 |  carton's
 28438 |  cartoon
 28440 |  cartooned
 28442 |  cartoonist
 28444 |  cartoonists
 28446 |  cartridge
 28448 |  cartridges
 28450 |  cartwheel
 28452 |  cartwheeled
 28454 |  cartwheels
 28456 |  carved
 28458 |  carver's
 28460 |  carves
 28462 |  carving's
(59 rows)

But if I create an SP-GiST index, the query result differs.

test=# create index dict_idx on dict using spgist (word);
CREATE INDEX
test=# select * from dict where word between 'cart' and 'cary';
  id   |  word
---+
 28419 | cart
 28421 | carted
 28423 | cartel's
 28425 | cartilage
 28427 | cartilages
 28429 | carting
 28431 | cartographer's
 28433 | cartography
 28435 | carton
 28437 | cartons
 28439 | cartoon's
 28441 | cartooning
 28443 | cartoonist's
 28445 | cartoons
 28447 | cartridge's
 28449 | carts
 28451 | cartwheel's
 28453 | cartwheeling
 28455 | carve
 28457 | carver
 28459 | carvers
 28461 | carving
 28463 | carvings
(23 rows)

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-01 Thread Oleg Bartunov
This is an updated SP-GiST patch, which fixes one bug and replaces the
test with a locale-independent one.


On Wed, 31 Aug 2011, Oleg Bartunov wrote:


Hi there,

attached is our WIP patch for the 9.2 development source tree, which
provides an implementation of SP-GiST (a prototype was presented at
PGCon 2011, see http://www.pgcon.org/2011/schedule/events/309.en.html
and the presentation for details) as a core feature.  Main differences
from the prototype version:

1. Now it's part of pg core, not a contrib module
2. It provides more operations for quadtree and suffix tree
3. It uses a clustering algorithm for nodes on disk and has much better
utilization of disk space. Fillfactor is supported
4. Some corner cases were eliminated
5. It provides support for concurrency and recovery (inserts are
logged; support for deletes and log replay will be added really
soon)

So now the code contains almost all the overhead of production code,
and we ask hackers to test performance on real data sets. We expect
the same performance for random data (since there is almost no overlap)
and much better performance on real-life data, plus much better index
creation time. Also, we would appreciate your comments and suggestions
about the API.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

spgist_patch-0.85.gz
Description: Binary data



Re: [HACKERS] postgresql.conf archive_command example

2011-09-01 Thread Robert Haas
2011/9/1 Dimitri Fontaine :
> Peter Eisentraut  writes:
>> Well, we could make initdb patch it up, but that might seem excessive.
>
> I sometimes wonder if archive_mode shouldn't default to "on" with the
> archive_command set to either '/bin/true' or 'rem' for windows.
>
> That allows to install proper archiving without restart, but the
> tradeoff is of course that you need to restart to enable some
> optimisation cases by turning archive_mode off.

Seems like it would be better to fix archive_mode so that it can be
changed without a restart.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Fast GiST index build

2011-09-01 Thread Heikki Linnakangas

On 01.09.2011 12:23, Alexander Korotkov wrote:

On Thu, Sep 1, 2011 at 12:59 PM, Heikki Linnakangas<
heikki.linnakan...@enterprisedb.com>  wrote:


So I changed the test script to generate the table as:

CREATE TABLE points AS SELECT random() as x, random() as y FROM
generate_series(1, $NROWS);

The unordered results are in:

  testname                    |   nrows   |    duration     | accesses
 -----------------------------+-----------+-----------------+----------
  points unordered buffered   |     25000 | 05:56:58.575789 |  2241050
  points unordered auto       |     25000 | 05:34:12.187479 |  2246420
  points unordered unbuffered |     25000 | 04:38:48.663952 |  2244228

Although the buffered build doesn't lose as badly as it did with more
overlap, it still doesn't look good :-(. Any ideas?



But it's still a lot of overlap. It's about 220 accesses per small-area
request, which is about 10 - 20 times more than it should be without overlaps.


Hmm, those "accesses" numbers are actually quite bogus for this test. I 
changed the creation of the table as you suggested, so that all x and y 
values are in the range 0.0 - 1.0, but I didn't change the loop to 
calculate those accesses, so it still queried for boxes in the range 0 - 
10. That makes me wonder, why does it need 220 accesses on average 
to satisfy queries most of which lie completely outside the range of 
actual values in the index? I would expect such queries to just look at 
the root node, conclude that there can't be any matching tuples, and 
return immediately.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] WIP: Fast GiST index build

2011-09-01 Thread Alexander Korotkov
On Thu, Sep 1, 2011 at 12:59 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:

> So I changed the test script to generate the table as:
>
> CREATE TABLE points AS SELECT random() as x, random() as y FROM
> generate_series(1, $NROWS);
>
> The unordered results are in:
>
>  testname                    |   nrows   |    duration     | accesses
> -----------------------------+-----------+-----------------+----------
>  points unordered buffered   |     25000 | 05:56:58.575789 |  2241050
>  points unordered auto       |     25000 | 05:34:12.187479 |  2246420
>  points unordered unbuffered |     25000 | 04:38:48.663952 |  2244228
>
> Although the buffered build doesn't lose as badly as it did with more
> overlap, it still doesn't look good :-(. Any ideas?


But it's still a lot of overlap. It's about 220 accesses per small-area
request, which is about 10 - 20 times more than it should be without
overlaps. If we roughly assume that 10 times more overlap makes 1/10 of
the tree be used for actual inserts, then that part of the tree can
easily fit in the cache. You can try my splitting algorithm on your test
setup (in this case I advise starting from a smaller number of rows,
100 M for example).
I'm requesting from Oleg real-life datasets that cause trouble in real
life. Probably those datasets are even larger, or the new linear split
produces fewer overlaps on them.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: Fast GiST index build

2011-09-01 Thread Heikki Linnakangas

On 30.08.2011 13:38, Alexander Korotkov wrote:

On Tue, Aug 30, 2011 at 1:08 PM, Heikki Linnakangas<
heikki.linnakan...@enterprisedb.com>  wrote:



Thanks. Meanwhile, I hacked together my own set of test scripts, and let
them run over the weekend. I'm still running tests with ordered data, but
here are some preliminary results:

  testname                    |   nrows   |    duration     | accesses
 -----------------------------+-----------+-----------------+----------
  points unordered auto       |     25000 | 08:08:39.174956 |  3757848
  points unordered buffered   |     25000 | 09:29:16.47012  |  4049832
  points unordered unbuffered |     25000 | 03:48:10.999861 |  4564986

As you can see, the results are very disappointing :-(. The buffered builds
take a lot *longer* than unbuffered ones. I was expecting the buffering to
be very helpful at least in these unordered tests. On the positive side, the
buffering made index quality somewhat better (accesses column, smaller is
better), but that's not what we're aiming at.

What's going on here? This data set was large enough to not fit in RAM, the
table was about 8.5 GB in size (and I think the index is even larger than
that), and the box has 4GB of RAM. Does the buffering only help with even
larger indexes that exceed the cache size even more?


This seems pretty strange to me. The time of the unbuffered index build
shows that there is no bottleneck at IO. That radically differs from my
experiments; I'm going to try your test script on my test setup.
For now, my only guess is that the random function appears to be somewhat
bad, so the unordered dataset behaves like the ordered one. Can you
rerun the tests on your test setup with dataset generation on the backend,
like this?
CREATE TABLE points AS (SELECT point(random(), random()) FROM
generate_series(1,1000));


So I changed the test script to generate the table as:

CREATE TABLE points AS SELECT random() as x, random() as y FROM 
generate_series(1, $NROWS);


The unordered results are in:

  testname                    |   nrows   |    duration     | accesses
 -----------------------------+-----------+-----------------+----------
  points unordered buffered   |     25000 | 05:56:58.575789 |  2241050
  points unordered auto       |     25000 | 05:34:12.187479 |  2246420
  points unordered unbuffered |     25000 | 04:38:48.663952 |  2244228

Although the buffered build doesn't lose as badly as it did with more 
overlap, it still doesn't look good :-(. Any ideas?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] setlocale() on Windows is broken

2011-09-01 Thread Heikki Linnakangas

On 31.08.2011 16:05, Heikki Linnakangas wrote:

While looking through old emails, I bumped into this:

http://archives.postgresql.org/message-id/25219.1303306...@sss.pgh.pa.us

To recap, setlocale() on Windows is broken for locale names that contain
dots or apostrophes in the country name. That includes "Hong Kong
S.A.R.", "Macau S.A.R.", and "U.A.E." and "People's Republic of China".

In April, I put in a hack to initdb to map those problematic names to
aliases that don't contain dots:

People's Republic of China -> China
Hong Kong S.A.R. -> HKG
U.A.E. -> ARE
Macau S.A.R. -> ZHM

However, Hiroshi pointed out in the thread linked above that that
doesn't completely solve the problem. If you set locale to "HKG", for
example, setlocale(LC_ALL, NULL) still returns the full name, "Hong Kong
S.A.R.", and if you feed that back to setlocale() it fails. In
particular, check_locale() uses "saved = setlocale(LC_XXX, NULL)" to get
the current value, and tries to restore it later with "setlocale(LC_XXX,
saved)".


At first, I thought I should revert my hack in initdb, since it's not
fully solving the problem anyway. But it doesn't really help - you run
into the same issue if you set locale to one of those aliases manually.
And that's exactly what users will have to do if we don't map those
locales automatically.

Microsoft should fix their bug. I don't have much faith in that
happening, however. So, I think we should move the mapping from initdb
to somewhere in src/port, so that the mapping is done every time
setlocale() is called. That would fix the problem with check_locale():
even though "setlocale(LC_XXX, NULL)" returns a value that won't work,
the setlocale() call to restore it would map it to an alias that does
work again.

In addition to that, I think we should check the return value of
setlocale() in check_locale(), and throw a warning if restoring the old
locale fails. The session's locale will still be screwed, but at least
you'll know if it happens.


I've committed a patch along those lines.

It turned out to be pretty difficult to reproduce user-visible buggy 
behavior caused by this bug, so for the sake of the archives, here's a 
recipe:


1. Set system locale to "Chinese_Hong Kong S.A.R..950"

2. initdb -D data --locale="Arabic_ARE"

3. Launch psql.

  CREATE TABLE foo (a text);
  INSERT INTO foo VALUES ('a'), ('A');

  -- Verify that the order is 'a', 'A'
  SELECT * FROM foo ORDER BY a;

  -- This fails, as it should
  CREATE DATABASE postgres WITH LC_COLLATE='C' TEMPLATE=template0;

  -- This also fails, as it should
  CREATE DATABASE postgres WITH LC_COLLATE='C' TEMPLATE=template0;

  -- The order returned by this is now wrong: 'A', 'a'
  SELECT * FROM foo ORDER BY a;

It's a bizarre-looking sequence, but that does it.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] postgresql.conf archive_command example

2011-09-01 Thread Dimitri Fontaine
Peter Eisentraut  writes:
> Well, we could make initdb patch it up, but that might seem excessive.

I sometimes wonder if archive_mode shouldn't default to "on" with the
archive_command set to either '/bin/true' or 'rem' for windows.

That allows to install proper archiving without restart, but the
tradeoff is of course that you need to restart to enable some
optimisation cases by turning archive_mode off.
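
Something like this in the shipped postgresql.conf, purely as a sketch:

archive_mode = on
archive_command = '/bin/true'   # no-op placeholder ('rem' on Windows)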

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
