[HACKERS] Reconstructing Insert queries with indirection

2012-03-21 Thread Ashutosh Bapat
Hi All,
Consider the following sequence of commands:

create type complex as (r float8, i float8);
create type quad as (c1 complex, c2 complex);
create temp table quadtable(f1 int, q quad);

insert into quadtable (f1, q.c1.r, q.c2.i) values(44,55,66);

While parsing the INSERT query, we parse it with three columns and
three values in the target list, but during rewriting q.c1.r and
q.c2.i are combined into a single column in the form of a FieldStore
structure. In Postgres-XC, we deparse these parse trees to be sent to
other PostgreSQL servers. The function processIndirection(), which
deparses the indirections, cannot handle more than one field in a
FieldStore node.

    /*
     * Print the field name.  There should only be one target field in
     * stored rules.  There could be more than that in executable
     * target lists, but this function cannot be used for that case.
     */
    Assert(list_length(fstore->fieldnums) == 1);
    fieldname = get_relid_attribute_name(typrelid,
                                         linitial_int(fstore->fieldnums));
    if (printit)
        appendStringInfo(buf, ".%s", quote_identifier(fieldname));

Why is this restriction here?

The assertion was added by commit 858d1699. The notes for the commit
contain the following paragraph about FieldStore deparsing:

I chose to represent an assignment ArrayRef as "array[subscripts] := source",
which is fairly reasonable and doesn't omit any information.  However,
FieldStore is problematic because the planner will fold multiple assignments
to fields of the same composite column into one FieldStore, resulting in a
structure that is hard to understand at all, let alone display
comprehensibly.  So in that case I punted and just made it print the source
expression(s).

So, there doesn't seem to be any serious reason behind the restriction.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company


[HACKERS] Re: [COMMITTERS] pgsql: Improve the -l (limit) option recently added to contrib/vacuumlo

2012-03-21 Thread Heikki Linnakangas

On 21.03.2012 01:05, Tom Lane wrote:

Improve the -l (limit) option recently added to contrib/vacuumlo.

Instead of just stopping after removing an arbitrary subset of orphaned
large objects, commit and start a new transaction after each -l objects.
This is just as effective as the original patch at limiting the number of
locks used, and it doesn't require doing the OID collection process
repeatedly to get everything.  Since the option no longer changes the
fundamental behavior of vacuumlo, and it avoids a known server-side
limitation, enable it by default (with a default limit of 1000 LOs per
transaction).

In passing, be more careful about properly quoting the names of tables
and fields, and do some other cosmetic cleanup.


Shouldn't this be backported? Without it, vacuumlo is effectively broken 
in 9.0 and 9.1.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Chronic performance issue with Replication Failover and FSM.

2012-03-21 Thread Heikki Linnakangas

On 20.03.2012 23:41, Josh Berkus wrote:

Heikki,


The FSM is included in the base backup, and it is updated when VACUUM
records are replayed.


Oh?  Hmm.  In that case, the issue I'm seeing in production is
something else.  Unless that was a change for 9.1?


No, it's been like that since 8.4, when the FSM was rewritten.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Proposal: Create index on foreign table

2012-03-21 Thread Etsuro Fujita

(2012/03/21 4:39), Robert Haas wrote:

On Fri, Mar 16, 2012 at 4:44 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp  wrote:



For a flat file, CREATE INDEX constructs an index in the same way as an
index for a regular table.


It seems really weird to have the data half inside the database and
half outside of it like this.  What is the use case for this feature?
I would have thought that something like file_fdw would be primarily
targeted toward use cases where you intend to read the data file only
a few times, or you always intend to sequential scan it.  If you need
to index it, why not store the data in a regular table?  That's what
they are for.  What you're proposing sounds to me like a lot of work
for no real gain.


This feature is intended to save time and space when loading file
data into Postgres.  As you know, loading data is time-consuming.  In
addition, it requires twice the disk space in the case where source
files have to be kept around in case they are needed again.  I think
this feature is especially useful for data warehouse environments.



On the other hand, for a remote table,
CREATE INDEX collects information about the index on the specified
column(s) for the specified table that was created on the remote table.


I can possibly see the point of this.  Having local information about
which remote indexes are available seems like it could be useful.  As
Heikki says, you could cache it on a per-session basis, but that might
not be very efficient.

I also think that it would not be a very good idea to have CREATE
INDEX on a foreign table sometimes really create an index and other
times just define the properties of a remote index.  If we're going to
have both features at all, I think they should use different syntax.
I suggest that CREATE INDEX or CREATE FOREIGN INDEX is the right way
to provide the query planner with information about remote-side
indexes; and that if we even want to have indexes on flat files, the
interface to those should be exported via functions bundled in the
file_fdw extension rather than DDL.  There's no reason to suppose that
the indexes the FDW supports correspond to PostgreSQL's AMs, so tying
it into that framework doesn't seem wise.


I investigated DB2 a little bit.  DB2 uses the CREATE INDEX
SPECIFICATION ONLY statement to define the properties of a remote index:


CREATE INDEX index_name ON foreign_table_name
(column_name) SPECIFICATION ONLY

How about introducing this kind of option?  Using the CREATE INDEX
statement with the SPECIFICATION ONLY option, a user can just define the
properties of a remote index.  On the other hand, using the statement
without this option, he or she can specify more options, such as the
USING option, and really create an index, which requires that the FDW's
AMs correspond to Postgres's AMs, as you pointed out.  If the real index
on external data is considered just complementary data for efficient
query processing, like the stats collected for external data by the
ANALYZE statement, it doesn't seem so weird to use DDL for the external
data, create the real index for it, and store the index data inside
Postgres.


Best regards,
Etsuro Fujita



Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-03-21 Thread Kohei KaiGai
2012/3/20 Robert Haas robertmh...@gmail.com:
 On Fri, Mar 16, 2012 at 3:44 AM, Yeb Havinga yebhavi...@gmail.com wrote:
 In the patch with copy-editing documentation following that commit, at
 "in at their option": s/in// ?

 Oh, yeah.  Oops.  Thanks.

 Also 'rather than .. as mandated by the system':
 I'm having trouble parsing 'as'. It is also unclear to me what 'system'
 means: selinux or PostgreSQL, or both? I suspect it is PostgreSQL, since
 selinux is still enforcing / 'mandating' its policy. What about "rather
 than the switch being controlled by the PostgreSQL server, as in the case
 of a trusted procedure"?

 Well, I think it's both.  PostgreSQL is responsible for enforcing
 privileges on database objects, but it relies on SE-Linux to tell it
 whether a given access is allowable.  So, from PostgreSQL's point of
 view, it's delegating the decision to SE-Linux.  But SE-Linux views
 itself as a mechanism for enforcing a system-wide security policy, so
 views PostgreSQL as an instrument for carrying out its access control
 goals.  I don't know how to disentangle that.  I'm actually not
 entirely sure that I even believe the underlying sentiment that
 dynamic transitions are dangerous.  Maybe KaiGai could comment further
 on what we should be trying to convey here.

The reason why dynamic domain transition should be configured
carefully is that it partially allows users to switch their own
privileges in a discretionary way, unlike a trusted procedure.

The original model of SELinux on an operating system assumes that all
domain transitions happen at execve(2) time, but it became clear that
some kinds of applications are not happy with the traditional
fork/exec lifecycle, such as web servers, connection pooling software,
and others.

Even though they act on behalf of users, such applications do not fork
and exec themselves, typically for performance reasons.  One point we
should focus on is that these applications have a relatively trustable
portion and an untrustable one.

Dynamic domain transition was designed to restrict privileges below
the current ones in the trustable portion, prior to launching the
untrustable one.  So it was never intended to switch the client domain
completely arbitrarily.  The bottom line is restricted by the security
policy, which explicitly describes the range of domains that are
allowed to transition.

So we can conclude that dynamic domain transition is harmless as long
as it works to reduce privileges; that should be guaranteed by the
security policy.
It also means sepgsql_setcon() is harmless as long as it works
according to the decisions of SELinux.

The connection pooling software scenario using a trusted procedure
might be a bit confusing.  In this case, the client domain is first
switched to the trusted one in a mandatory way, then it switches to a
more restricted domain in an arbitrary way; thus, it is not allowed to
escalate its privileges arbitrarily.
We assume the trusted procedure is a small enough portion to be kept
free of bugs and vulnerabilities.

Joshua, please add some comments if you have any.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] Postgres 8.4 planner question - bad plan, good plan for almost same queries.

2012-03-21 Thread Дмитрий

I think the reason the planner isn't too bright about this case is
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=cd1f0d04bf06938c0ee5728fc8424d62bcf2eef3
ie, it won't do IN/EXISTS pullup below a NOT EXISTS.

HEAD is better, thanks to commit
0816fad6eebddb8f1f0e21635e46625815d690b9, but of course there is no
chance at all of back-patching the planner changes that depends on.


I found that it works fine for [NOT] EXISTS if I just fold the query
inside into "select 1 from (...) vv".  With my provided query, it uses a
seqscan for both EXISTS/NOT EXISTS without folding, and an index scan
with it.


Okay, it's easier for me to make automatic subquery folding.

Regards,
Dmitry



Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-03-21 Thread Fujii Masao
On Thu, Mar 15, 2012 at 5:52 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 When all those changes are put together, the patched version now beats or
 matches the current code in the RAM drive tests, except that the
 single-client case is still about 10% slower. I added the new test results
 at http://community.enterprisedb.com/xloginsert-scale-tests/, and a new
 version of the patch is attached.

When I ran pgbench with the v18 patch, I encountered the following PANIC error:

PANIC:  space reserved for WAL record does not match what was written

To investigate the cause, I applied the following change and ran pgbench again:


diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bfc7421..2cef0bd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1294,7 +1294,7 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecord *rechdr,
     }

     if (!XLByteEQ(CurrPos, EndPos))
-        elog(PANIC, "space reserved for WAL record does not match what was written");
+        elog(PANIC, "space reserved for WAL record does not match what was written, CurrPos: %X/%X, EndPos: %X/%X",
+             CurrPos.xlogid, CurrPos.xrecoff, EndPos.xlogid, EndPos.xrecoff);
     }
     else
     {


then I got the following:

PANIC:  space reserved for WAL record does not match what was written, CurrPos: C/0, EndPos: B/FF00

So I think that the patch has a bug in how it handles the WAL boundary.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-03-21 Thread Heikki Linnakangas

On 21.03.2012 13:14, Fujii Masao wrote:

 PANIC:  space reserved for WAL record does not match what was written, CurrPos: C/0, EndPos: B/FF00

So I think that the patch has a bug in how it handles the WAL boundary.


Thanks for the testing! These WAL boundary issues are really tricky: you 
found bugs in that area before, and I found and fixed one before posting 
the last version, and apparently there's still at least one left.


Overall, what do you (and others) think about the state of this patch? 
I'm starting to feel that this needs to be pushed to 9.3. That bug might 
not be very hard to fix, but the fact that these bugs are still cropping 
up at this late stage makes me uneasy. That boundary calculation in 
particular is surprisingly tricky, and I think it could be made less 
tricky with some refactoring of the WAL-logging code, replacing 
XLogRecPtr with uint64s, like Peter (IIRC) suggested a while ago. And 
that seems like 9.3 material. Also, there are still these two known issues:


1. It slows down the WAL insertion in a single backend by about 10%
2. With lots of backends inserting tiny records concurrently, you get 
spinlock contention, which consumes a lot of CPU. Without the patch, you 
get lwlock contention and bandwidth isn't any better, but you sleep 
rather than spin.


I'm afraid those issues aren't easily fixable. I haven't been able to 
identify the source of the slowdown in the single-backend case; it seems 
to be simply the distributed cost of the extra bookkeeping. That might be 
acceptable: a 10% slowdown of raw WAL insertion speed is not good, but WAL 
insertion only accounts for a fraction of the total CPU usage for any 
real workload, so I believe the slowdown of a real application would be 
more like 1-3%, at worst. But I would feel more comfortable if we had 
more time to test that.


The spinlock contention issue might be acceptable too. I think it would 
be hard to run into it in a real application, and even then, the 
benchmarks show that although you spend a lot of CPU time spinning, you 
get at least the same overall bandwidth with the patch, which is what 
really matters. And I think it could be alleviated by reducing the time 
the spinlock is held, and I think that could be done by making the space 
reservation calculations simpler. If we got rid of the limitation that 
the WAL record header is never split across WAL pages, and always stored 
the continuation record header on all WAL pages, the space reservation 
calculation could be reduced to essentially "currentpos += size". But 
that again seems 9.3 material.


So, although none of the issues alone is a show-stopper, considering 
all these things together I'm starting to feel that this needs to be 
pushed to 9.3. Thoughts?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Proposal: Create index on foreign table

2012-03-21 Thread Robert Haas
On Wed, Mar 21, 2012 at 4:47 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 I did an investigation on DB2 a little bit.  DB2 uses the CREATE INDEX
 SPECIFICATION ONLY statement to define the properties of a remote index.

    CREATE INDEX index_name ON foreign_table_name
    (column_name) SPECIFICATION ONLY

 How about introducing this kind of option?; Using the CREATE INDEX statement
 with the SPECIFICATION ONLY option, a user can just define the properties of
 a remote index.  On the other hand, using the statement without this option,
 he or she can specify more options like the USING option and really create
 an index, which requires that the FDW's AMs correspond to Postgres's AMs, as
 pointed out by you.  If the real index of an external data is considered as
 just a complementary data for efficient query processing like stats to be
 collected for the external data by the ANALYZE statement, it doesn't seem so
 weird to use the DDL for the external data, create the real index for it,
 and store the index data inside Postgres.

I still don't think it's a good idea to introduce the concept of a
PostgreSQL index that indexes data not stored in the database.  There
is some pretty serious impedance mismatch there.  PostgreSQL indexes
are intended to store CTIDs; you might be able to hack things for
file_fdw to make a byte offset look like a CTID, but in general I
don't think you can count on making that work.  There's no guarantee
that a particular FDW provides unique identifiers for every data
element that fit in six bytes and allow for fast retrieval.  In fact,
beyond flat files, I suspect that's more the exception than the norm.
I agree with you that our bulk loading isn't fast enough (or
space-efficient enough) but I don't think the right solution is to
contort our index code, which is not designed to do this and probably
won't handle it very gracefully.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-03-21 Thread Robert Haas
On Wed, Mar 21, 2012 at 7:52 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 So, although none of the issues alone is a show-stopper, considering all
 these things together, I'm starting to feel that this needs to be pushed to
 9.3. Thoughts?

I think I agree.  I like the refactoring ideas that you're proposing,
but I don't really think we should be starting on that in mid-March.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Memory usage during sorting

2012-03-21 Thread Robert Haas
On Tue, Mar 20, 2012 at 6:16 PM, Greg Stark st...@mit.edu wrote:
 On Tue, Mar 20, 2012 at 8:00 PM, Robert Haas robertmh...@gmail.com wrote:
 Frankly that analysis didn't make any sense to me at the time.
 Comparing integers is fast, sure, but it's still slower than not
 having to do any comparison at all.

 I think you're underestimating how much it costs to call the
 datatype-specific comparator.  My belief is that it's wicked
 expensive.

 I'm totally with you on the datatype-specific comparator being expensive.

 But we must be talking about two different scenarios. I don't see why
 Tom's algorithm was slower than Knuth's unless there was a bug. It
 seems to me it should perform exactly as many comparator calls but
 save the integer comparisons and the extra space for them.

It seemed that way to me, too, but it wasn't.

 In the current algorithm, Knuth's, it compares the new tuple against
 the most recently emitted tuple to set the run number then adds it to
 the bottom of the heap and sifts up. If it's from the current run it
 does a bunch of integer comparisons and skips past the next run
 quickly. If it's from the next run it sifts up to the right spot in
 the next run and if it hits the top of the next run it does a quick
 integer comparison and stops.

Check.

 In Tom's algorithm it would perform the same comparison against the
 recently emitted tuple to set the run number and either add it to the
 unsorted list or the bottom of the heap. If it's added to the unsorted
 list we're done,

Check.

 if it's added to the bottom of the heap it performs
 the same siftup it would have done above except that it skips the
 bottom few levels of the heap -- all of which were fast integer
 comparisons.

I think this is where the wheels come off.  It's excessively
optimistic to suppose that we're going to skip the bottom few levels
of the heap.  Half of the elements in the heap have no children at
all; and half of the remainder are parents but not grand-parents.  If
you assume, at a given point in time, that half of the tuples we're
holding onto are for the next run, then the heap is ONE level
shallower than it would otherwise have been, and you figure to save
ONE integer comparison.  If we're relatively early in the run and only
one-tenth of the tuples we're holding onto are for the next run, then the
heap is barely shallower at all and we're scarcely avoiding anything.

But having even a small number of next-run tuples scattered through
the heap gives us the opportunity to get lucky.  If the heap size is
N, even lg N next-run tuples in the heap is potentially enough for us
to avoid most of the calls to a real comparator, if it so happens that
all of those tuples are our ancestors.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Improve the -l (limit) option recently added to contrib/vacuumlo

2012-03-21 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 21.03.2012 01:05, Tom Lane wrote:
 Improve the -l (limit) option recently added to contrib/vacuumlo.

 Shouldn't this be backported? Without it, vacuumlo is effectively broken 
 in 9.0 and 9.1.

I was wondering about that.  You could argue it either way: it's a new
feature, or it's a bug fix.  It's a simple enough change that I see
little risk in it, so the argument against backporting seems weak.

If there are not objections, I'll just copy the current vacuumlo code
in toto into 9.0 and 9.1.

regards, tom lane



Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-03-21 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 ... although none of the issues alone is a show-stopper, considering 
 all these things together, I'm starting to feel that this needs to be 
 pushed to 9.3. Thoughts?

Agreed.  In particular, I think you are right that it'd be prudent to
simplify the WAL-location arithmetic and then rebase this code onto
that.  And since no code at all has been written for the arithmetic
change, I think we have to consider that it's not 9.2 material.

regards, tom lane



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Robert Haas
On Tue, Mar 20, 2012 at 10:22 PM, Qi Huang huangq...@hotmail.com wrote:
 Thanks so much, Neil.
 I think I kind of understand the situation now.  The implementation
 posted by Neil was for the purpose of the talk, thus rushed, and may not
 be up to the standard of the Postgres community.  Also, Neil mentioned
 the PRNG state in the patch is buggy, and maybe some other things are
 too.  Thus, in the GSoC project, I could understand the details of
 Neil's implementation, fix the bugs, make the code fit the community
 standard, and test it.
 Is there any comment on this?

In addition to that, you'll probably find that the old patch doesn't
apply any more, and you'll need to fix a lot of things to get it
working again.  The code has changed a lot in the meantime.

One thing we should probably try to establish before you get started
working on this is whether people want the feature, which is basically
the ability to write something like this in the FROM clause of a
query:

table_name TABLESAMPLE { BERNOULLI | SYSTEM } ( sample_percent ) [
REPEATABLE ( repeat_seed ) ]

I have at present no position on whether we want that or not, but
maybe someone else does.  The upside is that would be a more efficient
replacement for the ORDER BY random() trick that is often used today;
the downside is that it requires dedicated syntax and a whole new
executor node for something that, realistically, isn't going to come
up very often.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Create index on foreign table

2012-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Mar 21, 2012 at 4:47 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
 I did an investigation on DB2 a little bit.  DB2 uses the CREATE INDEX
 SPECIFICATION ONLY statement to define the properties of a remote index.
 
    CREATE INDEX index_name ON foreign_table_name
(column_name) SPECIFICATION ONLY
 
 How about introducing this kind of option?

 I still don't think it's a good idea to introduce the concept of a
 PostgreSQL index that indexes data not stored in the database.

Well, above Etsuro-san is proposing the other case, ie a Postgres index
definition for an index *not* stored in the database.  But frankly
I think both ideas are pretty bad.  There's no reason to expect that
Postgres' model of an index will accurately describe the facilities
available in a remote server; and if it's not accurate, is it really
of any use for planning?

There might be reason to store information about remote indexes locally
(although nobody has yet proven that session-local caching isn't
sufficient).  But there is very little reason to suppose that that
information will look the same for all different types of remote
servers, and I don't believe we know enough yet to design a common
representation.  I think it'd be wise to let the different FDWs solve
this problem on their own for a couple of years, and then we can look
at whether they have common requirements or not.

regards, tom lane



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 One thing we should probably try to establish before you get started
 working on this is whether people want the feature, which is basically
 the ability to write something like this in the FROM clause of a
 query:

 table_name TABLESAMPLE { BERNOULLI | SYSTEM } ( sample_percent ) [
 REPEATABLE ( repeat_seed ) ]

 I have at present no position on whether we want that or not, but
 maybe someone else does.  The upside is that it would be a more efficient
 replacement for the ORDER BY random() trick that is often used today;
 the downside is that it requires dedicated syntax and a whole new
 executor node for something that, realistically, isn't going to come
 up very often.

Yeah --- you're talking about chunks of new code in both planner and
executor.  A very rough estimate is that this might be about as
complicated to do properly as MergeAppend was (and we're still shaking
out the bugs in that :-().

Now that would all be fine if this were a widely-desired feature, but
AFAIR the user demand for it has been about nil.  So I'm leaning to
the position that we don't want it.

regards, tom lane



Re: [HACKERS] renaming domain constraint

2012-03-21 Thread Robert Haas
On Fri, Mar 16, 2012 at 1:34 PM, Peter Eisentraut pete...@gmx.net wrote:
 Here is a patch for being able to rename constraints of domains.  It
 goes on top of the previously committed patch for renaming table
 constraints.

I don't like the way you've modified get_constraint_oid(), which is
currently parallel to many other get_whatever_oid() functions and with
this patch, would no longer be.  There seems to be little point in
shoehorning the new functionality into the existing function anyway,
considering that you've conditionalized basically every piece of logic
in the function.  I think you should just invent a completely separate
function and be done with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Wed Mar 21 11:35:54 -0300 2012:

 Now that would all be fine if this were a widely-desired feature, but
 AFAIR the user demand for it has been about nil.  So I'm leaning to
 the position that we don't want it.

I disagree with there being zero interest ... the order by random()
stuff does come up occasionally.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Finer Extension dependencies

2012-03-21 Thread Robert Haas
On Thu, Mar 8, 2012 at 9:39 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 Dimitri, are you going to post an updated patch for this CF?

 Yes, I intend to do that.  Not sure about diverting from the command
 trigger patch while Thom is full speed on reviewing and helping me write
 the full covering test cases, though.

I don't think we can wait any longer for this; we're now more than two
months into this CommitFest, and command triggers is still in full
swing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Andres Freund
On Wednesday, March 21, 2012 03:47:23 PM Alvaro Herrera wrote:
 Excerpts from Tom Lane's message of Wed Mar 21 11:35:54 -0300 2012:
  Now that would all be fine if this were a widely-desired feature, but
  AFAIR the user demand for it has been about nil.  So I'm leaning to
  the position that we don't want it.
 
 I disagree with there being zero interest ... the order by random()
 stuff does come up occasionally.
Yes.

I wonder if it could be hacked on top of a plain seqscan node instead of building 
a completely separate infrastructure. The standard's syntax would then simply 
be transformed into a select with some special ORDER BY.

Andres



Re: [HACKERS] renaming domain constraint

2012-03-21 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mié mar 21 11:43:17 -0300 2012:
 On Fri, Mar 16, 2012 at 1:34 PM, Peter Eisentraut pete...@gmx.net wrote:
  Here is a patch for being able to rename constraints of domains.  It
  goes on top of the previously committed patch for renaming table
  constraints.
 
 I don't like the way you've modified get_constraint_oid(), which is
 currently parallel to many other get_whatever_oid() functions and with
 this patch, would no longer be.  There seems to be little point in
 shoehorning the new functionality into the existing function anyway,
 considering that you've conditionalized basically every piece of logic
 in the function.  I think you should just invent a completely separate
 function and be done with it.

get_relation_constraint_oid() plus get_domain_constraint_oid()?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] misleading error message from connectMaintenanceDatabase()

2012-03-21 Thread Robert Haas
On Tue, Feb 28, 2012 at 12:05 AM, Josh Kupershmidt schmi...@gmail.com wrote:
 A simple fix would be just to pass fail_ok = false for the last
 connectDatabase() call inside connectMaintenanceDatabase(), and give
 up on trying to tack on a likely-misleading hint about the maintenance
 database. Patch attached.

Committed.  Thanks for the report and patch; sorry for the delay.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-03-21 Thread Robert Haas
On Fri, Feb 24, 2012 at 2:23 PM, Ants Aasma ants.aa...@eesti.ee wrote:
 On Wed, Feb 22, 2012 at 6:35 PM, Ants Aasma ants.aa...@eesti.ee wrote:
 Some implementation notes.  This currently fails regression test
 create_function_3, haven't looked into why yet.

 I'll take a look at it.

 The failure was due to the leakproof changes to pg_proc. Attached patches
 are adjusted accordingly and rebased over Robert's blocks dirtied
 patch.

This seems to have bitrotted again.  :-(

Can you please rebase again?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Andrew Dunstan



On 03/21/2012 10:47 AM, Alvaro Herrera wrote:

Excerpts from Tom Lane's message of mié mar 21 11:35:54 -0300 2012:


Now that would all be fine if this were a widely-desired feature, but
AFAIR the user demand for it has been about nil.  So I'm leaning to
the position that we don't want it.

I disagree with there being zero interest ... the order by random()
stuff does come up occasionally.



Presumably the reason that's not good enough is that it scans the whole 
table (as well as being non-portable)? Maybe we could find some less 
invasive way of avoiding that.


cheers

andrew



Re: [HACKERS] Proposal: PL/pgPSM for 9.3

2012-03-21 Thread Robert Haas
On Wed, Feb 22, 2012 at 4:19 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 I propose to integrate a PSM language into the core. This language is
 defined as part of ANSI SQL - SQL/PSM - and is used in some well-known
 databases like DB2 and Teradata, and some lesser-known RDBMSs like
 MonetDB. The proposed implementation is based on the same
 architecture as the current PL/pgSQL interpreter - interpretation of
 an AST with integration of an SQL parser and expression executor. The
 reasons for using the same architecture are: reuse of some parts of the
 interpreter, and good experience with the current interpreter.

Why not just enhance PL/pgsql?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Finer Extension dependencies

2012-03-21 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I don't think we can wait any longer for this; we're now more than two
 months in to this CommitFest, and command triggers is still in full
 swing.

Is it possible to have another day to send out a revised patch?  The
problem reported is either a show-stopper or a less-than-one-hour fix; I
would hate to miss 9.2 for having been too swamped to find the time to
qualify the problem.

Barring objections, I'll send a new revision later tonight or tomorrow,
or a notification that the patch is really dead for 9.2.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Robert Haas
On Wed, Mar 21, 2012 at 10:57 AM, Andres Freund and...@anarazel.de wrote:
 On Wednesday, March 21, 2012 03:47:23 PM Alvaro Herrera wrote:
 Excerpts from Tom Lane's message of mié mar 21 11:35:54 -0300 2012:
  Now that would all be fine if this were a widely-desired feature, but
  AFAIR the user demand for it has been about nil.  So I'm leaning to
  the position that we don't want it.

 I disagree with there being zero interest ... the order by random()
 stuff does come up occasionally.
 Yes.

 I wonder if could be hacked ontop of a plain seqscan node instead of building
 a completely separate infrastructure. The standards syntax would then simply
 be transformed into a select with some special ORDER BY

Well, the standard syntax apparently aims to reduce the number of
returned rows, which ORDER BY does not.  Maybe you could do it with
ORDER BY .. LIMIT, but the idea here I think is that we'd like to
sample the table without reading all of it first, so that seems to
miss the point.
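
For concreteness, the two spellings side by side (a sketch; the table and
numbers are invented, and PostgreSQL has no TABLESAMPLE support as of this
discussion):

```sql
-- SQL-standard syntax, as implemented by DB2 and SQL Server;
-- not accepted by PostgreSQL today:
SELECT * FROM person TABLESAMPLE BERNOULLI (10);

-- Closest spellings available now; both still read the whole table:
SELECT * FROM person ORDER BY random() LIMIT 10000;  -- sample by row count
SELECT * FROM person WHERE random() < 0.10;          -- sample by fraction
```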

I have to admit I'm not very impressed by the argument that we
shouldn't do this because we'll need a new executor node.  Creating a
new executor node is not really that big of a deal; and in any case I
don't think Tom will like hacking another bit of functionality into
seq-scan any better, since he refactored both the Merge Append and
Index-Only Scan patches to avoid doing exactly that, and those were
more similar than this probably would be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Finer Extension dependencies

2012-03-21 Thread Robert Haas
On Wed, Mar 21, 2012 at 11:11 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 I don't think we can wait any longer for this; we're now more than two
 months in to this CommitFest, and command triggers is still in full
 swing.

 Is it possible to have another day to send out a revised patch?  The
 problem reported is either a show-stopper or a less-than-one-hour fix; I
 would hate to miss 9.2 for having been too swamped to find the time to
 qualify the problem.

 Barring objections, I'll send a new revision later tonight or tomorrow,
 or a notification that the patch is really dead for 9.2.

Sounds reasonable to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 03/21/2012 10:47 AM, Alvaro Herrera wrote:
 I disagree with there being zero interest ... the order by random()
 stuff does come up occasionally.

 Presumably the reason that's not good enough is that it scans the whole 
 table (as well as being non-portable)?

The reason I'm concerned about the implementation effort is precisely
that I'm afraid people will have high expectations for the intelligence
of the feature.  If it's not materially better than you can get today
with order by random(), it's not worth doing.  That will mean for
example that it can't just be something we bolt onto seqscans and be
done with --- it'll need to interact with indexscans, maybe joins, etc
etc.  And no shortcuts on the quality of the sampling, either.

regards, tom lane



Re: [HACKERS] Proposal: PL/pgPSM for 9.3

2012-03-21 Thread Pavel Stehule
Hello

2012/3/21 Robert Haas robertmh...@gmail.com:
 On Wed, Feb 22, 2012 at 4:19 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 I propose to integrate a PSM language into the core. This language is
 defined as part of ANSI SQL - SQL/PSM - and is used in some well-known
 databases like DB2 and Teradata, and some lesser-known RDBMSs like
 MonetDB. The proposed implementation is based on the same
 architecture as the current PL/pgSQL interpreter - interpretation of
 an AST with integration of an SQL parser and expression executor. The
 reasons for using the same architecture are: reuse of some parts of the
 interpreter, and good experience with the current interpreter.

 Why not just enhance PL/pgsql?

plpgsql is based on PL/SQL, which is derived from Ada. PSM is a relatively
unique language - similar to Modula, with one significant feature, handlers -
so Ada and PSM are different languages with different syntax.

I have implemented a few statements from PSM in plpgsql over the last few
years. But that is the maximum - we cannot change the syntax of declarations,
error handling, or some control statements.

The main issue is the different exception handling in these languages - we
cannot isolate the differences just in the parser.
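
To make the syntax gap concrete, compare a condition handler in the two
languages (a sketch of the standard's SQL/PSM syntax; it is not executable in
PostgreSQL):

```sql
-- SQL/PSM: handlers are declared up front and fire for the whole body
BEGIN
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
    SET result = -1;
  -- statements ...
END;

-- PL/pgSQL: the equivalent is a block-level EXCEPTION clause at the end
BEGIN
  -- statements ...
EXCEPTION WHEN others THEN
  result := -1;
END;
```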

Regards

Pavel

http://postgres.cz/wiki/Compound_statement

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Well, the standard syntax apparently aims to reduce the number of
 returned rows, which ORDER BY does not.  Maybe you could do it with
 ORDER BY .. LIMIT, but the idea here I think is that we'd like to
 sample the table without reading all of it first, so that seems to
 miss the point.

I think actually the traditional locution is more like
WHERE random() < constant
where the constant is the fraction of the table you want.  And yeah,
the presumption is that you'd like it to not actually read every row.
(Though unless the sampling density is quite a bit less than 1 row
per page, it's not clear how much you're really going to win.)

regards, tom lane



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Qi Huang


 Date: Wed, 21 Mar 2012 11:00:59 -0400
 From: and...@dunslane.net
 To: alvhe...@commandprompt.com
 CC: t...@sss.pgh.pa.us; robertmh...@gmail.com; huangq...@hotmail.com; 
 neil.con...@gmail.com; dan...@heroku.com; j...@agliodbs.com; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
 
 
 
 On 03/21/2012 10:47 AM, Alvaro Herrera wrote:
  Excerpts from Tom Lane's message of mié mar 21 11:35:54 -0300 2012:
 
  Now that would all be fine if this were a widely-desired feature, but
  AFAIR the user demand for it has been about nil.  So I'm leaning to
  the position that we don't want it.
  I disagree with there being zero interest ... the order by random()
  stuff does come up occasionally.
 
 
  Presumably the reason that's not good enough is that it scans the whole 
 table (as well as being non-portable)? Maybe we could find some less 
 invasive way of avoiding that.
 
 cheers
 
 andrew


Thanks for your discussion and ideas. As I checked, MS SQL Server and DB2
have implemented TABLESAMPLE for now. At least, it is useful for QUICK sample
retrieval from a large dataset. I suppose this clause itself will be much
faster than using random(). About implementation, will the code change really
be very large? The general structure should still be about the same, right?

Best Regards and Thanks,
Huang Qi Victor
Computer Science, National University of Singapore

Re: [HACKERS] Bug: walsender and high CPU usage

2012-03-21 Thread Heikki Linnakangas

On 12.03.2012 15:30, Fujii Masao wrote:

On Mon, Mar 12, 2012 at 10:27 PM, Fujii Masao masao.fu...@gmail.com wrote:

On Mon, Mar 12, 2012 at 7:58 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

I also think we should backport commit
cff75130b5f63e45423c2ed90d6f2e84c21ef840, except for the removal of
wal_sender_delay).


Agreed. The attached patch is the same as
cff75130b5f63e45423c2ed90d6f2e84c21ef840,
except for the removal of wal_sender_delay. Could you apply this?


Oh, I forgot to attach the patch. Patch really attached this time.


Thanks, committed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Robert Haas
On Wed, Mar 21, 2012 at 11:34 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Well, the standard syntax apparently aims to reduce the number of
 returned rows, which ORDER BY does not.  Maybe you could do it with
 ORDER BY .. LIMIT, but the idea here I think is that we'd like to
 sample the table without reading all of it first, so that seems to
 miss the point.

 I think actually the traditional locution is more like
        WHERE random() < constant
 where the constant is the fraction of the table you want.  And yeah,
 the presumption is that you'd like it to not actually read every row.
 (Though unless the sampling density is quite a bit less than 1 row
 per page, it's not clear how much you're really going to win.)

Well, there's something mighty tempting about having a way to say
"just give me a random sample of the blocks and I'll worry about
whether that represents a random sample of the rows."

It's occurred to me a few times that it's pretty unfortunate you can't
do that with a TID condition.

rhaas=# explain select * from randomtext where ctid >= '(500,1)' and
ctid < '(501,1)';
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on randomtext  (cost=0.00..111764.90 rows=25000 width=31)
   Filter: ((ctid >= '(500,1)'::tid) AND (ctid < '(501,1)'::tid))
(2 rows)

The last time this came up for me was when I was trying to find which
row in a large table was making the SELECT blow up; but it seems like
it could be used to implement a poor man's sampling method, too... it
would be nicer, in either case, to be able to specify the block
numbers you'd like to be able to read, rather than bounding the CTID
from both ends as in the above example.
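
As a sketch of how a client could approximate block sampling with the
bounded-CTID trick today (the function name is invented for illustration;
tuple offsets start at 1, so the half-open range [(b,1),(b+1,1)) covers
exactly block b):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one ctid-range predicate selecting exactly heap block `block`. */
static int block_sample_predicate(char *buf, size_t len, unsigned block)
{
    return snprintf(buf, len,
                    "(ctid >= '(%u,1)' AND ctid < '(%u,1)')",
                    block, block + 1u);
}
```

A sampler would OR together such predicates for a random subset of block
numbers; as the EXPLAIN above shows, today this still plans as a filtered
seqscan rather than skipping straight to the chosen blocks.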

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers

2012-03-21 Thread Robert Haas
On Tue, Mar 20, 2012 at 12:02 PM, Andres Freund and...@anarazel.de wrote:
 The attached patch applies from 8.3 to 9.1 (8.2 has conflicts but
 thankfully...).

It seems it doesn't apply to master (any more?).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Unnecessary WAL archiving after failover

2012-03-21 Thread Robert Haas
On Wed, Feb 29, 2012 at 5:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Hi,

 In streaming replication, after failover, the new master might have lots of
 un-applied WAL files with the old timeline ID. They are the WAL files which
 were recycled as future ones when the server was running as a standby. Since
 they will never be used later, they don't need to be archived after failover.
 But since they have neither a .ready nor a .done file in archive_status,
 checkpoints after failover newly create .ready files for them, and then
 finally they are archived, which might cause a disk I/O spike in both the
 WAL and archive storage.

 To avoid the above problem, I think that un-applied WAL files with the old
 timeline ID should be marked as already-archived and recycled immediately
 at the end of recovery. Thoughts?

I'm not an expert on this, but that makes sense to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers

2012-03-21 Thread Andres Freund
On Wednesday, March 21, 2012 04:54:00 PM Robert Haas wrote:
 On Tue, Mar 20, 2012 at 12:02 PM, Andres Freund and...@anarazel.de wrote:
  The attached patch applies from 8.3 to 9.1 (8.2 has conflicts but
  thankfully...).
 
 It seems it doesn't apply to master (any more?).
It's not required there because of the unification of CTAS with the normal 
code path. Sorry, that was only clear from a few mails upthread; I should 
have made that explicit.

Andres



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Andrew Dunstan



On 03/21/2012 11:49 AM, Robert Haas wrote:

On Wed, Mar 21, 2012 at 11:34 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes:

Well, the standard syntax apparently aims to reduce the number of
returned rows, which ORDER BY does not.  Maybe you could do it with
ORDER BY .. LIMIT, but the idea here I think is that we'd like to
sample the table without reading all of it first, so that seems to
miss the point.

I think actually the traditional locution is more like
WHERE random() < constant
where the constant is the fraction of the table you want.  And yeah,
the presumption is that you'd like it to not actually read every row.
(Though unless the sampling density is quite a bit less than 1 row
per page, it's not clear how much you're really going to win.)

Well, there's something mighty tempting about having a way to say
"just give me a random sample of the blocks and I'll worry about
whether that represents a random sample of the rows."

It's occurred to me a few times that it's pretty unfortunate you can't
do that with a TID condition.

rhaas=# explain select * from randomtext where ctid >= '(500,1)' and
ctid < '(501,1)';
                              QUERY PLAN
----------------------------------------------------------------------
  Seq Scan on randomtext  (cost=0.00..111764.90 rows=25000 width=31)
    Filter: ((ctid >= '(500,1)'::tid) AND (ctid < '(501,1)'::tid))
(2 rows)

The last time this came up for me was when I was trying to find which
row in a large table was making the SELECT blow up; but it seems like
it could be used to implement a poor man's sampling method, too... it
would be nicer, in either case, to be able to specify the block
numbers you'd like to be able to read, rather than bounding the CTID
from both ends as in the above example.



That would rapidly get unmanageable when you wanted lots of pages.


Maybe we could do something like a pagenum pseudovar, or a wildcard 
match for ctid against '(123,*)'.


cheers

andrew




Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Well, there's something mighty tempting about having a way to say
 "just give me a random sample of the blocks and I'll worry about
 whether that represents a random sample of the rows."

 It's occurred to me a few times that it's pretty unfortunate you can't
 do that with a TID condition.

 rhaas=# explain select * from randomtext where ctid >= '(500,1)' and
 ctid < '(501,1)';

Yeah, as you say that's come up more than once in data-recovery
situations.  It seems like it'd be just a SMOP to extend the tidscan
stuff to handle ranges.

Another thing that people sometimes wish for is joins using TIDs.
I think the latter would actually be pretty trivial to do now given the
parameterized-plan infrastructure; I'd hoped to look into it for 9.2 but
ran out of time...

regards, tom lane



Re: [HACKERS] Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)

2012-03-21 Thread Robert Haas
On Wed, Feb 29, 2012 at 10:18 AM, Michael Tautschnig m...@debian.org wrote:
 In [3] it was suggested to fix the problem by placing a barrier in
 ResetLatch, which corresponds to placing it between lines 11 and 12 in the
 code above.  This amounts to placing a barrier between the two reads (lines
 7/19 and 12, i.e., between WaitLatch and the if(flag[1]) ) of Worker 1.

 Placing a sync (i.e., the strongest Power barrier) accordingly would,
 however, still be insufficient for the second problem, as it would only fix
 the reordering of read-read pairs by Worker 1 and the store atomicity issue
 from Worker 0. But the writes on Worker 0 could still be reordered (problem
 number 2). One possible fix consists of placing a sync between the two
 writes on Worker 0, and an address dependency between the two reads on
 Worker 1. Clearly, however, these are changes that cannot any longer be
 hidden behind the ResetLatch/WaitLatch interface, but rather go in the code
 using these.

Well, part of my skepticism about Tom's proposal to include memory
barrier instructions in the latch primitives was the fear that
something like what you're suggesting here might be true: namely, that
it might create the illusion of safety for people using the
primitives, when in reality thought, and possibly additional barrier
instructions, might still be needed.

However, your example is enough unlike the actual code that the
conclusion you state following the word "clearly" isn't actually clear
to me.  According to latch.h, the correct method of using a latch is
like this:

 * for (;;)
 * {
 * ResetLatch();
 * if (work to do)
 * Do Stuff();
 * WaitLatch();
 * }

Meanwhile, anyone who is creating additional work to do should add the
work to the queue and then set the latch.

So it seems to me that we could potentially fix this by inserting
barriers at the end of ResetLatch and at the beginning of SetLatch and
WaitLatch.  Then the latch has to get reset before we check whether
there's work to do; and we've got to finish checking for work before
we again try to wait for the latch.  Similarly, any work that was in
progress before SetLatch was called will be forced to be committed to
memory before SetLatch does anything else.  Adding that many barriers
might not be very good for performance but it seems OK from a
correctness point of view, unless I am missing something, which is
definitely possible.  I'd appreciate any thoughts you have on this, as
this is clearly subtle and tricky to get exactly right.
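
A toy, single-process sketch of that barrier placement (a full barrier at the
end of ResetLatch and at the start of SetLatch and WaitLatch); Latch,
work_queued, the spinning wait, and the use of GCC's __sync_synchronize() as
the barrier are all stand-ins invented for illustration, not the real latch
code:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { volatile bool is_set; } Latch;

static volatile int work_queued = 0;   /* stand-in for the work queue */

static void ResetLatch(Latch *l)
{
    l->is_set = false;
    __sync_synchronize();  /* reset must reach memory before we re-check for work */
}

static void SetLatch(Latch *l)
{
    __sync_synchronize();  /* queued work must be visible before the flag is */
    l->is_set = true;
}

static void WaitLatch(Latch *l)
{
    __sync_synchronize();  /* finish checking for work before going to sleep */
    while (!l->is_set)
        ;                  /* the real latch sleeps; this toy spins */
}

/* The canonical consumer loop from latch.h, processing one batch. */
static int worker_loop(Latch *l)
{
    int done = 0;
    for (;;)
    {
        ResetLatch(l);
        if (work_queued > 0)       /* "if (work to do) Do Stuff();" */
        {
            done += work_queued;
            work_queued = 0;
            return done;           /* toy version: stop after one batch */
        }
        WaitLatch(l);
    }
}
```

A producer adds to work_queued and then calls SetLatch(); with the barriers
placed as above, the reset-before-check and queue-before-set orderings
described in the message are enforced.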

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers

2012-03-21 Thread Robert Haas
On Wed, Mar 21, 2012 at 11:58 AM, Andres Freund and...@anarazel.de wrote:
 On Wednesday, March 21, 2012 04:54:00 PM Robert Haas wrote:
 On Tue, Mar 20, 2012 at 12:02 PM, Andres Freund and...@anarazel.de wrote:
  The attached patch applies from 8.3 to 9.1 (8.2 has conflicts but
  thankfully...).

 It seems it doesn't apply to master (any more?).
 It's not required there because of the unification of CTAS with the normal
 code path. Sorry, that was only clear from a few mails upthread; I should
 have made that explicit.

Or maybe I should have tested it before jumping to conclusions.

Anyway, I've committed the patch to 9.1, 9.0, 8.4, and 8.3.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Trivial libpq refactoring patch (was: Re: [HACKERS] Another review of URI for libpq, v7 submission)

2012-03-21 Thread Alex Shulgin
Alex a...@commandprompt.com writes:

 Marko Kreen mark...@gmail.com writes:

 On Thu, Mar 15, 2012 at 11:29:31PM +0200, Alex wrote:
 https://github.com/a1exsh/postgres/commits/uri

 The point of the patch is to have one string with all connection options,
 in standard format, yes?  So why does not this work:

  db = PQconnectdb("postgres://localhost");

 ?

 Good catch.

 I've figured out that we'll need a bit more intrusive change than simply
 overriding the expand_dbname check in conninfo_array_parse (like the
 current version does) to support URIs in all PQconnect* variants.

 I still need to figure out some details, but this is to give people a
 status update.

While working on this fix I've figured out that I need my
conninfo_uri_parse to support use_defaults parameter, like
conninfo(_array)_parse functions do.

The problem is that the block of code which does the defaults handling
is duplicated in both of the above functions.  What I'd like to do is
extract it into a separate function to call.  What I wouldn't like is
bloating the original URI patch with this unrelated change.

So here's a trivial patch to do the refactoring.  Also, it uses the
newly added conninfo_fill_defaults directly in PQconndefaults, instead
of doing the "parse empty conninfo string" trick.  If this could be
applied, I'd rebase my patch against the updated master branch and
submit the new version.

As it goes, the patch adds a little duplication when it comes to
creating a working copy of PQconninfoOptions array.  Attached is the
second patch on top of the first one to extract this bit of code into
conninfo_init.  Not sure if we should go all the way through this, so
it's not critical if this one is not applied.

--
Regards,
Alex
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***
*** 297,302  static PQconninfoOption *conninfo_parse(const char *conninfo,
--- 297,304 
  static PQconninfoOption *conninfo_array_parse(const char *const * keywords,
 const char *const * values, 
PQExpBuffer errorMessage,
 bool use_defaults, int expand_dbname);
+ static bool conninfo_fill_defaults(PQconninfoOption *options,
+  PQExpBuffer errorMessage);
  static char *conninfo_getval(PQconninfoOption *connOptions,
const char *keyword);
  static void defaultNoticeReceiver(void *arg, const PGresult *res);
***
*** 836,842  PQconndefaults(void)
initPQExpBuffer(errorBuf);
if (PQExpBufferDataBroken(errorBuf))
return NULL;/* out of memory already :-( */
!   connOptions = conninfo_parse("", errorBuf, true);
termPQExpBuffer(errorBuf);
return connOptions;
  }
--- 838,860 
initPQExpBuffer(errorBuf);
if (PQExpBufferDataBroken(errorBuf))
return NULL;/* out of memory already :-( */
! 
!   /* Make a working copy of PQconninfoOptions */
!   connOptions = malloc(sizeof(PQconninfoOptions));
!   if (connOptions == NULL)
!   {
!   printfPQExpBuffer(errorBuf,
!                     libpq_gettext("out of memory\n"));
!   return NULL;
!   }
!   memcpy(connOptions, PQconninfoOptions, sizeof(PQconninfoOptions));
! 
!   if (!conninfo_fill_defaults(connOptions, errorBuf))
!   {
!   PQconninfoFree(connOptions);
!   connOptions = NULL;
!   }
! 
termPQExpBuffer(errorBuf);
return connOptions;
  }
***
*** 4002,4008  conninfo_parse(const char *conninfo, PQExpBuffer errorMessage,
char   *pname;
char   *pval;
char   *buf;
-   char   *tmp;
char   *cp;
char   *cp2;
PQconninfoOption *options;
--- 4020,4025 
***
*** 4170,4245  conninfo_parse(const char *conninfo, PQExpBuffer errorMessage,
free(buf);
  
/*
!* Stop here if caller doesn't want defaults filled in.
!*/
!   if (!use_defaults)
!   return options;
! 
!   /*
!* If there's a service spec, use it to obtain any not-explicitly-given
!* parameters.
 */
!   if (parseServiceInfo(options, errorMessage))
{
PQconninfoFree(options);
return NULL;
}
  
-   /*
-* Get the fallback resources for parameters not specified in the conninfo
-* string nor the service.
-*/
-   for (option = options; option->keyword != NULL; option++)
-   {
-   if (option->val != NULL)
-   continue;   /* Value was in conninfo or service */
- 
-   /*
-* Try to get the environment variable fallback
-*/
-   if (option->envvar != NULL)
-   {

Re: [HACKERS] cache lookup failed in plpgsql - reason?

2012-03-21 Thread Robert Haas
On Wed, Oct 26, 2011 at 10:37 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 Oct 25 20:13:44  db-s-01 postgres: local5.warning -- postgres[29970]:
 [3-1] 2011-10-25 20:13:44 CEST adifd 29970 ERROR:  cache lookup failed
 for relation 319883311
 Oct 25 20:13:44  db-s-01 postgres: local5.warning -- postgres[29970]:
 [3-2] 2011-10-25 20:13:44 CEST adifd 29970 CONTEXT:  SQL statement
 SELECT  NOT EXISTS( SELECT relname FROM pg_class WHERE relname =

This appears to be a race condition in pg_table_is_visible.  It checks
whether the table exists; if it does not, it returns NULL.  Having
verified that the relation exists, it then calls RelationIsVisible(),
which then does a new syscache lookup for the same tuple, throwing an
error if none is found.  But there's no guarantee that things can't
change between the first test and the second one, so you get this
error.

Perhaps we could make RelationIsVisible return a three-valued enum
type, rather than bool.   VISIBLE_YES, VISIBLE_NO, and
VISIBLE_NOTFOUND, or something like that.  I wouldn't want to
back-patch such a fix, since there could be third-party code calling
RelationIsVisible, but we could do it in master.  Sadly it's not a
trivial patch, since there are a gazillion WhateverIsVisible()
functions and we'd have to fix them all, but at least it's mostly
mechanical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-03-21 Thread Robert Haas
On Wed, Mar 21, 2012 at 6:07 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 The reason why dynamic domain transition should be configured
 carefully is that it partially allows users to switch their own
 privileges in a discretionary way, unlike a trusted procedure.

 The original model of SELinux on operating systems assumes all
 domain transitions happen at execve(2) time, but it became clear
 that some sorts of applications are not happy with the traditional
 fork/exec lifecycle, such as web servers and connection pooling
 software.

 Even though they act on behalf of users, they do not fork/exec,
 typically for performance reasons. One point we should focus on is
 that these applications have a relatively trustable portion and an
 untrustable one.

 The dynamic domain transition was designed to restrict privileges
 beyond the current ones on the trustable portion, prior to launching
 the untrustable one. So, it was never intended to allow switching
 the client domain completely arbitrarily. Its bottom line is
 restricted by the security policy, which explicitly describes the
 range of domains allowed to transition.

 So, we can conclude that dynamic domain transition is harmless as
 long as it works to reduce privileges; that should be guaranteed by
 the security policy. It also means sepgsql_setcon() is harmless as
 long as it works according to the decision of SELinux.

 The connection pooling software scenario using a trusted procedure
 might be a bit confusing. In this case, the client domain is first
 switched to the trusted one in a mandatory way, and then it switches
 to a more restricted domain in an arbitrary way; thus, it is not
 allowed to promote its privileges arbitrarily. We assume the trusted
 procedure is a small enough portion to ensure it is free of bugs and
 vulnerabilities.

 Joshua, please add some comments if you have any.

I guess my feeling on this is that the warning in the documentation
isn't really helping anything here.  I mean, we don't need to document
that allowing people to give themselves more privileges is a security
hole; that much is obvious.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reconstructing Insert queries with indirection

2012-03-21 Thread Tom Lane
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
 Consider following sequence of commands

 create type complex as (r float8, i float8);
 create type quad as (c1 complex, c2 complex);
 create temp table quadtable(f1 int, q quad);

 insert into quadtable (f1, q.c1.r, q.c2.i) values(44,55,66);

 While parsing the INSERT query, we parse the query with three columns and
 three values in the target list, but during rewriting we combine q.c1.r and
 q.c2.i into a single column in the form of FieldStore structure. In
 Postgres-XC, we deparse these parse trees, to be sent to other PostgreSQL
 servers.

Well, basically you have a broken design there.  We are not going to
adopt a restriction that post-rewrite trees are necessarily exactly
representable as SQL, so there are going to be corner cases where this
approach fails.

 The assertion is added by commit 858d1699. The notes for the commit have
 following paragraph related to FieldStore deparsing.

 I chose to represent an assignment ArrayRef as "array[subscripts] :=
 source", which is fairly reasonable and doesn't omit any information.
 However, FieldStore is problematic because the planner will fold
 multiple assignments to fields of the same composite column into one
 FieldStore, resulting in a structure that is hard to understand at
 all, let alone display comprehensibly.  So in that case I punted and
 just made it print the source expression(s).

 So, there doesn't seem to be any serious reason behind the restriction.

If you have a proposal for some reasonable way to print the actual
meaning of the expression (and a patch to do it), we can certainly
consider changing that code.  I don't think it's possible to display it
as standard SQL, though.  The ArrayRef case is already not standard SQL.

regards, tom lane



[HACKERS] checkpoint patches

2012-03-21 Thread Robert Haas
There are two checkpoint-related patches in this CommitFest that
haven't gotten much love, one from me and the other from Greg Smith:

https://commitfest.postgresql.org/action/patch_view?id=752
https://commitfest.postgresql.org/action/patch_view?id=795

Mine uses sync_file_range() when available (i.e. on Linux) to add the
already-dirty data to the kernel writeback queue at the beginning of
each checkpoint, in the hopes of reducing the tendency of checkpoints
to disrupt other activity on the system.  Greg's adds an optional
pause after each fsync() call for similar purposes.  What we're
lacking is any testimony to the effectiveness or ineffectiveness of
either approach.  I took a shot at trying to figure this out by
throwing pgbench at it, but didn't get too far.  Here's scale factor
300, which fits in shared_buffers, on the IBM POWER7 machine:

resultsckpt.checkpoint-sync-pause-v1.1:tps = 14274.784431 (including
connections establishing)
resultsckpt.checkpoint-sync-pause-v1.2:tps = 12114.861879 (including
connections establishing)
resultsckpt.checkpoint-sync-pause-v1.3:tps = 14117.602996 (including
connections establishing)
resultsckpt.master.1:tps = 14485.394298 (including connections establishing)
resultsckpt.master.2:tps = 14162.000100 (including connections establishing)
resultsckpt.master.3:tps = 14307.221824 (including connections establishing)
resultsckpt.writeback-v1.1:tps = 14264.851218 (including connections
establishing)
resultsckpt.writeback-v1.2:tps = 14314.773839 (including connections
establishing)
resultsckpt.writeback-v1.3:tps = 14230.219900 (including connections
establishing)

Looks like a whole lot of that didn't matter.  Of course then I
realized that it was a stupid test, since if the whole database fits
in shared_buffers then of course there won't be any dirty data in the
OS cache at checkpoint start time.  So I ran some more tests with scale factor
1000, which doesn't fit in shared_buffers.  Unfortunately an operating
system crash intervened before the test finished, but it still looks
like a whole lot of nothing:

resultsckpt.checkpoint-sync-pause-v1.4:tps = 1899.745078 (including
connections establishing)
resultsckpt.checkpoint-sync-pause-v1.5:tps = 1925.848571 (including
connections establishing)
resultsckpt.checkpoint-sync-pause-v1.6:tps = 1920.624753 (including
connections establishing)
resultsckpt.master.4:tps = 1855.866476 (including connections establishing)
resultsckpt.master.5:tps = 1862.413311 (including connections establishing)
resultsckpt.writeback-v1.4:tps = 1869.536435 (including connections
establishing)
resultsckpt.writeback-v1.5:tps = 1912.669580 (including connections
establishing)

There might be a bit of improvement there with the patches, but it
doesn't look like very much, and then you also have to think about the
fact that they work by making checkpoints take longer, and therefore
potentially less frequent, especially in the case of the
checkpoint-sync-pause patch.  Of course, this is maybe all not very
surprising, since Greg already spent some time talking about the sorts
of conditions he thought were needed to replicate his test, and
they're more complicated than throwing transactions at the database at
top speed.  I don't know how to replicate those conditions, though,
and there's certainly plenty of checkpoint-related latency to be
quashed even on this test - a problem which these patches apparently
do little if anything to address.

So my feeling is that it's premature to change anything here and we
should punt any changes in this area to 9.3.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] checkpoint patches

2012-03-21 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 Thoughts?

It was my impression that these patches were less about improving
overall tps and more about reducing latency spikes for specific
transactions that get hit by a checkpoint happening at a bad time.

Are there any reductions in max latency for these pgbench runs..?
Assuming that's information you can get..

Thanks,

Stephen




Re: [HACKERS] checkpoint patches

2012-03-21 Thread Robert Haas
On Wed, Mar 21, 2012 at 3:34 PM, Stephen Frost sfr...@snowman.net wrote:
 Robert,

 * Robert Haas (robertmh...@gmail.com) wrote:
 Thoughts?

 It was my impression that these patches were less about improving
 overall tps and more about reducing latency spikes for specific
 transactions that get hit by a checkpoint happening at a bad time.

 Are there any reductions in max latency for these pgbench runs..?
 Assuming that's information you can get..

It looks like I neglected to record that information for the last set
of runs.  But I can try another set of runs and gather that
information.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] lots of unused variable warnings in assert-free builds

2012-03-21 Thread Peter Eisentraut
On tis, 2012-03-20 at 15:04 -0400, Tom Lane wrote:
 Hm.  I bet it thinks that PG_USED_FOR_ASSERTS_ONLY is the variable
 name, which means that the behavior might be more exciting for
 multi-word type names (for instance struct foo or volatile int *).
 Could you check a few cases like that?

Tested, doesn't make a difference.  Hence committed that way.




[HACKERS] HOT updates REDIRECT line pointers

2012-03-21 Thread Robert Haas
When the root tuple of a HOT chain is dead, but there's still at least
one non-dead member of the chain, we end up with a REDIRECT line
pointer, which points to a USED line pointer, which in turn points to
a live tuple.  This means we're using 2 line pointers for only 1
tuple.  Since line pointers are fairly small, that's not a
catastrophe, but I wonder if it might be possible to do better.

Specifically, I'm wondering if we couldn't get away with rearranging
things so that the root line pointer (which has index entries) points
to the actual tuple, and the other line pointer (which can't have any
index entries) gets marked UNUSED.  In other words, we essentially
bequeath the live tuple that is in effect the current root of the
chain to the original line pointer, and then recycle the line pointer
that formerly referenced that tuple.

Now, the question is, is this safe?  Could someone, for example,
release a pin on the page in the middle of walking a HOT chain and
then reacquire the pin to walk the rest of the chain?  If they did,
they might miss a visible tuple altogether.  I think we typically keep
the pin until we're done with the page in such situations, in which
case it might be safe.  But I also think we've typically viewed that
as a performance optimization rather than as something that's
necessary for correctness.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Create index on foreign table

2012-03-21 Thread Greg Stark
On Wed, Mar 21, 2012 at 2:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, above Etsuro-san is proposing the other case, ie a Postgres index
 definition for an index *not* stored in the database.  But frankly
 I think both ideas are pretty bad.  There's no reason to expect that
 Postgres' model of an index will accurately describe the facilities
 available in a remote server; and if it's not accurate, is it really
 of any use for planning?

Surely this will ultimately be necessary though? Some of the other use
cases for fdw are things like federated databases spread over multiple
postgres databases or even other SQL databases. They won't be very
practical if they can only implement sequential scans.

-- 
greg



Re: [HACKERS] HOT updates REDIRECT line pointers

2012-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Specifically, I'm wondering if we couldn't get away with rearranging
 things so that the root line pointer (which has index entries) points
 to the actual tuple, and the other line pointer (which can't have any
 index entries) gets marked UNUSED.

This would amount to changing the TID of the live row.  In the past we
have considered that that can only be allowed when holding an exclusive
lock on the table (a la old-style VACUUM FULL).  An example of the sort
of situation where it'd be dangerous is an UPDATE involving a join,
which might read a tuple-to-be-updated (including its TID), then release
pin on that page for awhile while going about its business with the
join, and eventually expect to come back and find the tuple still at the
same TID.  I believe there are applications that similarly expect a TID
that they've fetched to remain valid for as long as they're holding some
type of lock on the table or row.

regards, tom lane



Re: [HACKERS] HOT updates REDIRECT line pointers

2012-03-21 Thread Tom Lane
I wrote:
 Robert Haas robertmh...@gmail.com writes:
 Specifically, I'm wondering if we couldn't get away with rearranging
 things so that the root line pointer (which has index entries) points
 to the actual tuple, and the other line pointer (which can't have any
 index entries) gets marked UNUSED.

 This would amount to changing the TID of the live row.

Another issue, quite independent from race conditions against other
observers of the row, is what if the tuple is part of an update chain?
You have no way to find the predecessor row version and update its
t_ctid forward link.

regards, tom lane



Re: [HACKERS] Proposal: Create index on foreign table

2012-03-21 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Wed, Mar 21, 2012 at 2:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, above Etsuro-san is proposing the other case, ie a Postgres index
 definition for an index *not* stored in the database.  But frankly
 I think both ideas are pretty bad.  There's no reason to expect that
 Postgres' model of an index will accurately describe the facilities
 available in a remote server; and if it's not accurate, is it really
 of any use for planning?

 Surely this will ultimately be necessary though?

At the moment, whether a foreign table has indexes and when to use them
is strictly a private matter between the FDW and the remote server (if
any).  The rest of the system doesn't have any need-to-know whatsoever,
and I don't foresee that it ever will, at least not without drastic
redesign of the FDW API.

What's at stake in the current discussion is whether it would be
advantageous to an FDW if we were to store some information about
remote indexes in the local catalogs.  It would still be the FDW's
responsibility, and nobody else's, to make use of that information.
I can believe that we might eventually decide to do that; but I do not
think we have enough experience with different sorts of FDWs to define
a good solution today.  And I think that most likely a good solution
will *not* conflate such remote-index information with local indexes.

So basically my reaction to Etsuro-san's proposal is "this is
premature".  I think he should be hacking together some FDW-private
facilities for individual FDWs instead (with the full understanding
that these might be throwaway prototypes), and then looking for a
common abstraction after he's done a few of those.

regards, tom lane



Re: [HACKERS] HOT updates REDIRECT line pointers

2012-03-21 Thread Robert Haas
On Wed, Mar 21, 2012 at 8:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Robert Haas robertmh...@gmail.com writes:
 Specifically, I'm wondering if we couldn't get away with rearranging
 things so that the root line pointer (which has index entries) points
 to the actual tuple, and the other line pointer (which can't have any
 index entries) gets marked UNUSED.

 This would amount to changing the TID of the live row.

 Another issue, quite independent from race conditions against other
 observers of the row, is what if the tuple is part of an update chain?
 You have no way to find the predecessor row version and update its
 t_ctid forward link.

I don't see why I need to.  The predecessor, if any, points to the
root of the HOT chain; and that's exactly the TID that I'm proposing
to keep around.  The heap-only tuple's TID gets canned, but nobody can
be pointing to that from outside the block, IIUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] HOT updates REDIRECT line pointers

2012-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Mar 21, 2012 at 8:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Another issue, quite independent from race conditions against other
 observers of the row, is what if the tuple is part of an update chain?
 You have no way to find the predecessor row version and update its
 t_ctid forward link.

 I don't see why I need to.  The predecessor, if any, points to the
 root of the HOT chain;

Oh, right.  So scratch that objection.  The other one is still fatal
though ...

regards, tom lane



[HACKERS] heap_freeze_tuple locking requirements

2012-03-21 Thread Robert Haas
heap_freeze_tuple() was apparently designed at one point to cope with
being called with either a shared or exclusive buffer lock.  But none
of the current callers call it with a shared lock; they all call it
with an exclusive lock, except for the heap-rewrite code which doesn't
take (or need) a lock at all.  But, meanwhile, the comments for that
function claim that vacuum does call it that way, which is incorrect,
and AFAICT has been at least since HOT went in.  Attached is a
proposed patch that cleans all of this up by removing support for
calling it with just a shared lock and removing or adjusting various
related comments.

Since this is just dead code removal, I propose to apply this to 9.2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


heap-freeze-exclusive-only.patch
Description: Binary data



Re: [HACKERS] Memory usage during sorting

2012-03-21 Thread Jeff Janes
On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas robertmh...@gmail.com wrote:
...

 One thing that seems inefficient to me about our current algorithm is
 the use of the run number as a leading column in the sort key.
 There's no real reason why the tuples destined for the next run need
 to be maintained in heap order; we could just store them unordered and
 heapify the whole lot of them when it's time to start the next run.
 That ought to save comparison cycles for the current run, since the
 heap will be less deep, and there are other possible savings as well -
 in particular, an unordered array of tuples can be heapify'd in linear
 time, whereas our current algorithm is O(n lg n).  However, when I
 actually implemented this, I found that doing it this way was a loser.
  In fact, excluding the next-run tuples from the heap for the current
 run was a BIG loser even before you add in the time required to
 re-heapify.  This confused the daylights out of me for a while,
 because it's hard to understand how insert and siftup can be slower on
 a small heap than a large one.

 My working theory is that, even though we must necessarily do more
 comparisons when manipulating a larger heap, many of those comparisons
 are resolved by comparing run numbers, and that's a lot cheaper than
 invoking the real comparator.  For example, if we insert into a heap
 whose rightmost child is in the next run, and the new tuple goes into
 the current run, and the current size of the heap is such that the
 initial position of the new element is a descendent of the right node,
 it will very quickly crash through all of the next-run tuples and only
 need one REAL comparison - against the root.  Either the new element
 ends up as the root, or it ends up as the direct child of the root;
 now we remove the root and, perhaps, replace it with its rightmost
 child, meaning that the next element we read in can do the exact same
 thing all over again.  If we exclude the next-run elements from the
 heap, then the average depth of the heap when inserting a new element
 is smaller, but all the elements are in the same run, and we have to
 invoke the real comparator every time.  In other words, those next-run
 tuples act as dummies which effectively create a heap of uneven depth,
 and the average depth encountered when inserting tuples turns out to
 be less than what we get by pulling out the dummies and making the
 depth uniform.

 This makes me kind of nervous, because I don't understand why things
 should work out so well as all that.  If throwing some dummy elements
 into the heap makes things work better, then maybe we should throw in
 some more.  Or maybe it would work better to take some but not all of
 them out.  There certainly doesn't seem to be any indication in the
 code that this is an anticipated effect, and it seems an awfully
 providential accident.

Is there a patch or a git repo available for this change?  I'd like to
see if I can analyze that if I get some spare time.

...

 It turns out that it's possible to reduce the number of comparisons
 required to reinsert the last heap element from 2*heap_depth to
 heap_depth+lg(heap_depth).  At each node, compare the two children,
 and then let the current node be the lesser of those.  When you reach
 a leaf node, you've walked a path of length heap_depth which is known
 to be sorted (since we've got a heap), so you can find the correct
 reinsertion position by binary searching that path.  In the case of
 the sorted data mentioned above, this reduces the time to 19.45
 seconds with work_mem = 128MB and 16.12 seconds with work_mem = 1MB.
 However, in other cases, it seems not to help at all, or even be a
 slight regression.

Clever.  Rather than doing a binary search of the path, it might make
sense to do a reverse search of it.  The tuple is likely to end up
somewhere near the bottom of the heap, both because that is where most
of the data is, and because the tuple being reinserted just came from
the bottom, so it is likely to be biased that way.

Cheers,

Jeff



Re: [HACKERS] HOT updates REDIRECT line pointers

2012-03-21 Thread Robert Haas
On Wed, Mar 21, 2012 at 8:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Oh, right.  So scratch that objection.  The other one is still fatal
 though ...

So, could we just decide that we don't care about preserving that
property any more, and document it as an incompatibility in whatever
release we break it in?  It strikes me that it likely wouldn't be any
worse than, oh, say, flipping the default value of
standard_conforming_strings, and we could even have a backward
compatibility GUC if we were so inclined.  I realize that the
standard_conforming_strings change was dictated by a desire to conform
to the SQL standards, and this isn't, but it seems awfully painful to
me to insist that this is a property that we can never give up.

I can remember one other proposal to which you raised this same
objection: the idea of an on-line tuple mover to handle the situation
where a user wishes to do an on-line reorganization of a bloated table
by incrementally moving tuples to lower-numbered pages.  It's possible
that the idea with which I started this thread might turn out to be
horribly unsafe for one reason or another, but I think that there's
broad support for the tuple mover concept.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: PL/pgPSM for 9.3

2012-03-21 Thread Josh Berkus

 Why not just enhance PL/pgsql
 
 plpgsql is based on PL/SQL, which is Ada-like. PSM is a relatively
 unique language - similar to Modula, with one significant feature,
 handlers - so Ada and PSM are different languages with different
 syntax.

Also, PSM is implemented by MySQL and DB2, so having it helps people
migrate applications.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] HOT updates REDIRECT line pointers

2012-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Mar 21, 2012 at 8:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Oh, right.  So scratch that objection.  The other one is still fatal
 though ...

 So, could we just decide that we don't care about preserving that
 property any more, and document it as an incompatibility in whatever
 release we break it in?

No, I don't think so.  Especially not for such a picayune benefit as
getting rid of one item pointer a bit sooner.

 It strikes me that it likely wouldn't be any
 worse than, oh, say, flipping the default value of
 standard_conforming_strings,

Really?  It's taking away functionality and not supplying any substitute
(or at least you did not propose any).  In fact, you didn't even suggest
exactly how you propose to not break joined UPDATE/DELETE.

regards, tom lane



Re: [HACKERS] HOT updates REDIRECT line pointers

2012-03-21 Thread Robert Haas
On Wed, Mar 21, 2012 at 9:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It strikes me that it likely wouldn't be any
 worse than, oh, say, flipping the default value of
 standard_conforming_strings,

 Really?  It's taking away functionality and not supplying any substitute
 (or at least you did not propose any).  In fact, you didn't even suggest
 exactly how you propose to not break joined UPDATE/DELETE.

Oh, hmm, interesting.  I had been thinking that you were talking about
a case where *user code* was relying on the semantics of the TID,
which has always struck me as an implementation detail that users
probably shouldn't get too attached to.  But now I see that you're
talking about something much more basic - the fundamental
implementation of UPDATE and DELETE relies on the TID not changing
under them.  That pretty much kills this idea dead in the water.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-03-21 Thread Fujii Masao
On Wed, Mar 21, 2012 at 10:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 ... although none of the issues alone is a show-stopper, but considering
 all these things together, I'm starting to feel that this needs to be
 pushed to 9.3. Thoughts?

 Agreed.  In particular, I think you are right that it'd be prudent to
 simplify the WAL-location arithmetic and then rebase this code onto
 that.  And since no code at all has been written for the arithmetic
 change, I think we have to consider that it's not 9.2 material.

Agreed.

BTW, the patch changes some functions so that they use a volatile
pointer, as follows:

@@ -8448,7 +9232,7 @@ XLogReportParameters(void)
 void
 UpdateFullPageWrites(void)
 {
-   XLogCtlInsert *Insert = XLogCtl->Insert;
+   volatile XLogCtlInsert *Insert = XLogCtl->Insert;

Should these changes be applied?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-21 Thread Atri Sharma


-Original Message-
From: Atri Sharma [mailto:atri.j...@gmail.com] 
Sent: 22 March 2012 09:20
To: 'Atri Sharma'
Subject: RE: [HACKERS] Regarding column reordering project for GSoc 2012



-Original Message-
From: Atri Sharma [mailto:atri.j...@gmail.com] 
Sent: 20 March 2012 23:05
To: 'Merlin Moncure'
Cc: 'Heikki Linnakangas'; 'Alvaro Herrera'; 'Daniel Farina'; 'Andrew
Dunstan'; 'Dave Page'; 'Pg Hackers'
Subject: RE: [HACKERS] Regarding column reordering project for GSoc 2012



-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: 20 March 2012 22:50
To: Atri Sharma
Cc: Heikki Linnakangas; Alvaro Herrera; Daniel Farina; Andrew Dunstan; Dave
Page; Pg Hackers
Subject: Re: [HACKERS] Regarding column reordering project for GSoc 2012

On Tue, Mar 20, 2012 at 11:57 AM, Atri Sharma atri.j...@gmail.com wrote:


 -Original Message-
 From: Merlin Moncure [mailto:mmonc...@gmail.com]
 Sent: 20 March 2012 20:52
 To: Atri Sharma
 Cc: Heikki Linnakangas; Alvaro Herrera; Daniel Farina; Andrew Dunstan;
Dave
 Page; Pg Hackers
 Subject: Re: [HACKERS] Regarding column reordering project for GSoc 2012

 On Tue, Mar 20, 2012 at 6:58 AM, Atri Sharma atri.j...@gmail.com wrote:
 -Original Message-
 From: Merlin Moncure [mailto:mmonc...@gmail.com]
 Sent: 20 March 2012 03:15
 To: Heikki Linnakangas
 Cc: Alvaro Herrera; Atri Sharma; Daniel Farina; Andrew Dunstan; Dave
Page;
 Pg Hackers
 Subject: Re: [HACKERS] Regarding column reordering project for GSoc 2012

 On Mon, Mar 19, 2012 at 2:49 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 19.03.2012 21:29, Alvaro Herrera wrote:


 Excerpts from Atri Sharma's message of lun mar 19 16:20:09 -0300 2012:

 I was just going through PGfoundry and I think I will be able to work
 on the PL/Java project. Please let me know if I can proceed further on
 it for GSoC and also whether there is currently any work that needs to
 be done on PL/Java.


 Hm, I'm not sure on PL/Java.  As far as I know, it is a dead project.


 I haven't followed PL/java either, but it doesn't look quite dead to me.
 The
 last release was in September 2011, and there are active discussions on
 the
 mailing lists. If it's not completely up-to-date with latest JDK and
 PostgreSQL releases, perhaps bringing it up-to-date and then adding some
 missing functionality would be a useful GSoC project.

 I would suggest that you ask on the pl/java mailing list if there is
 something suitable for a GSoC project there, and if one of the pl/java
 developers would be willing to mentor.

 pl/java works pretty well and is somewhat widely used although it
 might need some more active maintainers.  just jumping into the
 project and nailing some old bugs and getting the juices flowing would
 be a tremendous help, as well as the less glamorous work of
 documentation and regular status updates.

 one pl/java based project that IMO is just screaming to be done is a
 pl/java based FDW (wrapping JDBC) that would approximately reproduce
 dblink...maybe with some extra features geared towards ETL type jobs
 like a row based callback for transformations in java.

 Merlin


 ---
 Hi Merlin

 Thanks for the idea.

 Could you please elaborate more on the project idea? It would be a great
 help for me.

 sure:
 first start here: http://wiki.postgresql.org/wiki/Foreign_data_wrappers
 and take a look at all the various fdw projects.  they all utilize the
 high level interfaces that postgresql provides to incorporate external
 datasources and expose them so you can query them as if they were views
 or tables.  as you can see, this interface is popular with the
 proliferation of projects to expose this or that.  what i'm thinking
 is you make a fdw that invokes pl/java routines that make jdbc calls
 out to external databases for fetching data and building sets.  as
 long as you stick to vanilla jdbc calls, you then have a fdw that can
 gather data from pretty much anything you can access via jdbc url,
 which is quite nice.

 The fdw API is a C api, so you need to figure out how to proxy to your
 pl/java calls where the real work is done -- I haven't done that so
 I'm not sure if this is even technically feasible.  If you can make
 SPI calls from your fdw routines, that's one path through.  You have
 lots of previous work to crib from though and hopefully there's a path
 through.  As far as the ETL stuff I was talking about -- that was just
 me thinking out loud...better to just get the basic mechanics working
 up front.  For now, if you're interested in doing this, start
 thinking about how to marry the fdw interface to pl/java...once you get
 'hello world' there, it's all about where you want to take it.

 This is an interesting project -- you should entertain no illusions of
 it making it into core, but if done right and completed it will raise
 visibility of both pl/java and postgresql.

 Merlin


 ---
Hi Merlin,

Please let me know how