Re: [HACKERS] 9.3 pg_archivecleanup broken?

2012-11-19 Thread Heikki Linnakangas

On 19.11.2012 02:17, Fujii Masao wrote:

This bug is derived from the commit d5497b95f3ca2fc50c6eef46d3394ab6e6855956.
This commit changed ExecuteRecoveryCommand() so that it calculates the
last valid restart file by using GetOldestRestartPoint(), even though
GetOldestRestartPoint() only works in the startup process and only while
WAL replay is in progress (i.e., InRedo = true).
In archive_cleanup_command, ExecuteRecoveryCommand() is executed by the
checkpointer process, so the problem happened.
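
For context, this is exercised in the usual standby setup where
archive_cleanup_command runs in the checkpointer (the paths below are only
illustrative):

    # recovery.conf on the standby
    standby_mode = 'on'
    restore_command = 'cp /mnt/server/archivedir/%f %p'
    archive_cleanup_command = 'pg_archivecleanup /mnt/server/archivedir %r'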


Fixed. Thanks for the diagnosis!

- Heikki




Re: [HACKERS] gset updated patch

2012-11-19 Thread Pavel Stehule
Hello

2012/11/16 Karl O. Pinc k...@meme.com:
 Hi Pavel,

 On 11/16/2012 12:21:11 AM, Pavel Stehule wrote:
 2012/11/16 Karl O. Pinc k...@meme.com:

  As long as I'm talking crazy talk, why not
  abandon psql as a shell language and instead make a
  pl/pgsql interpreter with readlne() in front
  of it?   Solve all these language-related
  issues by using an actual programming language.  :-)

 I thought about it for a while, but I don't think this makes sense.
 Actually we cannot do a perfect autocomplete even for significantly simpler
 SQL, and there are lots of client-side interpreters - that is no reason for
 another one. I use psql together with bash and it works well. But even a very
 simple task such as storing some volatile data for repetitive use is
 relatively laborious, and that is the motivation for this patch. In psql I
 can simply work with any field of a returned record - which is much more
 tedious work outside psql.

 You might consider using do.

That is the reason why I am not thinking about plpgsql on the client side. But
I don't understand how it is related to gset?

I remember there is one significant limitation of the DO statement - it cannot
return a table - so it cannot substitute for simple psql scripts. But I don't
want to open this topic now - it is related to a real stored procedures
implementation, and that is a long-term task. So gset allows some simple tasks
to be solved simply.
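
For example, the kind of task \gset targets (table and column names here are
only illustrative; the \gset syntax is as proposed in the patch):

    -- store the columns of the result row into psql variables of the same name
    SELECT max(id) AS max_id, count(*) AS n FROM mytable \gset
    \echo :max_id :n
    SELECT * FROM mytable WHERE id = :max_id;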

Regards

Pavel Stehule


 http://www.postgresql.org/docs/9.1/static/sql-do.html

 If you need to maintain a single connection you can do
 some interesting things with socat to feed a running psql
 in the background.

   socat -u UNIX-RECV:/tmp/msock EXEC:psql &

 Followed by lots of

   echo bar | socat -u STDIN UNIX-SENDTO:/tmp/mysock

 \o can be used to send output for pickup, although
 you do need to fuss around with the asynchronous nature
 of things to be sure you're waiting for output.
 I used inotifywait for this.  YMMV.

 Regards,

 Karl k...@meme.com
 Free Software:  You don't pay back, you pay forward.
  -- Robert A. Heinlein





Re: [HACKERS] pg_dump --split patch

2012-11-19 Thread Dimitri Fontaine
Marko Tiikkaja pgm...@joh.to writes:
 What happens if you have a table foo and another table FoO?

 They would go to the same file.  If you think there are technical issues
 behind that decision (e.g. the dump would not restore), I would like to hear
 an example case.
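
For context, unquoted identifiers are folded to lower case, so the two names
only collide at the file-name level, not in the catalogs:

    CREATE TABLE foo (id int);      -- unquoted name folds to lowercase foo
    CREATE TABLE "FoO" (id int);    -- a distinct table; quoting preserves the case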

I didn't try the patch itself yet, so I wanted to hear that this is something
you did actually try out, that's about it :) As soon as we're past the
initial agreement on this patch landing (which I really want to see
happen), I'll devote some time to testing it.

 On the other hand, some people might find it preferable to have them in
 different files (for example foo, foo.1, foo.2 etc).  Or some might prefer
 some other naming scheme.  One of the problems with this patch is exactly
 that people prefer different things, and providing switches for all of the
 different options people come up with would mean a lot of switches. :-(

I think this facility should provide something simple and useful, and
not something fancy. Database backups and exports are not meant to cater
to taste, they are meant to be easy to restore. In that very case, we
want to have a set of properly organized SQL files for doing partial
restores, right?

 It feels a bit icky to me too, but I didn't feel comfortable with putting in
 a lot of work to refactor the API because of how controversial this feature
 is.

+1

 pg_dump | pg_restore
 pg_export | psql

 While I agree that this idea - when implemented - would be nicer in
 practically every way, I'm not sure I want to volunteer to do all the
 necessary work.

What I think needs to happen now is a committer's buy-in that we want to
get there at some point and that your current patch is not painting us
into any corner now. So that we can accept it and have a documented path
forward.

Regards,
-- 
Dimitri Fontaine                     06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Materialized views WIP patch

2012-11-19 Thread Albe Laurenz
Josh Berkus wrote:
 It would be nice for the user to have some way to know that a matview is
 empty due to never being LOADed or recently being TRUNCATEd. However, I
 don't think that relisvalid flag -- and preventing scanning the relation
 -- is a good solution. What I'd rather have instead is a timestamp of
 when the MV was last LOADed. If the MV was never loaded (or was
 truncated) that timestamp would be NULL. Such a timestamp would allow
 users to construct all kinds of ad-hoc refresh schemes for MVs which
 would not be possible without it.

+1

Kevin Grittner wrote:
 I see your point there; I'll think about that. My take was more that MVs
 would often be refreshed by crontab, and that you would want to keep
 subsequent steps from running and generating potentially plausible but
 completely inaccurate results if the LMV failed.

If one of these subsequent steps doesn't care if refresh
failed once, it shouldn't be forced to fail.  I imagine
that for many applications yesterday's data can be good enough.

Those that care should check the timestamp.
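
As a sketch of such a check (the lastload column is the hypothetical timestamp
being discussed here, not an existing catalog column; the view and column names
are made up):

    -- run the expensive report only if the matview was loaded recently enough
    SELECT lastload > now() - interval '1 day' AS fresh_enough
      FROM pg_matviews WHERE matviewname = 'my_mv';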

Yours,
Laurenz Albe



Re: [HACKERS] logical changeset generation v3

2012-11-19 Thread Andres Freund
Hi Michael,


On 2012-11-19 16:28:55 +0900, Michael Paquier wrote:
 I have been able to fetch your code (thanks Andres!) and run some of it. For the
 time being I am spending some time reading the code and understanding the
 whole set of features you are trying to implement inside core, even if I
 got some background from what you presented at PGCon and from the hackers
 ML.

Cool.

 Btw, as a first approach, I tried to run the logical log receiver plugged
 on a postgres server, and I am not able to make it work.

 Well, I am using settings similar to yours.
 # Run master
 rm -r ~/bin/pgsql/master/
 initdb -D ~/bin/pgsql/master/
 echo "local replication $USER trust" >> ~/bin/pgsql/master/pg_hba.conf
 postgres -D ~/bin/pgsql/master \
   -c wal_level=logical \
   -c max_wal_senders=10 \
   -c max_logical_slots=10 \
   -c wal_keep_segments=100 \
   -c log_line_prefix="[%p %x] " &
 # Logical log receiver
 pg_receivellog -f $HOME/output.txt -d postgres -v

 After launching some SQLs, the logical receiver is stuck just after sending
 INIT_LOGICAL_REPLICATION, please see bt of process waiting:

It's waiting till it sees an initial xl_running_xacts record. The
easiest way to make that happen is to manually issue a checkpoint. Sorry, I
should have included that in the description.
Otherwise you can wait till the next routine checkpoint comes around...
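
That is, after starting pg_receivellog, something like this in another session
unblocks it:

    psql -d postgres -c 'CHECKPOINT;'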

I plan to cause more xl_running_xacts records to be logged in the
future. I think the timing of those is currently non-optimal; you have
the same problem in normal streaming replication as well :(

  -- wrapped in a transaction
  BEGIN;
  INSERT INTO replication_example(somedata, text) VALUES (1, 1);
  UPDATE replication_example SET somedate = - somedata WHERE id = (SELECT
  currval('replication_example_id_seq'));
 
 In SET clause, the column name is *somedata* and not *somedate*

Crap. Sorry for that. I wrote the example in the mail client and then
executed it, and I seem to have forgotten to put back some of the fixes...


 I am just looking at this patch and will provide some comments.
 By the way, you forgot the installation part of pg_receivellog, please see
 patch attached.

That actually was somewhat intended; I thought people wouldn't like the
name and I didn't want a binary that's going to be replaced anyway lying
around ;)

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_trgm partial-match

2012-11-19 Thread Alexander Korotkov
On Mon, Nov 19, 2012 at 10:05 AM, Alexander Korotkov
aekorot...@gmail.comwrote:

 On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao masao.fu...@gmail.comwrote:

 Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
 i.e., if query key contains multibyte characters. In this case, byte
 length of
 the trigram string might be larger than three, and its CRC is used as a
 trigram key instead of the trigram string itself. Because of using CRC, we
 cannot do a partial-match. Attached patch extends pg_trgm so that it
 compares a partial-match query key only when KEEPONLYALNUM is
 enabled.


 Didn't get this point. How does KEEPONLYALNUM guarantee that each trigram
 character is singlebyte?

 CREATE TABLE test (val TEXT);
 INSERT INTO test VALUES ('aa'), ('aaa'), ('шaaш');
 CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
 ANALYZE test;
 test=# SELECT * FROM test WHERE val LIKE '%aa%';
  val
 ------
  aa
  aaa
  шaaш
 (3 rows)
 test=# set enable_seqscan = off;
 SET
 test=# SELECT * FROM test WHERE val LIKE '%aa%';
  val
 -----
  aa
  aaa
 (2 rows)

 I think we can use partial match only for singlebyte encodings. Or, at
 most, in cases when all alpha-numeric characters are singlebyte (have no
 idea how to check this).


Actually, I was also fiddling around with the idea of a partial match on
trigrams when I was working on the initial LIKE patch. But I concluded that we
would need a separate opclass which always keeps the full trigram in the entry.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] too much pgbench init output

2012-11-19 Thread Jeevan Chalke
Hi,

I went through the discussion for this patch and here is my review:

The main aim of this patch is to reduce the number of log lines. It was also
suggested to add an option to control the interval, but some of us did not
quite agree on that. So the final discussion ended with keeping a 5 second
interval between log lines.

However, I see there are two types of users here:
1. Those who like these log lines, so that they can troubleshoot slowness and
the like.
2. Those who do not like these log lines.

So keeping these in mind, I would rather go for an option which controls this.
People in the first category can set it very low, whereas users in the second
category can keep it high.

However, assuming we settled on a 5 second delay, here are a few comments on
the attached patch:

Comments:
=

The patch applies cleanly apart from some whitespace errors. make and make
install went smoothly.
make check was also clean - as it should be, since there are NO changes
outside pgbench.c, and there is no test case either.

However, here are few comments on changes in pgbench.c

1.
Since the final discussion concluded that a 5 second interval is good enough,
the author used a global int variable for that.
Given that it's just a constant, a #define would be a better choice.
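Something along these lines (the name is only illustrative):

    #define LOG_STEP_SECONDS 5      /* seconds between progress lines */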

2.
+/* let's not call the timing for each row, but only each 100 rows
*/
Why only 100 rows? Have you done any testing to come up with the number 100?
To me it seems very low. It would be good to test with 1K or even 10K.
On my machine (2.4 GHz Intel Core 2 Duo MacBook Pro, running Ubuntu in a VM
with 4GB RAM, 1067 DDR3), approximately 1M rows were inserted in 5 seconds, so
checking every 100 rows looks like overkill.
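A sketch of the same check with a larger sampling step (10K is just a guess;
j, naccounts and scale follow pgbench's init() loop, diff, start, elapsed_sec
and log_interval are the patch's variables quoted in comment 3 below, and
LOG_STEP_SECONDS is the constant suggested above):

    if (j % 10000 == 0)
    {
        /* sample the clock only every Nth row */
        INSTR_TIME_SET_CURRENT(diff);
        INSTR_TIME_SUBTRACT(diff, start);

        elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
        if (elapsed_sec >= log_interval * LOG_STEP_SECONDS)
        {
            fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
                    j, naccounts * scale,
                    (int) (((int64) j * 100) / ((int64) naccounts * scale)));
            log_interval++;
        }
    }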

3.
Please indent following block as per the indentation just above that

/* used to track elapsed time and estimate of the remaining time */
instr_time  start, diff;
double      elapsed_sec, remaining_sec;
int         log_interval = 1;

4.
+/* have ve reached the next interval? */
Do you mean have WE reached...

5.
While applying the patch, I got a few whitespace errors. But I think every
patch goes through pgindent, which might take care of this.

Thanks



On Sun, Nov 11, 2012 at 11:02 PM, Tomas Vondra t...@fuzzy.cz wrote:

 On 23.10.2012 18:21, Robert Haas wrote:
  On Tue, Oct 23, 2012 at 12:02 PM, Alvaro Herrera
  alvhe...@2ndquadrant.com wrote:
  Tomas Vondra wrote:
 
  I've been thinking about this a bit more, and do propose to use an
  option that determines logging step i.e. number of items (either
  directly or as a percentage) between log lines.
 
  The attached patch defines a new option --logging-step that accepts
  either integers or percents. For example if you want to print a line
  each 1000 lines, you can to this
 
$ pgbench -i -s 1000 --logging-step 1000 testdb
 
  I find it hard to get excited about having to specify a command line
  argument to tweak this.  Would it work to have it emit messages
  depending on elapsed time and log scale of tuples emitted?  So for
  example emit the first message after 5 seconds or 100k tuples, then back
  off until (say) 15 seconds have lapsed and 1M tuples, etc?  The idea is
  to make it verbose enough to keep a human satisfied with what he sees,
  but not flood the terminal with pointless updates.  (I think printing
  the ETA might be nice as well, not sure).
 
  I like this idea.  One of the times when the more verbose output is
  really useful is when you expect it to run fast but then it turns out
  that for some reason it runs really slow.  If you make the output too
  terse, then you end up not really knowing what's going on.  Having it
  give an update at least every 5 seconds would be a nice way to give
  the user a heads-up if things aren't going as planned, without
  cluttering the normal case.

 I've prepared a patch along these lines. The attached version uses only
 elapsed time to print the log messages every 5 seconds, so now it prints
 a message every 5 seconds no matter what, along with an estimate of the
 remaining time.

 I've removed the config option, although it might be useful to specify
 the interval?

 I'm not entirely sure how the 'log scale of tuples' should work - for
 example when the 15 second time limit is reached, should it be reset
 back to the previous step (5 seconds) to give more detailed info, or
 should it be kept at 15 seconds?

 Tomas







-- 
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb


Re: [HACKERS] foreign key locks

2012-11-19 Thread Andres Freund

On 2012-11-14 13:27:26 -0300, Alvaro Herrera wrote:
 Alvaro Herrera wrote:

   * In heap_lock_tuple's  XMAX_IS_MULTI case
  
   [snip]
  
   why is it membermode > mode and not membermode >= mode?
 
  Uh, that's a bug.  Fixed.  As noticed in the comment above that snippet,
  there was a deadlock possible here.  Maybe I should add a test to ensure
  this doesn't happen.

 Done:
 https://github.com/alvherre/postgres/commit/df2847e38198e99f57e52490e1e9391ebb70d770

Some more review bits, based on ffd6250d1d393f2ecb9bfc55c2c6f715dcece557

- if oldestMultiXactId + db is set and then that database is dropped we seem to
  have a problem because MultiXactAdvanceOldest won't overwrite those
  values. Should probably use SetMultiXactIdLimit directly.

- what stops multixacts from only being filled out (i.e. RecordNewMultiXact()) *after*
  the XLogInsert() *and* after a MultiXactGetCheckptMulti()? AFAICS
  MultiXactGenLock is not held in CreateMultiXactId(). If we crash at that
  moment we lose the multixact data, which now means potential data loss...

- multixact member group data crossing 512-byte sector boundaries makes me uneasy
  (as it's 5 bytes). I don't really see a scenario where it's dangerous, but
  ... Does anybody see a problem here?

- there are quite a few places that do
	multiStopLimit = multiWrapLimit - 100;
	if (multiStopLimit < FirstMultiXactId)
		multiStopLimit -= FirstMultiXactId;

  perhaps MultiXactIdAdvance and MultiXactIdRetreat macros are in order?
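  A sketch of what such a helper could look like, by analogy with the snippet
  above (name and exact form are only the suggestion, not existing code):

	#define MultiXactIdRetreat(multi, amount) \
		do { \
			(multi) -= (amount); \
			if ((multi) < FirstMultiXactId) \
				(multi) -= FirstMultiXactId; \
		} while (0)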

- I find using a default: clause in switches over enum types where every value
  is expected to be handled, like the following, a bad idea; this way the
  compiler won't warn you if you've missed cases, which makes changing/extending
  the code harder:
	switch (rc->strength)
	{
		case LCS_FORNOKEYUPDATE:
			newrc->markType = ROW_MARK_EXCLUSIVE;
			break;
		case LCS_FORSHARE:
			newrc->markType = ROW_MARK_SHARE;
			break;
		case LCS_FORKEYSHARE:
			newrc->markType = ROW_MARK_KEYSHARE;
			break;
		case LCS_FORUPDATE:
			newrc->markType = ROW_MARK_KEYEXCLUSIVE;
			break;
		default:
			elog(ERROR, "unsupported rowmark type %d", rc->strength);
	}
-
#if 0
	/*
	 * The idea here is to remove the IS_MULTI bit, and replace the
	 * xmax with the updater's Xid.  However, we can't really do it:
	 * modifying the Xmax is not allowed under our buffer locking
	 * rules, unless we have an exclusive lock; but we don't know that
	 * we have it.  So the multi needs to remain in place :-(
	 */
	ResetMultiHintBit(tuple, buffer, xmax, true);
#endif

 Three things:
- HeapTupleSatisfiesUpdate is actually always called exclusively locked ;)
- Extending something like LWLockHeldByMe to also return the current
  lockmode doesn't sound hard
- we seem to be using #ifdef NOT_YET for such cases

- Using a separate production for the lockmode seems to be nicer imo, not
  really important though
for_locking_item:
FOR UPDATE locked_rels_list opt_nowait
...
| FOR NO KEY UPDATE locked_rels_list opt_nowait
...
| FOR SHARE locked_rels_list opt_nowait
...
| FOR KEY SHARE locked_rels_list opt_nowait
;

- not really padding, MultiXactStatus is 4 bytes...
	/*
	 * XXX Note: there's a lot of padding space in MultiXactMember.  We could
	 * find a more compact representation of this Xlog record -- perhaps all
	 * the status flags in one XLogRecData, then all the xids in another one?
	 * Not clear that it's worth the trouble though.
	 */
- why
	#define SizeOfMultiXactCreate (offsetof(xl_multixact_create, nmembers) + sizeof(int32))
  and not
	#define SizeOfMultiXactCreate offsetof(xl_multixact_create, members)
- starting a critical section in GetNewMultiXactId but not ending it is ugly,
  but not new

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v3 - Source for Slony

2012-11-19 Thread Steve Singer

First, you can add me to the list of people saying 'wow', I'm impressed.

The approach I am taking to reviewing this is to try and answer the
following questions:


1) How might a future version of slony be able to use logical
replication as described by your patch and design documents,

and what would that look like?

2) What functionality is missing from the patch set that would stop me 
from implementing or prototyping the above.




Connecting slon to the remote postgresql
========================================

Today the slony remote listener thread queries a bunch of events from
sl_event for a batch of SYNC events. Then the remote helper thread
queries data from sl_log_1 and sl_log_2.  I see this changing; instead,
the slony remote listener thread would connect to the remote system and
get a logical replication stream.


  1) Would slony connect as a normal client connection and call
something like 'select pg_slony_process_xlog(...)' to get a bunch of
logical replication change records to process.
  OR
  2) Would slony connect as a replication connection, similar to how the
pg_receivelog program does today, and then process the logical changeset
outputs, instead of writing them to a file (as pg_receivelog does)?

It seems that the second approach is what is encouraged.  I think we
would put a lot of the pg_receivelog functionality into slon and it
would issue a command like INIT_LOGICAL_REPLICATION 'slony' to use the
slony logical replication plugin.  Slon would also have to provide
feedback to the walsender about what it has processed so the origin
database knows what catalog snapshots can be expired.  Based on
eyeballing pg_receivelog.c it seems that about half the code in the
700-line file is related to command line arguments etc, and the other
half is related to looping over the copy out stream, sending feedback
and other things that we would need to duplicate in slon.


pg_receivelog.c has a comment:

/*
 * We have to use postgres.h not postgres_fe.h here, because there's so much
 * backend-only stuff in the XLOG include files we need.  But we need a
 * frontend-ish environment otherwise.  Hence this ugly hack.
 */

This looks like more of a carryover from pg_receivexlog.c.  From what I 
can tell we can eliminate the postgres.h include if we also eliminate 
the utils/datetime.h and utils/timestamp.h and instead add in:


#include "postgres_fe.h"
#define POSTGRES_EPOCH_JDATE 2451545
#define UNIX_EPOCH_JDATE 2440588
#define SECS_PER_DAY 86400
#define USECS_PER_SEC INT64CONST(1000000)
typedef int64 XLogRecPtr;
#define InvalidXLogRecPtr 0

If there is a better way of getting these defines someone should speak 
up.   I recall that in the past slon actually did include postgres.h and 
it caused some issues (I think with MSVC win32 builds).  Since 
pg_receivelog.c will be used as a starting point/sample for third 
parties to write client programs it would be better if it didn't 
encourage client programs to include postgres.h



The Slony Output Plugin
=======================

Once we've modified slon to connect as a logical replication client we 
will need to write a slony plugin.


As I understand the plugin API:
* A walsender is processing through WAL records; each time it sees a
COMMIT WAL record it will call my plugin's

.begin
.change (for each change in the transaction)
.commit

* The plugin for a particular stream/replication client will see one 
transaction at a time passed to it in commit order.  It won't see 
.change(t1) followed by .change (t2), followed by a second .change(t1).  
The reorder buffer code hides me from all that complexity (yah)


From a slony point of view I think the output of the plugin will be 
rows, suitable to be passed to COPY IN of the form:


origin_id, table_namespace, table_name, command_type,
cmd_updatencols, command_args


This is basically the Slony 2.2 sl_log format minus a few columns we no 
longer need (txid, actionseq).
command_args is a postgresql text array of column=value pairs, e.g.
[{id=1},{name='steve'},{project='slony'}]


I don't think our output plugin will be much more complicated than the
test_decoding plugin.  I suspect we will want to give it the ability to
filter out non-replicated tables.  We will also have to filter out
change records that didn't originate on the local node and that aren't
part of a cascaded subscription.  Remember that in a two-node cluster
slony will have connections from A->B and from B->A even if user tables
only flow one way. Data that is replicated from A into B will show up in
the WAL stream for B.


Exactly how we do this filtering is an open question; I think the
output plugin will at a minimum need to know:

a) What the slony node id is of the node it is running on.  This is easy
to figure out if the output plugin is able/allowed to query its
database.  Will this be possible? I would expect to be able to query the
database as it exists now (at plugin invocation time), not as it existed
in the past 

Re: [HACKERS] logical changeset generation v3 - Source for Slony

2012-11-19 Thread Andres Freund
Hi Steve!

On 2012-11-17 22:50:35 -0500, Steve Singer wrote:
 First, you can add me to the list of people saying 'wow', I'm impressed.

Thanks!

 The approach I am taking to reviewing this to try and answer the following
 question

 1) How might a future version of slony be able to use logical replication as
 described by your patch and design documents
 and what would that look like.

 2) What functionality is missing from the patch set that would stop me from
 implementing or prototyping the above.

Sounds like a good plan to me.


 Connecting slon to the remote postgresql
 

 Today the slony remote listener thread queries a bunch of events from
 sl_event for a batch of SYNC events. Then the remote helper thread queries
 data from sl_log_1 and sl_log_2.I see this changing, instead the slony
 remote listener thread would connect to the remote system and get a logical
 replication stream.

   1) Would slony connect as a normal client connection and call something
 like 'select pg_slony_process_xlog(...)' to get bunch of logical replication
   change records to process.
   OR
   2) Would slony connect as a replication connection similar to how the
 pg_receivelog program does today and then process the logical changeset
   outputs.  Instead of writing it to a file (as pg_receivelog does)

It would need to be the latter. We need the feedback messages it sends
for several purposes:
- increasing the lowered xmin
- implementing optionally synchronous replication at some point
- using 1) would mean having transactions open...

 It seems that the second approach is what is encouraged.  I think we would
 put a lot of the pg_receivelog functionality into slon and it would issue a
 command like INIT_LOGICAL_REPLICATION 'slony' to use the slony logical
 replication plugin.  Slon would also have to provide feedback to the
 walsender about what it has processed so the origin database knows what
 catalog snapshots can be expired.  Based on eyeballing pg_receivelog.c it
 seems that about half the code in the 700-line file is related to command line
 arguments etc, and the other half is related to looping over the copy out
 stream, sending feedback and other things that we would need to duplicate in
 slon.

I think we should provide some glue code to do this, otherwise people
will start replicating all the bugs I hacked into this... More
seriously: I think we should have support code here; no user will want
to learn the intricacies of feedback messages and such. Where would that
live? No idea.

 pg_receivelog.c has  comment:

(it's pg_receivellog btw. ;))


 /*
  * We have to use postgres.h not postgres_fe.h here, because there's so much
  * backend-only stuff in the XLOG include files we need.  But we need a
  * frontend-ish environment otherwise.  Hence this ugly hack.
  */

 This looks like more of a carryover from pg_receivexlog.c.  From what I can
 tell we can eliminate the postgres.h include if we also eliminate the
 utils/datetime.h and utils/timestamp.h and instead add in:

 #include "postgres_fe.h"
 #define POSTGRES_EPOCH_JDATE 2451545
 #define UNIX_EPOCH_JDATE 2440588
 #define SECS_PER_DAY 86400
 #define USECS_PER_SEC INT64CONST(1000000)
 typedef int64 XLogRecPtr;
 #define InvalidXLogRecPtr 0

 If there is a better way of getting these defines someone should speak up.
 I recall that in the past slon actually did include postgres.h and it caused
 some issues (I think with MSVC win32 builds).  Since pg_receivelog.c will be
 used as a starting point/sample for third parties to write client programs
 it would be better if it didn't encourage client programs to include
 postgres.h

I wholeheartedly agree. It should also be cleaned up a fair bit before
others copy it, should we not go for having some client-side library.

Imo the library could very roughly be something like:

state = SetupStreamingLLog(replication-slot, ...);
while ((message = StreamingLLogNextMessage(state)) != NULL)
{
    write(outfd, message->data, message->length);
    if (received_100_messages)
    {
         fsync(outfd);
         StreamingLLogConfirm(message);
    }
}

Although I guess that's not good enough because StreamingLLogNextMessage
would be blocking, but that shouldn't be too hard to work around.


 The Slony Output Plugin
 =

 Once we've modified slon to connect as a logical replication client we will
 need to write a slony plugin.

 As I understand the plugin API:
 * A walsender is processing through WAL records, each time it sees a COMMIT
 WAL record it will call my plugins
 .begin
 .change (for each change in the transaction)
 .commit

 * The plugin for a particular stream/replication client will see one
 transaction at a time passed to it in commit order.  It won't see
 .change(t1) followed by .change (t2), followed by a second .change(t1).  The
 reorder buffer code hides me from all that complexity (yah)

Correct.

 From a slony point of view I think the output of the plugin will be 

Re: [HACKERS] foreign key locks

2012-11-19 Thread Andres Freund
On 2012-11-14 13:27:26 -0300, Alvaro Herrera wrote:
 Alvaro Herrera wrote:

   * In heap_lock_tuple's  XMAX_IS_MULTI case
  
   [snip]
  
   why is it membermode > mode and not membermode >= mode?
 
  Uh, that's a bug.  Fixed.  As noticed in the comment above that snippet,
  there was a deadlock possible here.  Maybe I should add a test to ensure
  this doesn't happen.

 Done:
 https://github.com/alvherre/postgres/commit/df2847e38198e99f57e52490e1e9391ebb70d770

 (I don't think this is worth a v24 submission).

One more observation:
	/*
	 * Get and lock the updated version of the row; if fail, return NULL.
	 */
-	copyTuple = EvalPlanQualFetch(estate, relation, LockTupleExclusive,
+	copyTuple = EvalPlanQualFetch(estate, relation, LockTupleNoKeyExclusive,

That doesn't seem to be correct to me. Why is it ok to acquire a
potentially too low locklevel here?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] review: Reduce palloc's in numeric operations

2012-11-19 Thread Pavel Stehule
Hello all

I tested this patch and can confirm that it can increase
speed by about 16-22% (depending on IO waits and load type).

I tested a real query (anonymized):

SELECT SUM(COALESCE((a * b * c * (d + ((1 -(d)) * (1 -(e),0)) m1 FROM  tab;

-- patched
4493 26.5591  postgres slot_deform_tuple
1327  7.8442  postgres AllocSetAlloc
1313  7.7614  postgres ExecMakeFunctionResultNoSets
1105  6.5319  postgres set_var_from_num_nocopy
924   5.4620  postgres make_result
637   3.7654  postgres mul_var
635   3.7536  postgres numeric_mul
560   3.3103  postgres MemoryContextAlloc
405   2.3940  postgres AllocSetFree
389   2.2995  postgres ExecEvalScalarVarFast
332   1.9625  postgres slot_getsomeattrs
322   1.9034  postgres numeric_add
313   1.8502  postgres add_abs
304   1.7970  postgres pfree
238   1.4069  postgres slot_getattr
216   1.2768  postgres numeric_sub
200   1.1822  postgres heap_tuple_untoast_attr
183   1.0818  postgres strip_var
180   1.0640  postgres ExecEvalConst
173   1.0226  postgres alloc_var
172   1.0167  postgres check_stack_depth

-- origin
4419 22.8325  postgres slot_deform_tuple
2041 10.5456  postgres AllocSetAlloc
1248  6.4483  postgres set_var_from_num
1186  6.1279  postgres ExecMakeFunctionResultNoSets
886   4.5779  postgres MemoryContextAlloc
856   4.4229  postgres make_result
757   3.9113  postgres numeric_mul
731   3.7770  postgres AllocSetFree
625   3.2293  postgres mul_var
601   3.1053  postgres alloc_var
545   2.8160  postgres pfree
503   2.5989  postgres free_var
458   2.3664  postgres slot_getsomeattrs
378   1.9531  postgres numeric_add
360   1.8601  postgres add_abs
336   1.7361  postgres ExecEvalScalarVarFast
266   1.3744  postgres slot_getattr
221   1.1419  postgres numeric_sub

Review:

1) the patch applies cleanly and compiles without warnings

2) regression tests - all 133 tests passed

4) I don't see any memory leaks there (verified with the following code)

CREATE OR REPLACE FUNCTION public.fx(_m integer)
 RETURNS void
 LANGUAGE plpgsql
AS $function$
declare m numeric = 10;
n numeric = 20022.222;
  r numeric;
begin
  for i in 1.._m
  loop
r := m * n;
  end loop;
end;
$function$


postgres=# select fx(1000);
 fx


(1 row)

Time: 5312.623 ms --- original (4798.103 ms -- patched, ~10% speedup)

5) it respects PostgreSQL's coding standards

6) we should accept this patch - it can provide a ~10% speedup for
calculations with numerics.

Regards

Pavel Stehule




Re: [HACKERS] pg_dump --split patch

2012-11-19 Thread Dimitri Fontaine
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 pg_dump | pg_restore
 pg_export | psql

 While I agree that this idea - when implemented - would be nicer in
 practically every way, I'm not sure I want to volunteer to do all the
 necessary work.

  What I think needs to happen now is a committer's buy-in that we want to
 get there at some point and that your current patch is not painting us
 into any corner now. So that we can accept it and have a documented path
 forward.

Just stumbled across this message while reading some older threads
about the current topic:

  http://archives.postgresql.org/pgsql-hackers/2010-12/msg02496.php

Where Robert Treat said:
 I've both enjoyed reading this thread and seeing this wheel reinvented
 yet again, and wholeheartedly +1 the idea of building this directly
 into pg_dump. (The only thing better would be to make everything
 sql callable, but that's a problem for another day).

I know Andrew has been working on his Retail DDL project which is
basically a bunch of server-side functions that spits out SQL object
definitions. Andrew, were you able to make progress on that project?

On the other hand, pg_dump -Fs still is something I would like to have
as a complement to Andrew's facility.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-19 Thread Dimitri Fontaine
Amit Kapila amit.kap...@huawei.com writes:
 We have discussion about below 3 different syntaxes for this command
  
 1. ALTER SYSTEM
 2. SET PERSISTENT
 3. pg_change_conf()

 I think to conclude, we need to evaluate which syntax has more advantages.
 Comparison for above syntax
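
Roughly, the three variants under discussion would look like this (none of
them exists yet; the parameter and value are only examples):

    ALTER SYSTEM SET work_mem = '64MB';
    SET PERSISTENT work_mem = '64MB';
    SELECT pg_change_conf('work_mem', '64MB');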

I think ALTER SYSTEM should be what Peter Eisentraut proposed in another
thread, using system catalogs and thus not supporting the whole range of
parameters and reset behavior on SIGHUP. That's still very useful, and
seems to me clear enough to document.

Then, I think you could implement a SET PERSISTENT command that calls the
pg_change_conf() function. The problem is that you then can't make the
command unavailable in a transaction block if all it does is call the
function, because the function call needs to happen in a transaction.

I'd vote for having a lock that serializes any calls to that function. My
understanding of the use cases is that it is really OK not to allow any
concurrency here.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Doc patch, put RAISE USING keywords into a table

2012-11-19 Thread Karl O. Pinc
On 11/17/2012 12:16:06 AM, Peter Eisentraut wrote:

 I'm unsure whether splitting this out into a list (or table) is an
 improvement.  Other opinions?
 
 This page is written as a narrative explanation of the RAISE feature,
 but there is probably a desire to also have it serve as a reference
 page
 in the style of the SQL command reference pages.

Yes.  I do want to be able to scan the page quickly
to pick out the keywords and their use.  There is no reference page.

FWIW

There are lists and tables in narrative parts of the documentation.
The very next section, (PL/pgSQL) Trigger Procedures has a variablelist
of special variables and The SQL Language top-level chapter, which
claims to be narrative, has many tables.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein





Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-19 Thread Amit Kapila
On Monday, November 19, 2012 7:53 PM Dimitri Fontaine wrote:
 Amit Kapila amit.kap...@huawei.com writes:
  We have discussion about below 3 different syntaxes for this command
 
  1. ALTER SYSTEM
   2. SET PERSISTENT
  3. pg_change_conf()
 
  I think to conclude, we need to evaluate which syntax has more
 advantages.
  Comparison for above syntax
 
 I think ALTER SYSTEM should be what Peter Eisentraut proposed in another
 thread, using system catalogs and thus not supporting the whole range of
 parameters and reset behavior on SIGHUP. That's still very useful, and
 seems to me clear enough to document.

  Then, I think you could implement a SET PERSISTENT command that calls the
  pg_change_conf() function. The problem is that you then can't make the
  command unavailable in a transaction block if all it does is call the
  function, because the function call needs to happen in a transaction.

If we want to go for SET PERSISTENT, then there is no need for a built-in
function like pg_change_conf(); the functionality can be implemented in some
internal function.
I believe avoiding the start of a transaction is still unavoidable, as even
parsing of the statement happens after StartTransaction.
The only point I can see against SET PERSISTENT is that the other variants of
the SET command can be used in transaction blocks, meaning ROLLBACK TO
SAVEPOINT works for them, but for SET PERSISTENT it can't be done.
So to handle that we might need to mention this point in the User Manual, so
that users can be aware of this behavior.
If that is okay, then I think SET PERSISTENT is good to go.

With Regards,
Amit Kapila.





Re: [HACKERS] ALTER command reworks

2012-11-19 Thread Dimitri Fontaine
Hi,

Thanks for working on that cleanup job!

Kohei KaiGai kai...@kaigai.gr.jp writes:
 The attached patch is the revised version of ALTER RENAME TO
 consolidation. According to the previous suggestion, it uses
 a common logic to check object-naming duplication at
 check_duplicate_objectname().

I think you need to better comment which object types are supported by
the new generic function and which are not in AlterObjectRename().

 At the code path on AlterObjectNamespace_internal, it lost
 ObjectType information to the supplied object, so I also added
 get_object_type() function at objectaddress.c for inverse
 translation from a couple of classId/objectId to OBJECT_* label.

I don't understand that part, and it looks like it would be way better
to find a way to pass down the information you already have earlier in
the code than to compute it again doing syscache lookups…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-19 Thread Alvaro Herrera
Amit Kapila escribió:

  The only point I can see against SET PERSISTENT is that the other variants
  of the SET command can be used in transaction blocks, meaning ROLLBACK TO
  SAVEPOINT works for them, but for SET PERSISTENT it can't be done.
  So to handle that we might need to mention this point in the User Manual, so
  that users can be aware of this behavior.
  If that is okay, then I think SET PERSISTENT is good to go.

I think that's okay.  There are other commands which have some forms
that can run inside a transaction block and others not.  CLUSTER is
one example (maybe the only one?  Not sure).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_dump --split patch

2012-11-19 Thread Andrew Dunstan


On 11/19/2012 09:07 AM, Dimitri Fontaine wrote:

Dimitri Fontaine dimi...@2ndquadrant.fr writes:

 pg_dump | pg_restore
 pg_export | psql

While I agree that this idea - when implemented - would be nicer in
practically every way, I'm not sure I want to volunteer to do all the
necessary work.

What I think needs to happen now is a committer's buy-in that we want to
get there at some point and that your current patch is not painting us
into any corner now. So that we can accept it and have a documented path
forward.

Just stumbled across this message while reading some older threads
about the current topic:

   http://archives.postgresql.org/pgsql-hackers/2010-12/msg02496.php

Where Robert Treat said:

I've both enjoyed reading this thread and seeing this wheel reinvented
yet again, and wholeheartedly +1 the idea of building this directly
into pg_dump. (The only thing better would be to make everything
sql callable, but that's a problem for another day).

I know Andrew has been working on his Retail DDL project which is
basically a bunch of server-side functions that spits out SQL object
definitions. Andrew, were you able to make progress on that project?

On the other hand, pg_dump -Fs still is something I would like to have
as a complement to Andrew's facility.




No, sorry, it's on hold - I'm having trouble finding time to work on it.

cheers

andrew




Re: [HACKERS] gset updated patch

2012-11-19 Thread Karl O. Pinc
On 11/19/2012 02:30:49 AM, Pavel Stehule wrote:
 Hello
 
 2012/11/16 Karl O. Pinc k...@meme.com:
  Hi Pavel,
 
  On 11/16/2012 12:21:11 AM, Pavel Stehule wrote:
  2012/11/16 Karl O. Pinc k...@meme.com:
 
   As long as I'm talking crazy talk, why not
   abandon psql as a shell language and instead make a
   pl/pgsql interpreter with readlne() in front
   of it?   Solve all these language-related
   issues by using an actual programming language.  :-)
 
   I thought about it for a while, but I don't think this makes sense.


  You might consider using do.
 
 That is the reason why I am not thinking about plpgsql on the client side.
 But I don't understand how it is related to gset?

Because the plpgsql SELECT INTO sets variables from query results,
exactly what \gset does.  You have to use EXECUTE
in plpgsql to do the substitution into statements, but that's
syntax.

 
 I remember there is one significant limitation of the DO statement - it
 cannot return a table - so it cannot substitute for simple psql scripts.

Yes. I'm wrong.  For some reason I thought you could use DO to make
an anonymous code block that would act as a SETOF function,
allowing RETURN NEXT expr (et-al) to be used in the
plpgsql code, allowing DO to return table results.
(Or, perhaps, instead, be used in place of a table in a SELECT
statement.)  Oh well.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein





Re: [HACKERS] ALTER command reworks

2012-11-19 Thread Kohei KaiGai
Hi Dimitri,

Thanks for your checks.

2012/11/19 Dimitri Fontaine dimi...@2ndquadrant.fr:
 Kohei KaiGai kai...@kaigai.gr.jp writes:
 The attached patch is the revised version of ALTER RENAME TO
 consolidation. According to the previous suggestion, it uses
 a common logic to check object-naming duplication at
 check_duplicate_objectname().

 I think you need to better comment which object types are supported by
 the new generic function and which are not in AlterObjectRename().

OK, you are suggesting to add a generic comment such as "Generic
function to change the name of a given object, for simple cases ...",
rather than a list of OBJECT_* at the head of this function, aren't you?

 At the code path on AlterObjectNamespace_internal, it lost
 ObjectType information to the supplied object, so I also added
 get_object_type() function at objectaddress.c for inverse
 translation from a couple of classId/objectId to OBJECT_* label.

 I don't understand that part, and it looks like it would be way better
 to find a way to pass down the information you already have earlier in
 the code than to compute it again doing syscache lookups…

The pain point is that AlterObjectNamespace_internal is not invoked
only by ExecAlterObjectSchemaStmt(), but also by AlterObjectNamespace_oid().
That is the reason why I had to add get_object_type() to find out the OBJECT_*
from the supplied ObjectAddress: this code path has no way to pass down the
ObjectType from the caller.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] gset updated patch

2012-11-19 Thread Pavel Stehule
2012/11/19 Karl O. Pinc k...@meme.com:
 On 11/19/2012 02:30:49 AM, Pavel Stehule wrote:
 Hello

 2012/11/16 Karl O. Pinc k...@meme.com:
  Hi Pavel,
 
  On 11/16/2012 12:21:11 AM, Pavel Stehule wrote:
  2012/11/16 Karl O. Pinc k...@meme.com:
 
   As long as I'm talking crazy talk, why not
   abandon psql as a shell language and instead make a
   pl/pgsql interpreter with readlne() in front
   of it?   Solve all these language-related
   issues by using an actual programming language.  :-)
 
   I thought about it for a while, but I don't think this makes sense.


  You might consider using do.

  That is the reason why I am not thinking about plpgsql on the client side.
  But I don't understand how it is related to gset?

 Because the plpgsql SELECT INTO sets variables from query results,
 exactly what \gset does.  You have to use EXECUTE
 in plpgsql to do the substitution into statements, but that's
 syntax.

yes, but I cannot set a psql variable that way

the original proposal for gset was \exec ... into var :)

although \gset is simpler and there is no problem with multiline queries



  I remember there is one significant limitation of the DO statement - it
  cannot return a table - so it cannot substitute for simple psql scripts.

 Yes. I'm wrong.  For some reason I thought you could use DO to make
 an anonymous code block that would act as a SETOF function,
 allowing RETURN NEXT expr (et-al) to be used in the
 plpgsql code, allowing DO to return table results.
 (Or, perhaps, instead, be used in place of a table in a SELECT
 statement.)  Oh well.

yes, I understand - a batch model for DO would be more natural than a
function - but that is not how it works :(.

I hope that one day we will have real stored procedures with unbound
queries. Then we can redefine DO's behaviour, because it wouldn't break
compatibility. But that is a long way off - maybe 9.5

Regards

Pavel


 Regards,

 Karl k...@meme.com
 Free Software:  You don't pay back, you pay forward.
  -- Robert A. Heinlein





Re: [HACKERS] [v9.3] writable foreign tables

2012-11-19 Thread Albe Laurenz
Kohei KaiGai wrote:
 I am not so happy with GetForeignRelInfo:
 - The name seems ill-chosen from the FDW API side.
   I guess that you chose the name because the function
   is called from get_relation_info, but I think the name
   should be more descriptive for the FDW API.
   Something like PlanForeignRelRowid.

 Indeed, GetForeignRelInfo might give the misleading impression
 that this routine collects widespread information about the
 target relation. So, how about GetForeignRelWidth() instead?

That would be better for the function as it is now.

 - I guess that every FDW that only needs rowid will
   do exactly the same as your fileGetForeignRelInfo.
   Why can't that be done in core?
   The function could pass an AttrNumber for the rowid
   to the FDW, and will receive a boolean return code
   depending on whether the FDW plans to use rowid or not.
   That would be more convenient for FDW authors.

 This design tries to kill two birds with one stone.
 It makes it possible to add multiple pseudo-columns,
 not only rowid, and to push down complex calculations
 in the target list to an external computing resource.

 For example, when the user gives the following query:

   SELECT ((c1 - c2) * (c2 - c3))^2 FROM ftable

 it contains a complex calculation in the target list,
 so it also takes CPU cycles of the local process.

 If we can replace the ((c1 - c2) * (c2 - c3))^2 by
 a reference to a pseudo-column that in turn references
 the calculation result on the external node, it effectively
 off-loads those CPU cycles.

 In this case, all we need to do is (1) acquire a slot
 for the pseudo-column at GetForeignRelInfo and (2) replace
 TargetEntry::expr by a Var node that references this
 pseudo-column.

 It makes sense for performance optimization, so I don't
 want to restrict this handler to rowid only.

I understand.

But I think that you still can do that with the change that
I suggest.  I suggest that GetForeignRelInfo (or whatever the
name ends up being) gets the AttrNumber of the proposed rowid
column in addition to the parameters you need for what
you want to do.

Then nothing would keep you from defining those
pseudo-columns.  But all the setup necessary for the rowid
column could be moved out of the FDW.  So for the 99% of all
FDW which are only interested in the rowid, things would
get much simpler and they don't all have to implement the
same code.

Did I make clear what I mean?
Would that be difficult?

 - I guess the order is dictated by planner steps, but
   it would be nice to have if GetForeignRelInfo were
   not the first function to be called during planning.
   That would make it easier for a FDW to support both
   9.2 and 9.3 (fewer #ifdefs), because the FDW plan state
   will probably have to be created in the first API
   function.

 The baserel->fdw_private should be initialized to NULL,
 so it can serve as a marker for whether private data has
 already been constructed or not.

Right, if that pointer is pre-initialized to NULL, that
should work.  Forget my quibble.

 In addition, I noticed my patch didn't update the documentation.
 I also added mentions of the new handlers.

I didn't get into documentation, comment and spelling issues since
the patch was still called POC, but yes, eventually that would
be necessary.

Yours,
Laurenz Albe




Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-19 Thread Amit kapila
On Monday, November 19, 2012 6:05 AM Jeff Janes  wrote:
On Mon, Oct 22, 2012 at 10:51 AM, Amit kapila amit.kap...@huawei.com wrote:


 Today I have again collected the data for the configuration shared_buffers
 = 7G along with vmstat.
 The data and vmstat information (bi) are attached with this mail. It is
 observed from the vmstat info that I/O is happening in both cases; however,
 after running for a long time, the I/O is also comparatively less with the
 new patch.

What I see in the vmstat report is that it takes 5.5 runs to get
really good and warmed up, and so it crawls for the first 5.5
benchmarks and then flies for the last 0.5 benchmark.  The way you
have your runs ordered, that last 0.5 of a benchmark is for the
patched version, and this drives up the average tps for the patched
case.


 Also, there is no theoretical reason to think that your patch would
 decrease the amount of IO needed (in fact, by invalidating buffers
 early, it could be expected to increase the amount of IO).  So this
 also argues that the increase in performance is caused by the decrease
 in IO, but the patch isn't causing that decrease, it merely benefits
 from it due to an accident of timing.

Today, I ran in the opposite order, and for some readings I still see a
similar observation.
I am also not sure about the IO part; I was just trying to interpret it that
way based on the data. However, maybe for some particular scenario it behaves
that way due to OS buffer management.
As I am not aware of the OS buffer management algorithm, it's difficult to say
whether such a change would have any impact on OS buffer management that could
yield better performance.

With Regards,
Amit Kapila.




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-19 Thread Amit Kapila
On Monday, November 19, 2012 8:36 PM Alvaro Herrera wrote:
 Amit Kapila escribió:
 
   The only point I can see against SET PERSISTENT is that the other variants
   of the SET command can be used in transaction blocks, meaning ROLLBACK TO
   SAVEPOINT works for them, but for SET PERSISTENT it can't be done.
   So to handle that we might need to mention this point in the User Manual,
   so that users can be aware of this behavior.
   If that is okay, then I think SET PERSISTENT is good to go.
 
 I think that's okay.  There are other commands which have some forms
 that can run inside a transaction block and others not.  CLUSTER is
 one example (maybe the only one?  Not sure).

In that case, it has one more advantage: all configuration settings can be
done with one command, and in the future we might want to have an option like
BOTH where the command takes effect for memory as well as for the file.

Can you think of any strong reason not to go with the ALTER SYSTEM command?

In any case, SET PERSISTENT is fine.

With Regards,
Amit Kapila.





Re: [HACKERS] ALTER command reworks

2012-11-19 Thread Dimitri Fontaine
Kohei KaiGai kai...@kaigai.gr.jp writes:
 OK, you are suggesting to add a generic comment such as "Generic
 function to change the name of a given object, for simple cases ...",
 rather than a list of OBJECT_* at the head of this function, aren't you?

Just something like

 * Most simple objects are covered by a generic function; the others
 * still need special processing.

Mainly I was surprised to see collation not supported here, but I didn't
take enough time to understand why that is the case. I will do that
later in the review process.

 The pain point is that AlterObjectNamespace_internal is not invoked
 only by ExecAlterObjectSchemaStmt(), but also by AlterObjectNamespace_oid().

I should remember better about that as the use case is extensions…

 That is the reason why I had to add get_object_type() to find out the OBJECT_*
 from the supplied ObjectAddress: this code path has no way to pass down the
 ObjectType from the caller.

If we really want to do that, I think I would only do that in
AlterObjectNamespace_oid and add an argument to
AlterObjectNamespace_internal, that you already have in
ExecAlterObjectSchemaStmt().

But really, what about just removing that part of your patch instead:

@@ -396,14 +614,23 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
 	 * Check for duplicate name (more friendly than unique-index failure).
 	 * Since this is just a friendliness check, we can just skip it in cases
 	 * where there isn't a suitable syscache available.
+	 *
+	 * XXX - the caller should check object-name duplication, if the supplied
+	 * object type need to take object arguments for identification, such as
+	 * functions.
 	 */
-	if (nameCacheId >= 0 &&
-		SearchSysCacheExists2(nameCacheId, name, ObjectIdGetDatum(nspOid)))
-		ereport(ERROR,
-				(errcode(ERRCODE_DUPLICATE_OBJECT),
-				 errmsg("%s already exists in schema \"%s\"",
-						getObjectDescriptionOids(classId, objid),
-						get_namespace_name(nspOid))));
+	if (get_object_catcache_name(classId) >= 0)
+	{
+		ObjectAddress	address;
+
+		address.classId = classId;
+		address.objectId = objid;
+		address.objectSubId = 0;
+
+		objtype = get_object_type(address);
+		check_duplicate_objectname(objtype, nspOid,
+								   NameStr(*DatumGetName(name)), NIL);
+	}

It would be much simpler to retain the old-style duplicate object check,
and compared to doing extra cache lookups, it'd still be much cleaner in
my view.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior

2012-11-19 Thread Tom Lane
Xi Wang xi.w...@gmail.com writes:
 On 11/18/12 6:47 PM, Tom Lane wrote:
 I was against this style of coding before, and I still am.
 For one thing, it's just about certain to introduce conflicts
 against system headers.

 I totally agree.

 I would be happy to rewrite the integer overflow checks without
 using these explicit constants, but it seems extremely tricky to
 do so.

I thought about this some more and realized that we can handle it
by realizing that division by -1 is the same as negation, and so
we can copy the method used in int4um.  So the code would look like

	if (arg2 == -1)
	{
		result = -arg1;
		if (arg1 != 0 && SAMESIGN(result, arg1))
			ereport(ERROR, ...);
		PG_RETURN_INT32(result);
	}

(with rather more comments than this, of course).  This looks faster
than what's there now, as well as removing the need for use of
explicit INT_MIN constants.
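
A minimal standalone sketch of that approach, assuming -fwrapv-style
wraparound so that negating INT_MIN wraps back to INT_MIN; the helper
name and error handling are invented for illustration, this is not the
actual int4div code:

#include <stdio.h>
#include <limits.h>

#define SAMESIGN(a,b)	(((a) < 0) == ((b) < 0))

/* illustrative stand-in for int4div's special case for a divisor of -1 */
static int
div32_checked(int arg1, int arg2, int *overflow)
{
	int		result;

	*overflow = 0;
	if (arg2 == -1)
	{
		/*
		 * Division by -1 is negation.  With wraparound semantics,
		 * -INT_MIN comes back as INT_MIN, so result and arg1 share a
		 * sign exactly in the overflow case; no INT_MIN constant needed.
		 */
		result = -arg1;
		if (arg1 != 0 && SAMESIGN(result, arg1))
			*overflow = 1;
		return result;
	}
	return arg1 / arg2;			/* arg2 == 0 must be rejected earlier */
}

int
main(void)
{
	int		ovf;
	int		r = div32_checked(INT_MIN, -1, &ovf);

	printf("result=%d overflow=%d\n", r, ovf);
	return 0;
}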

 Compared to (arg1 == INTn_MIN && arg2 == -1), the above check is
 not only more confusing and difficult to understand, but it also
 invokes undefined behavior (-INT_MIN overflow), which is dangerous:
 many C compilers will optimize away the check.

They'd better not, else they'll break many of our overflow checks.
This is why we use -fwrapv with gcc, for example.  Any other compiler
with similar optimizations needs to be invoked with a similar switch.

 Since INTn_MIN and INTn_MAX are standard macros from the C library,
 can we assume that every C compiler should provide them in stdint.h?

Not every C compiler provides stdint.h, unfortunately --- otherwise
I'd not be so resistant to depending on this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gset updated patch

2012-11-19 Thread Dimitri Fontaine
Karl O. Pinc k...@meme.com writes:
 Yes. I'm wrong.  For some reason I thought you could use DO to make
 an anonymous code block that would act as a SETOF function,
 allowing RETURN NEXT expr (et-al) to be used in the
 plpgsql code, allowing DO to return table results.
 (Or, perhaps, instead, be used in place of a table in a SELECT
 statement.)  Oh well.

My key for remembering about that point is that DO is a utility command,
not a query. Now, the proposal I pushed last time we opened that very
can of worms was to have inline functions rather than anonymous code
blocks:

   WITH FUNCTION foo(integer) returns bigint language SQL AS $$
SELECT $1 + 1;
   $$,

Not sure how much that relates to $topic, but still something that
raises in my mind with enough presence that I need to write about it so
that it stops calling for attention :)
   
Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dumping an Extension's Script

2012-11-19 Thread Robert Haas
On Thu, Nov 15, 2012 at 3:09 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 Please find attached to this email an RFC patch implementing the basics
 of the pg_dump --extension-script option. After much discussion around
 the concept of an inline extension, we decided last year that a good
 first step would be pg_dump support for an extension's script.

 Do you have a link to the original thread?  I have to confess I don't
 remember what the purpose of this was and, heh heh, there are no
 documentation changes in the patch itself either.

 My notes include those links to the original thread:

   http://archives.postgresql.org/message-id/3157.1327298...@sss.pgh.pa.us
   http://archives.postgresql.org/pgsql-hackers/2012-01/msg01311.php
   https://commitfest.postgresql.org/action/patch_view?id=746

 I could of course work on documenting the changes prior to the
 reviewing, the thing is that I've been taking a different implementation
 route towards the pg_dump --extension-script idea we talked about, that
 I think is much simpler than anything else.

 So I'd like to know if that approach is deemed acceptable by the
 Guardians Of The Code before expending any more hours on this…

 It basically boils down to this hunk in dumpExtension():

   output CREATE EXTENSION x WITH … AS $x$

	/*
	 * Have another archive for this extension: this allows us to simply
	 * walk the extension's dependencies and use the existing pg_dump code
	 * to get the object create statement to be added in the script.
	 *
	 */
	eout = CreateArchive(NULL, archNull, 0, archModeAppend);

	EH = (ArchiveHandle *) eout;

	/* grab existing connection and remote version information */
	EH->connection = ((ArchiveHandle *) fout)->connection;
	eout->remoteVersion = fout->remoteVersion;

	/* dump all objects for this extension, that have been sorted out in
	 * the right order following dependencies etc */
	...

	/* restore the eout Archive into the local buffer */
	for (te = EH->toc->next; te != EH->toc; te = te->next)
	{
		if (strlen(te->defn) > 0)
			appendPQExpBuffer(q, "%s", te->defn);
	}
	CloseArchive(eout);

   output $x$;

 What do you think?

That approach seems likely to break things for the hoped-for parallel
pg_dump feature, though I'm not sure exactly in what way.

Beyond that, I think much of the appeal of the extension feature is
that it dumps as CREATE EXTENSION hstore; and nothing more.  That
allows you to migrate a dump between systems with different but
compatible versions of the hstore and have things work as intended.
I'm not opposed to the idea of being able to make extensions without
files on disk work ... but I consider it a niche use case; the
behavior we have right now works well for me and hopefully for others
most of the time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior

2012-11-19 Thread Xi Wang
On 11/19/12 11:04 AM, Tom Lane wrote:
 I thought about this some more and realized that we can handle it
 by realizing that division by -1 is the same as negation, and so
 we can copy the method used in int4um.  So the code would look like
 
   if (arg2 == -1)
   {
   result = -arg1;
    if (arg1 != 0 && SAMESIGN(result, arg1))
   ereport(ERROR, ...);
   PG_RETURN_INT32(result);
   }
 
 (with rather more comments than this, of course).  This looks faster
 than what's there now, as well as removing the need for use of
 explicit INT_MIN constants.

Hah, I used exactly the same code in my initial patch. :)

See below.

 They'd better not, else they'll break many of our overflow checks.
 This is why we use -fwrapv with gcc, for example.  Any other compiler
 with similar optimizations needs to be invoked with a similar switch.

Yes, that's the scary part.

Even with -fwrapv in gcc (I tried 4.6/4.7/4.8), it still optimizes away
my overflow check!

if (arg1 < 0 && -arg1 < 0) { ... }

Fortunately, it doesn't optimize away checks using SAMESIGN() for now.
Who knows what could happen in the next version..

Clang's -fwrapv works as expected.

PathScale and Open64 (and probably icc) don't support -fwrapv at all.
Not sure if they have similar options.  They do such optimizations, too.

The reality is that C compilers are not friendly to postcondition
checking; they consider signed integer overflow as undefined behavior,
so they do whatever they want to do.  Even workaround options like
-fwrapv are often broken, not to mention that they may not even have
those options.

That's why I am suggesting to avoid postcondition checking for signed
integer overflow.
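
A minimal sketch of the precondition style, using INT_MIN from limits.h
(a C89 header, as opposed to the stdint.h macros discussed below); the
function name and error convention are illustrative only:

#include <limits.h>

/*
 * Reject the overflowing inputs before dividing, so no overflowing
 * expression is ever evaluated and there is nothing for the compiler
 * to "optimize away".  Returns 0 on success, -1 on error.
 */
static int
div32_precheck(int arg1, int arg2, int *result)
{
	if (arg2 == 0)
		return -1;				/* division by zero */
	if (arg1 == INT_MIN && arg2 == -1)
		return -1;				/* result would not fit in int32 */
	*result = arg1 / arg2;
	return 0;
}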

 Not every C compiler provides stdint.h, unfortunately --- otherwise
 I'd not be so resistant to depending on this.

Sigh..  You are right.

- xi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER command reworks

2012-11-19 Thread Kohei KaiGai
2012/11/19 Dimitri Fontaine dimi...@2ndquadrant.fr:
 Kohei KaiGai kai...@kaigai.gr.jp writes:
 OK, Are you suggesting to add a generic comments such as Generic
 function to change the name of a given object, for simple cases ...,
 not a list of OBJECT_* at the head of this function, aren't you?

 Just something like

  * Most simple objects are covered by a generic function, the other
  * still need special processing.

 Mainly I was surprised to see collation not supported here, but I didn't
 take enough time to understand why that is the case. I will do that
 later in the review process.

The reason why collation is not supported is that it takes special name-
duplication checks. The new collation name must have no collision in either
the current database encoding or the "any" encoding.
It might be an idea to have a simple rule similar to
AlterObjectNamespace_internal(): require the caller to check for
namespace duplication if the given object type has no catcache id
with a name key.

 The pain point is AlterObjectNamespace_internal is not invoked by
 only ExecAlterObjectSchemaStmt(), but AlterObjectNamespace_oid()
 also.

 I should remember better about that as the use case is extensions…

 It is the reason why I had to put get_object_type() to find out OBJECT_*
 from the supplied ObjectAddress, because this code path does not
 available to pass down its ObjectType from the caller.

 If we really want to do that, I think I would only do that in
 AlterObjectNamespace_oid and add an argument to
 AlterObjectNamespace_internal, that you already have in
 ExecAlterObjectSchemaStmt().

 But really, what about just removing that part of your patch instead:

 @@ -396,14 +614,23 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
  	 * Check for duplicate name (more friendly than unique-index failure).
  	 * Since this is just a friendliness check, we can just skip it in cases
  	 * where there isn't a suitable syscache available.
 +	 *
 +	 * XXX - the caller should check object-name duplication, if the supplied
 +	 * object type need to take object arguments for identification, such as
 +	 * functions.
  	 */
 -	if (nameCacheId >= 0 &&
 -		SearchSysCacheExists2(nameCacheId, name, ObjectIdGetDatum(nspOid)))
 -		ereport(ERROR,
 -				(errcode(ERRCODE_DUPLICATE_OBJECT),
 -				 errmsg("%s already exists in schema \"%s\"",
 -						getObjectDescriptionOids(classId, objid),
 -						get_namespace_name(nspOid))));
 +	if (get_object_catcache_name(classId) >= 0)
 +	{
 +		ObjectAddress	address;
 +
 +		address.classId = classId;
 +		address.objectId = objid;
 +		address.objectSubId = 0;
 +
 +		objtype = get_object_type(address);
 +		check_duplicate_objectname(objtype, nspOid,
 +								   NameStr(*DatumGetName(name)), NIL);
 +	}

 It would be much simpler to retain the old-style duplicate object check,
 and compared to doing extra cache lookups, it'd still be much cleaner in
 my view.

Now I am inclined to follow the manner of AlterObjectNamespace_internal
in AlterObjectRename also; that requires the caller to check name duplication
in the case when no catcache entry is available.

That simplifies the logic to check name duplication, and allows
get_object_type() to be eliminated here, even though RenameAggregate and
RenameFunction have to remain.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-19 Thread Robert Haas
On Thu, Nov 15, 2012 at 2:44 PM, Jeff Davis pg...@j-davis.com wrote:
 The issue about external utilities is a bigger problem than I realized
 at first. Originally, I thought that it was just a matter of code to
 associate the checksum with the data.

 However, an external utility will never see a torn page while the system
 is online (after recovery); but it *will* see an inconsistent view of
 the checksum and the data if they are issued in separate write() calls.
 So, the hazard of storing the checksum in a different place is not
 equivalent to the existing hazard of a torn page.

I agree that the hazards are not equivalent, but I'm not sure I agree
that an external utility will never see a torn page while the system
is on-line.  We have a bunch of code that essentially forces
full_page_writes=on during a base backup even if it's normally off.  I
think that's necessary precisely because neither the 8kB write() nor
the unknown-sized-read used by the external copy program are
guaranteed to be atomic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views WIP patch

2012-11-19 Thread Kevin Grittner
Simon Riggs wrote:

 This seems very similar to the REPLACE command we discussed
 earlier, except this is restricted to Mat Views.

I don't remember that discussion -- do you have a reference?

 If we're going to have this, I would prefer a whole command.
 
 e.g. REPLACE matviewname REFRESH
 
 that would also allow
 
 REPLACE tablename AS query
 
 Same thing under the covers, just more widely applicable and thus
 more useful.

An interesting thought. I would have thought that if we were going
to allow changing the definition of an existing MV, we would be
better off with CREATE OR REPLACE MATERIALIZED VIEW. Either way, if
you allow the column types or the number of columns to be changed,
you do tend to run into issues if there are other MVs, views,
triggers, rules, etc., which depend on the MV, so I don't think it's
material for an initial patch. But it is worth considering which way
we might want to extend it.

 Either way, I don't much like overloading the use of LOAD, which
 already has a very different meaning.

Well, it's hard to avoid creating new keywords without overloading
the meaning of existing ones. Personally I didn't find

  LOAD MATERIALIZED VIEW matview_name;

to be very easy to confuse with

  LOAD 'filename';

But that's a subjective thing. If too many people find that
confusing, it may be worth creating a new keyword; but I wanted to
see whether it was really necessary first.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior

2012-11-19 Thread Tom Lane
Xi Wang xi.w...@gmail.com writes:
 The reality is that C compilers are not friendly to postcondition
 checking; they consider signed integer overflow as undefined behavior,
 so they do whatever they want to do.  Even workaround options like
 -fwrapv are often broken, not to mention that they may not even have
 those options.

I think it's probably past time that we stopped guessing about this
sort of thing and added some regression test cases for it.  I'm
planning to add cases like this:

-- check sane handling of INT_MIN overflow cases
SELECT (-2147483648)::int4 * (-1)::int4;
SELECT (-2147483648)::int4 / (-1)::int4;
SELECT (-2147483648)::int4 % (-1)::int4;

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dumping an Extension's Script

2012-11-19 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 That approach seems likely to break things for the hoped-for parallel
 pg_dump feature, though I'm not sure exactly in what way.

Will the parallel dump solve the dependencies and extension membership
properties in parallel too?

 Beyond that, I think much of the appeal of the extension feature is
 that it dumps as CREATE EXTENSION hstore; and nothing more.  That
 allows you to migrate a dump between systems with different but
 compatible versions of the hstore and have things work as intended.

Yes. That's the only use case supported so far. The contrib/ use case.

 I'm not opposed to the idea of being able to make extensions without
 files on disk work ... but I consider it a niche use case; the
 behavior we have right now works well for me and hopefully for others
 most of the time.

I hear you. I'm not doing that on my free time, it's not a hobby, I have
customers that want it bad enough to be willing to sponsor my work here.
I hope that helps you figure out whether the use case is a niche or not.

The current extension support has been targeted at a single use case,
because that's how you bootstrap that kind of feature. We have request
for extensions that will not include a part written in C.

We've been around the topic last year, we spent much energy trying to
come up with something easy enough to accept as a first step in that
direction, and the conclusion at the time was that we want to be able to
dump an extension's script. That's what my current patch is all about.


More about use cases. Consider PL/Proxy. By the way, that should really
really get in core and be called a FOREIGN FUNCTION, but let's get back
on topic. So I have customers with between 8 and 256 plproxy partitions,
that means each database upgrade has to reach that many databases.

Now, I've built an automatic system that will fetch the PL function code
from the staging database area, put them into files depending on the
schema they live in, package those files into a single one that can be
used by the CREATE EXTENSION command, automatically create an upgrade
file to be able to ALTER EXTENSION … TO VERSION …, and create a bunch of
debian packages out of that (a single debian source package that will
build as many binary packages as we have extensions).

Then, the system will push those packages to an internal repository, run
apt-get update on all the database hosts, then connect to each partition
and run the upgrade command.

All of that could get simplified to getting the PL code into a single
SQL command then running it on all the members of the cluster by using a
plproxy RUN ON ALL command, now that it's a self-contained single SQL
command.

Of course that's only one use case, but that email is already only too
long for what it does: rehashing a story we already ran last year.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.3] writable foreign tables

2012-11-19 Thread Kohei KaiGai
2012/11/19 Albe Laurenz laurenz.a...@wien.gv.at:
 Kohei KaiGai wrote:
 I am not so happy with GetForeignRelInfo:
 - The name seems ill-chosen from the FDW API side.
   I guess that you chose the name because the function
   is called from get_relation_info, but I think the name
   should be more descriptive for the FDW API.
   Something like PlanForeignRelRowid.

 Indeed, GetForeignRelInfo might give misleading impression
 as if this routine collects widespread information about
 target relation. So, how about GetForeignRelWidth() instead?

 That would be better for the function as it is now.

 - I guess that every FDW that only needs rowid will
   do exactly the same as your fileGetForeignRelInfo.
   Why can't that be done in core?
   The function could pass an AttrNumber for the rowid
   to the FDW, and will receive a boolean return code
   depending on whether the FDW plans to use rowid or not.
   That would be more convenient for FDW authors.

 This design tries to kill two-birds with one-stone.
 It enables to add multiple number of pseudo-columns,
 not only rowid, and makes possible to push-down
 complex calculation of target list into external computing
 resource.

 For example, when user gives the following query:

   SELECT ((c1 - c2) * (c2 - c3))^2 FROM ftable

 it contains a complex calculation in the target-list,
 thus, it also takes CPU cycles of local process.

 If we can replace the ((c1 - c2) * (c2 - c3))^2 by
 a reference to a pseudo-column that also references
 the calculation result on external node, it effectively
 off-load CPU cycles.

 In this case, all we need to do is (1) acquire a slot
 for pseudo-column at GetForeignRelInfo (2) replace
 TargetEntry::expr by Var node that reference this
 pseudo-column.

 It makes sense for performance optimization, so I don't
 want to restrict this handler for rowid only.

 I understand.

 But I think that you still can do that with the change that
 I suggest.  I suggest that GetForeignRelInfo (or whatever the
 name ends up being) gets the AttrNumber of the proposed rowid
 column in addition to the parameters you need for what
 you want to do.

 Then nothing would keep you from defining those
 pseudo-columns.  But all the setup necessary for the rowid
 column could be moved out of the FDW.  So for the 99% of all
 FDW which are only interested in the rowid, things would
 get much simpler and they don't all have to implement the
 same code.

 Did I make clear what I mean?
 Would that be difficult?

All we have to do at get_relation_info() to deal with pseudo-
columns (including alternatives to complex calculations, not
only rowid) is just an expansion of rel->max_attr.
So, if an FDW is not interested in anything except rowid,
it can just inform the caller "Yeah, we need just one slot for
a pseudo-column of rowid". Otherwise, it can return another
value to acquire slots for arbitrary pseudo-columns.
I don't think it is a problematic design.

However, I'm skeptical that 99% of FDWs don't support target-list
push-down. At least, it was a very desired feature when I had
a talk at PGconf.EU last month. :-)

So, if we rename it to GetForeignRelWidth, is it defined as
follows?

extern AttrNumber
GetForeignRelWidth(PlannerInfo *root,
                   RelOptInfo *baserel,
                   Oid foreigntableid,
                   bool inhparent,
                   List *targetList);

Right now, inhparent makes no sense because foreign tables
do not support table inheritance, but it seems to me we
shall have this functionality in the near future.
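
Assuming the prototype above (and the usual planner headers), an FDW
that only wants a rowid pseudo-column might implement it along these
lines; the function name and the exact return convention are only a
sketch, not what the patch finally defines:

static AttrNumber
fileGetForeignRelWidth(PlannerInfo *root,
					   RelOptInfo *baserel,
					   Oid foreigntableid,
					   bool inhparent,
					   List *targetList)
{
	/*
	 * Reserve exactly one slot beyond the table's regular columns, to be
	 * used for the rowid pseudo-column.  An FDW doing target-list
	 * push-down would instead reserve one slot per pushed-down expression.
	 */
	return baserel->max_attr + 1;
}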

 - I guess the order is dictated by planner steps, but
   it would be nice to have if GetForeignRelInfo were
   not the first function to be called during planning.
   That would make it easier for a FDW to support both
   9.2 and 9.3 (fewer #ifdefs), because the FDW plan state
   will probably have to be created in the first API
   function.

 The baserel-fdw_private should be initialized to NULL,
 so it can perform as a mark whether private data is already
 constructed, or not.

 Right, if that pointer is pre-initialized to NULL, that
 should work.  Forget my quibble.

 In addition, I noticed my patch didn't update documentation stuff.
 I also add mention about new handlers.

 I didn't get into documentation, comment and spelling issues since
 the patch was still called POC, but yes, eventually that would
 be necessary.

 Yours,
 Laurenz Albe

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views WIP patch

2012-11-19 Thread Kevin Grittner
Albe Laurenz wrote:
 Kevin Grittner wrote:
 
 My take was more that MVs would often be refreshed by crontab, and
 that you would want to keep subsequent steps from running and
 generating potentially plausible but completely inaccurate results
 if the LMV failed.

 If one of these subsequent steps doesn't care if refresh
 failed once, it shouldn't be forced to fail. I imagine
 that for many applications yesterday's data can be good enough.
 
 Those that care should check the timestamp.

It sounds like you and I are in agreement on this; I just didn't
state it very precisely. If a LMV on a MV which already has data
fails, the relisvalid would not prevent it from being used -- it
would be stale, but still valid data from *some* point in time. The
point is that if an MV is created WITH NO DATA or has been TRUNCATEd
and there has not been a subsequent LMV, what it contains may not
represent any state which was *ever* valid, or it may represent a
state which would only have been valid hundreds of years in the past,
had the system been computerized at that time. To me, that does not
seem like the same thing as a simple stale state.

I'm looking at whether there is some reasonable way to detect invalid
data as well as capture age of data. Every solution I've thought of
so far has at least one hard-to-solve race condition, but I have
hopes that I can either solve that for one of the ideas, or come up
with an idea which falls more gracefully under MVCC management.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior

2012-11-19 Thread Andres Freund
On 2012-11-19 11:04:31 -0500, Tom Lane wrote:
 Xi Wang xi.w...@gmail.com writes:
  Since INTn_MIN and INTn_MAX are standard macros from the C library,
  can we assume that every C compiler should provide them in stdint.h?

 Not every C compiler provides stdint.h, unfortunately --- otherwise
 I'd not be so resistant to depending on this.

We already have hardcoded values for INT64_MIN in numutils.c's
pg_lltoa(). Given that I think we could just bite the apple and provide
our own values if the platform doesn't provide them.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-19 Thread Jeff Davis
On Mon, 2012-11-19 at 11:48 -0500, Robert Haas wrote:
 I agree that the hazards are not equivalent, but I'm not sure I agree
 that an external utility will never see a torn page while the system
 is on-line.  We have a bunch of code that essentially forces
 full_page_writes=on during a base backup even if it's normally off.  I
 think that's necessary precisely because neither the 8kB write() nor
 the unknown-sized-read used by the external copy program are
 guaranteed to be atomic.

This seems like a standards question that we should be able to answer
definitively:

Is it possible for a reader to see a partial write if both use the same
block size?

Maybe the reason we need full page writes during base backup is because
we don't know the block size of the reader, but if we did know that it
was the same, it would be fine?

If that is not true, then I'm concerned about replicating corruption, or
backing up corrupt blocks over good ones. How do we prevent that? It
seems like a pretty major hole if we can't, because it means the only
safe replication is streaming replication; a base-backup is essentially
unsafe. And it means that even an online background checking utility
would be quite hard to do properly.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Maintenance announcement for trill.postgresql.org

2012-11-19 Thread Stefan Kaltenbrunner
Hi all!


There will be planned hardware/software maintenance tomorrow, Tuesday
(20th November 2012), starting at 16:30 CET, affecting some redundant
services (ftp and www mirrors) as well as the following public hosts:

 * yridian.postgresql.org (www.postgresql.eu)
 * antos.postgresql.org (anoncvs.postgresql.org)
 * malur.postgresql.org (mailinglists)

During this maintenance window we will be doing some software and
hardware changes involving a number of reboots and we expect a maximum
outage of an hour.



Stefan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Switching timeline over streaming replication

2012-11-19 Thread Heikki Linnakangas

On 10.10.2012 17:54, Thom Brown wrote:
 Hmm... I get something different.  When I promote standby B, standby
 C's log shows:

 LOG:  walreceiver ended streaming and awaits new instructions
 LOG:  re-handshaking at position 0/400 on tli 1
 LOG:  fetching timeline history file for timeline 2 from primary server
 LOG:  walreceiver ended streaming and awaits new instructions
 LOG:  new target timeline is 2

 Then when I stop then start standby C I get:

 FATAL:  timeline history was not contiguous
 LOG:  startup process (PID 22986) exited with exit code 1
 LOG:  aborting startup due to startup process failure

Found & fixed this one. A paren was misplaced in the tliOfPointInHistory()
function.


On 16.11.2012 16:01, Amit Kapila wrote:

The following problems are observed while testing of the patch.
Defect-1:

   1. start primary A
   2. start standby B following A
   3. start cascade standby C following B.
   4. Promote standby B.
   5. After successful time line switch in cascade standby C, stop C.
   6. Restart C, startup is failing with the following error.

 LOG:  database system was shut down in recovery at 2012-11-16
16:26:29 IST
 FATAL:  requested timeline 2 does not contain minimum recovery point
0/30143A0 on timeline 1
 LOG:  startup process (PID 415) exited with exit code 1
 LOG:  aborting startup due to startup process failure

The above defect is already discussed in the following link.
http://archives.postgresql.org/message-id/00a801cda6f3$4aba27b0$e02e7710$@ka
p...@huawei.com


Fixed now, sorry for neglecting this earlier. The problem was that if 
the primary switched to a new timeline at position X, and the standby 
followed that switch, on restart it would set minRecoveryPoint to X, and 
the new



Defect-2:

   1. start primary A
   2. start standby B following A
   3. start cascade standby C following B with 'recovery_target_timeline'
option in
   recovery.conf is disabled.
   4. Promote standby B.
   5. Cascade Standby C is not able to follow the new master B because of
timeline difference.
 6. Try to stop the cascade standby C (which is failing and the
server is not stopping;
   observations are that the WAL receiver process is still running and
clients are not allowed to connect).

The defect-2 is happened only once in my test environment, I will try to
reproduce it.


Found it. When restarting the streaming, I reused the WALRCV_STARTING 
state. But if you then exited recovery, WalRcvRunning() would think that 
the walreceiver is stuck starting up, because it's been longer than 10 
seconds since it was launched and it's still in WALRCV_STARTING state, 
so it put it into WALRCV_STOPPED state. And walreceiver didn't expect to 
be put into STOPPED state after having started up successfully already.


I added a new explicit WALRCV_RESTARTING state to handle that.

In addition to the above bug fixes, there's some small changes since 
last patch version:


* I changed the LOG messages printed in various stages a bit, hopefully 
making it easier to follow what's happening. Feedback is welcome on when 
and how we should log, and whether some error messages need clarification.


* 'ps' display is updated when the walreceiver enters and exits idle mode

* Updated pg_controldata and pg_resetxlog to handle the new 
minRecoveryPointTLI field I added to the control file.


* startup process wakes up walsenders at the end of recovery, so that 
cascading standbys are notified immediately when the timeline changes. 
That removes some of the delay in the process.


- Heikki


streaming-tli-switch-7.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-19 Thread Andres Freund
On 2012-11-19 09:22:45 -0800, Jeff Davis wrote:
 On Mon, 2012-11-19 at 11:48 -0500, Robert Haas wrote:
  I agree that the hazards are not equivalent, but I'm not sure I agree
  that an external utility will never see a torn page while the system
  is on-line.  We have a bunch of code that essentially forces
  full_page_writes=on during a base backup even if it's normally off.  I
  think that's necessary precisely because neither the 8kB write() nor
  the unknown-sized-read used by the external copy program are
  guaranteed to be atomic.

 This seems like a standards question that we should be able to answer
 definitively:

 Is it possible for a reader to see a partial write if both use the same
 block size?

Yes, definitely.

 If that is not true, then I'm concerned about replicating corruption, or
 backing up corrupt blocks over good ones. How do we prevent that? It
 seems like a pretty major hole if we can't, because it means the only
 safe replication is streaming replication; a base-backup is essentially
 unsafe. And it means that even an online background checking utility
 would be quite hard to do properly.

I am not sure I see the danger in the base backup case here? Why would
we have corrupted backup blocks? While postgres is running we won't see
such torn pages because it's all done under proper locks...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY

2012-11-19 Thread Robert Haas
On Thu, Nov 15, 2012 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Yeah.  If we're going to do this at all, and I'm not convinced it's
 worth the work, I think it's definitely good to support a variant
 where we specify exactly the things that will be passed to exec().
 There's just too many ways to accidentally shoot yourself in the foot
 otherwise.  If we want to have an option that lets people shoot
 themselves in the foot, that's fine.  But I think we'd be smart not to
 make that the only option.

 [ shrug... ]  Once again, that will turn this from a ten-line patch
 into hundreds of lines (and some more, different, hundreds of lines
 for Windows I bet), with a corresponding growth in the opportunities
 for bugs, for a benefit that's at best debatable.

 The biggest problem this patch has had from the very beginning is
 overdesign, and this is more of the same.  Let's please just define the
 feature as popen, not fopen, the given string and have done.

I just don't agree with that.  popen() is to security holes as cars
are to alcohol-related fatalities.  In each case, the first one
doesn't directly cause the second one; but it's a pretty darn powerful
enabler.  Your proposed solution won't force people to write insecure
applications; it'll just make it much more likely that they will do so
... after which, presumably, you'll tell them it's their own darn
fault for using the attractive nuisance.  The list of security
vulnerabilities that are the result of insufficiently careful
validation of strings passed to popen() is extremely long.  If we give
people a feature that can only be leveraged via popen(), the chances
that someone will thereby open a security hole are indistinguishable
from 1.
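
To make the distinction concrete, a small illustration (not taken from
the patch under discussion): the popen() variant hands the whole string
to /bin/sh, so metacharacters smuggled into the filename get executed,
while an exec-style call passes the filename as one literal argument.

#include <stdio.h>
#include <unistd.h>

/* dangerous: a filename like "x; rm -rf ~" becomes a shell command */
static FILE *
open_via_shell(const char *filename)
{
	char	cmd[1024];

	snprintf(cmd, sizeof(cmd), "gzip -dc %s", filename);
	return popen(cmd, "r");
}

/* safer: the filename is a single argv entry, never parsed by a shell
 * (to be called in a child process after fork()) */
static void
exec_directly(const char *filename)
{
	execlp("gzip", "gzip", "-dc", filename, (char *) NULL);
}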

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views WIP patch

2012-11-19 Thread Tom Lane
Kevin Grittner kgri...@mail.com writes:
 Simon Riggs wrote:
 Either way, I don't much like overloading the use of LOAD, which
 already has a very different meaning.

 Well, it's hard to avoid creating new keywords without overloading
 the meaning of existing ones.

FWIW, I'd much rather see us overload LOAD (which is seldom used)
than REPLACE (which might in the future become a widely-used DML
command).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY

2012-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Nov 15, 2012 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The biggest problem this patch has had from the very beginning is
 overdesign, and this is more of the same.  Let's please just define the
 feature as popen, not fopen, the given string and have done.

 ... If we give
 people a feature that can only be leveraged via popen(), the chances
 that someone will thereby open a security hole are indistinguishable
 from 1.

You are absolutely right that this feature is a security risk, but it
will be one whether it exposes popen() or only exec().  I do not believe
that the incremental gain in security from disallowing shell notation
is worth either the loss of functionality or the amount of added effort
(and added bugs, some of which will be security issues in themselves)
we'd need to write it that way.

The correct response to the security risks is to (a) make it
superuser-only and (b) document that it's a seriously bad idea to allow
the argument string to come from any untrusted sources.  Please note
that we'd have to do these same things with an exec-based patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views WIP patch

2012-11-19 Thread Josh Berkus
Kevin,

 I'm looking at whether there is some reasonable way to detect invalid
 data as well as capture age of data. Every solution I've thought of
 so far has at least one hard-to-solve race condition, but I have
 hopes that I can either solve that for one of the ideas, or come up
 with an idea which falls more gracefully under MVCC management.

What's the race condition?  I'd think that LOAD would take an exclusive
lock on the matview involved.

   LOAD MATERIALIZED VIEW matview_name;

 to be very easy to confuse with

   LOAD 'filename';

 But that's a subjective thing. If too many people find that
 confusing, it may be worth creating a new keyword; but I wanted to
 see whether it was really necessary first.

I do not find them confusing.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views WIP patch

2012-11-19 Thread Josh Berkus
On 11/19/12 9:57 AM, Josh Berkus wrote:
 Kevin,
 
 I'm looking at whether there is some reasonable way to detect invalid
 data as well as capture age of data. Every solution I've thought of
 so far has at least one hard-to-solve race condition, but I have
 hopes that I can either solve that for one of the ideas, or come up
 with an idea which falls more gracefully under MVCC management.
 
 What's the race condition?  I'd think that LOAD would take an exclusive
 lock on the matview involved.

BTW, another thought on the timestamp: while it would be better to have
a lastrefresh timestamp in pg_class, the other option is to have an
extra column in the matview (pg_last_update).  While that would involve
some redundant storage, it would neatly solve the issues around unlogged
matviews; the timestamp and the data would vanish at the same time.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we need so many hint bits?

2012-11-19 Thread Atri Sharma
On Sun, Nov 18, 2012 at 10:43 PM, Jeff Davis pg...@j-davis.com wrote:

 On Sun, 2012-11-18 at 15:19 +0100, Andres Freund wrote:
  On Sunday, November 18, 2012 03:07:01 AM Jeff Davis wrote:
   Process A (process that clears a VM bit for a data page):
 1. Acquires exclusive lock on data buffer
 2. Acquires exclusive lock on VM buffer
 3. clears VM bit
 4. Releases VM buffer lock
 5. Releases data buffer lock
 
  Well, but right this is a rather big difference. If vm pages get
  unconditionally locked all the time we will have a huge source of new
  contention as they are shared between so many heap pages.

 No, that is only for the process *clearing* the bit, and this already
 happens. I am not planning on introducing any new locks, aside from the
 buffer header lock when acquiring a pin. And I plan to keep those pins
 for long enough that those don't matter, either.

 Regards,
 Jeff Davis



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


Sorry if I am being a bit naive, but shouldn't a simple mutex work in the
case when a process wants to change the VM bit in cache?

A mutex would be cheaper than locks.

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Do we need so many hint bits?

2012-11-19 Thread Atri Sharma

 It's quite common to load a lot of data, and then do some reads for a
 while (setting hint bits and flushing them to disk), and then do a
 VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the
 pages again. Also, if I remember correctly, Robert went to significant
 effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM
 bits consistent. Maybe this was all discussed before?

 All of these hint bits will have a bit more of a performance impact
 after checksums are introduced (for those that use them in conjunction
 with large data loads), so I'm looking for some simple ways to mitigate
 those effects. What kind of worst-case tests could I construct to see if
 there are worrying performance effects to removing these hint bits?

 Regards,
 Jeff Davis




I completely agree. In fact, that is the problem that we are trying to solve
in our patch (https://commitfest.postgresql.org/action/patch_view?id=991).
Essentially, we are trying to mitigate the expense of maintaining hint bits
in the cases when the user loads a lot of data, does some operations such
as SELECT, and deletes them all. We maintain a cache that can be used to
fetch the commit status of XMAX or XMIN instead of hint bits. As the cache
is a single frame, it has no issues with its replacement algorithm. Cache
lookup is pretty cheap.
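
A rough sketch of that single-frame cache (types and the commit-status
lookup are hypothetical stand-ins, not the actual patch):

typedef unsigned int TransactionId;		/* stand-in for the real typedef */

extern int xact_did_commit(TransactionId xid);	/* hypothetical CLOG lookup */

static struct
{
	TransactionId	xid;
	int				committed;
	int				valid;
} xid_status_cache;

/* one entry only, so "replacement" is simply overwriting the frame */
static int
cached_xact_did_commit(TransactionId xid)
{
	if (xid_status_cache.valid && xid_status_cache.xid == xid)
		return xid_status_cache.committed;

	xid_status_cache.xid = xid;
	xid_status_cache.committed = xact_did_commit(xid);
	xid_status_cache.valid = 1;
	return xid_status_cache.committed;
}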

I agree with the removal of PD_ALL_VISIBLE. AFAIK (pardon me if I am wrong, I
have been trying to research while following this thread), PD_ALL_VISIBLE
was really useful when VM bits were not really safe, and crashes could lead
to redo setting the bit on the heap pages.

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Enabling Checksums

2012-11-19 Thread Jeff Davis
On Mon, 2012-11-19 at 18:30 +0100, Andres Freund wrote:
 Yes, definitely.

OK. I suppose that makes sense for large writes.

  If that is not true, then I'm concerned about replicating corruption, or
  backing up corrupt blocks over good ones. How do we prevent that? It
  seems like a pretty major hole if we can't, because it means the only
  safe replication is streaming replication; a base-backup is essentially
  unsafe. And it means that even an online background checking utility
  would be quite hard to do properly.
 
 I am not sure I see the danger in the base backup case here? Why would
 we have corrupted backup blocks? While postgres is running we won't see
 such torn pages because its all done under proper locks...

Yes, the blocks written *after* the checkpoint might have a bad checksum
that will be fixed during recovery. But the blocks written *before* the
checkpoint should have a valid checksum, but if they don't, then
recovery doesn't know about them.

So, we can't verify the checksums in the base backup because it's
expected that some blocks will fail the check, and they can be fixed
during recovery. That gives us no protection for blocks that were truly
corrupted and written long before the last checkpoint.

I suppose if we could somehow differentiate the blocks, that might work.
Maybe look at the LSN and only validate blocks written before the
checkpoint? But of course, that's a problem because a corrupt block
might have the wrong LSN (in fact, it's likely, because garbage is more
likely to make the LSN too high than too low).
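
A sketch of that LSN-gated idea, with simplified stand-in types rather
than the real page layout; as noted above, it cannot catch a corrupted
page whose garbage LSN happens to look newer than the checkpoint:

typedef unsigned long long Lsn;			/* stand-in for XLogRecPtr */

typedef struct
{
	Lsn				lsn;			/* last WAL position touching the block */
	unsigned short	checksum;
	/* ... remainder of the 8kB block ... */
} PageStub;

extern unsigned short page_checksum(const PageStub *page);	/* hypothetical */

/* returns 1 if the block is OK or cannot be judged, 0 if it looks corrupt */
static int
verify_backup_block(const PageStub *page, Lsn backup_start_lsn)
{
	if (page->lsn >= backup_start_lsn)
		return 1;		/* possibly torn; recovery's full-page image fixes it */

	return page_checksum(page) == page->checksum;
}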

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] too much pgbench init output

2012-11-19 Thread Tomas Vondra
On 19.11.2012 11:59, Jeevan Chalke wrote:
 Hi,
 
 I gone through the discussion for this patch and here is my review:
 
 The main aim of this patch is to reduce the number of log lines. It is
 also suggested to use an options to provide the interval but few of us
 are not much agree on it. So final discussion ended at keeping 5 sec
 interval between each log line.
 
 However, I see, there are two types of users here:
 1. Who likes these log lines, so that they can troubleshoot some
 slowness and all
 2. Who do not like these log lines.

Who likes these lines / needs them for something useful?

 So keeping these in mind, I rather go for an option which will control
 this. People falling in category one can set this option to very low
 where as users falling under second category can keep it high.

So what option(s) would you expect? Something that tunes the interval
length or something else?

A switch that'd choose between the old and new behavior might be a good
idea, but I'd strongly vote against automagic heuristics. It makes the
behavior very difficult to predict and I really don't want to force the
users to wonder whether the long delay is due to general slowness of the
machine or some clever logic that causes long delays between log messages.

That's why I choose a very simple approach with constant time interval.
It does what I was aiming for (less logs) and it's easy to predict.
Sure, we could choose different interval (or make it an option).

 However, assuming we settled on 5 sec delay, here are few comments on
 that patch attached:
 
 Comments:
 =
 
 Patch gets applied cleanly with some whitespace errors. make and make
 install too went smooth.
 make check was smooth. Rather it should be smooth since there are NO
 changes in other part of the code rather than just pgbench.c and we do
 not have any test-case as well.
 
 However, here are few comments on changes in pgbench.c
 
 1.
 Since the final discussion ended at keeping a 5 seconds interval will be
 good enough, Author used a global int variable for that.
 Given that it's just a constant, #define would be a better choice.

Good point. Although if we add an option to supply different values, a
variable is a better match.

 2.
 +/* let's not call the timing for each row, but only each 100
 rows */
 Why only 100 rows ? Have you done any testing to come up with number 100
 ? To me it seems very low. It will be good to test with 1K or even 10K.
 On my machine (2.4 GHz Intel core 2 duo Macbook PRO, running Ubuntu in
 VM with 4GB RAM, 1067 DDR3), in 5 Sec, approx 1M rows were inserted. So
 checking every 100 rows looks overkill.

Well, the 100 is clearly a magical constant. The goal was to choose a
value large enough to amortize the getlocaltime() cost, but small enough
to print info even in cases when the performance sucks for some reason.

I've seen issues where the speed suddenly dropped to a fraction of the
expected value, e.g. 100 rows/second, and in those cases you'd have to
wait for a very long time to actually get the next log message (with the
suggested 10k step).

So 100 seems like a good compromise to me ...
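
A standalone sketch of that scheme (check the clock only every 100 rows,
log at most once every 5 seconds); the constants and helper names are
illustrative, not the patch itself:

#include <stdio.h>
#include <time.h>

#define CHECK_EVERY_ROWS	100
#define LOG_INTERVAL_SEC	5

static void
populate(long total_rows)
{
	time_t	start = time(NULL);
	time_t	last_report = start;
	long	i;

	for (i = 1; i <= total_rows; i++)
	{
		/* insert_row(i);  -- the actual work, elided here */

		if (i % CHECK_EVERY_ROWS == 0)
		{
			time_t	now = time(NULL);

			if (now - last_report >= LOG_INTERVAL_SEC)
			{
				double	elapsed = difftime(now, start);
				double	remaining = elapsed / i * (total_rows - i);

				printf("%ld of %ld rows done (elapsed %.1f s, remaining %.1f s)\n",
					   i, total_rows, elapsed, remaining);
				last_report = now;
			}
		}
	}
}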

 3.
 Please indent following block as per the indentation just above that
 
 /* used to track elapsed time and estimate of the remaining time */
 instr_time  start, diff;
 double elapsed_sec, remaining_sec;
 int log_interval = 1;

OK

 4.
 +/* have ve reached the next interval? */
 Do you mean have WE reached...

OK

 
 5.
 While applying a patch, I got few white-space errors. But I think every
 patch goes through pgindent which might take care of this.

OK

 
 Thanks

Thanks for the review. I'll wait for a bit more discussion about the
choices before submitting another version of the patch.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we need so many hint bits?

2012-11-19 Thread Atri Sharma
On Tue, Nov 20, 2012 at 12:08 AM, Jeff Davis pg...@j-davis.com wrote:

 On Mon, 2012-11-19 at 23:50 +0530, Atri Sharma wrote:

  Sorry If I am being a bit naive, but shouldnt a simple mutex work in
  the case when a process wants to change the VM bit in cache?
 
  Mutex would be cheaper than locks.
 
 I thought mutexes are locks?

 The point is to avoid taking new locks (or mutexes) during a read of the
 VM bit, because there is concern that it could cause contention. If we
 lock the entire VM page, that represents many, many data pages, so it's
 worrisome.

 Regards,
 Jeff Davis



My mistake... I thought we were more concerned about the cost of locking.

I agree, locking many data pages simultaneously could be hazardous.

This requires more thought. Maybe removing PD_ALL_VISIBLE isn't such a great
idea after all...

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Do we need so many hint bits?

2012-11-19 Thread Jeff Davis
On Mon, 2012-11-19 at 23:50 +0530, Atri Sharma wrote:

 Sorry If I am being a bit naive, but shouldnt a simple mutex work in
 the case when a process wants to change the VM bit in cache?
 
 Mutex would be cheaper than locks.
 
I thought mutexes are locks?

The point is to avoid taking new locks (or mutexes) during a read of the
VM bit, because there is concern that it could cause contention. If we
lock the entire VM page, that represents many, many data pages, so it's
worrisome.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we need so many hint bits?

2012-11-19 Thread Jeff Davis
On Tue, 2012-11-20 at 00:13 +0530, Atri Sharma wrote:

 My mistake...I thought we were more concerned about the cost of
 locking.
 
 I agree, locking many data pages simultaneously could be hazardous.
 
 This requires more thought.Maybe removing PD_ALL_VISIBLE isnt such a
 great idea after all...

As I said elsewhere in the thread, I'm not planning to introduce any
additional locking. There is already precedent in IndexOnlyNext.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we need so many hint bits?

2012-11-19 Thread Andres Freund
On 2012-11-19 10:46:37 -0800, Jeff Davis wrote:
 On Tue, 2012-11-20 at 00:13 +0530, Atri Sharma wrote:
 
  My mistake...I thought we were more concerned about the cost of
  locking.
  
  I agree, locking many data pages simultaneously could be hazardous.
  
  This requires more thought.Maybe removing PD_ALL_VISIBLE isnt such a
  great idea after all...
 
 As I said elsewhere in the thread, I'm not planning to introduce any
 additional locking. There is already precedent in IndexOnlyNext.

Yea, but that precedent is only valid because IOS does check for a MVCC
snapshots. Also it doesn't set anything.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we need so many hint bits?

2012-11-19 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 As I said elsewhere in the thread, I'm not planning to introduce any
 additional locking. There is already precedent in IndexOnlyNext.

Of course, that just begs the question of whether the code in
IndexOnlyNext is correct.  And after looking at it, it seems pretty
broken, or at least the argument for its correctness is broken.
That comment seems to consider only the case where the target tuple has
just been inserted --- but what of the case where the target tuple is in
process of being deleted?  The deleter will not make a new index entry
for that.  Of course, there's a pretty long code path from marking a
tuple deleted to committing the deletion, and it seems safe to assume
that the deleter will hit a write barrier in that path.  But I don't
believe the argument here that the reader must hit a read barrier in
between.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-19 Thread Jeff Janes
On Wed, Nov 14, 2012 at 3:55 PM, Bruce Momjian br...@momjian.us wrote:
 On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:

 Is turning off synchronous_commit enough?  What about turning off fsync?

 I did some testing with the attached patch on a magnetic disk with no
 BBU that turns off fsync;

With which file system? I wouldn't expect you to see a benefit with
ext2 or ext3; it seems to be a peculiarity of ext4 that it inhibits
group fsync of new file creations and instead does each one serially.
Whether it is worth applying a fix that is only needed for that one
file system, I don't know.  The trade-offs are not all that clear to
me yet.

  I got these results

          sync_com=off   fsync=off
      1       15.90        13.51
   1000       26.09        24.56
   2000       33.41        31.20
   4000       57.39        57.74
   8000      102.84       116.28
  16000      189.43       207.84

 It shows fsync faster for < 4k, and slower for > 4k.  Not sure why this
 is the cause but perhaps the buffering of the fsync is actually faster
 than doing a no-op fsync.

synchronous-commit=off turns off not only the fsync at each commit,
but also the write-to-kernel at each commit; so it is not surprising
that it is faster at large scale.  I would specify both
synchronous-commit=off and fsync=off.


 When I'm doing a pg_upgrade with thousands of tables, the shutdown
 checkpoint after restoring the dump to the new cluster takes a very
 long time, as the writer drains its operation table by opening and
 individually fsync-ing thousands of files.  This takes about 40 ms per
 file, which I assume is a combination of slow lap-top disk drive, and
 a strange deal with ext4 which makes fsyncing a recently created file
 very slow.   But even with faster hdd, this would still be a problem
 if it works the same way, with every file needing 4 rotations to be
 fsynced and this happens in serial.

 Is this with the current code that does synchronous_commit=off?  If not,
 can you test to see if this is still a problem?

Yes, it is with synchronous_commit=off. (or if it wasn't originally,
it is now, with the same result)

Applying your fsync patch does solve the problem for me on ext4.
Having the new cluster be on ext3 rather than ext4 also solves the
problem, without the need for a patch; but it would be nice to be more
friendly to ext4, which is popular even though not recommended.


 Anyway, the reason I think turning fsync off might be reasonable is
 that as soon as the new cluster is shut down, pg_upgrade starts
 overwriting most of those just-fsynced file with other files from the
 old cluster, and AFAICT makes no effort to fsync them.  So until there
 is a system-wide sync after the pg_upgrade finishes, your new cluster
 is already in mortal danger anyway.

 pg_upgrade does a cluster shutdown before overwriting those files.

Right.  So as far as the cluster is concerned, those files have been
fsynced.  But then the next step is go behind the cluster's back and
replace those fsynced files with different files, which may or may not
have been fsynced.  This is what makes me think the new cluster is in
mortal danger.  Not only have the new files perhaps not been fsynced,
but the cluster is not even aware of this fact, so you can start it
up, and then shut it down, and it still won't bother to fsync them,
because as far as it is concerned they already have been.

Given that, how much extra danger would be added by having the new
cluster schema restore run with fsync=off?

In any event, I think the documentation should caution that the
upgrade should not be deemed to be a success until after a system-wide
sync has been done.  Even if we use the link rather than copy method,
are we sure that that is safe if the directories recording those links
have not been fsynced?

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doc patch making firm recommendation for setting the value of commit_delay

2012-11-19 Thread Peter Geoghegan
I've added this to the open commitfest. I think that a formal review
is probably required here.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-11-19 Thread Alvaro Herrera
Kohei KaiGai wrote:
 2012/10/22 Alvaro Herrera alvhe...@2ndquadrant.com:
  Here's an updated version of this patch, which also works in
  an EXEC_BACKEND environment.  (I haven't tested this at all on Windows,
  but I don't see anything that would create a portability problem there.)
 
 I also tried to check the latest patch briefly.
 Let me comment on several points randomly.
 
 Once a bgworker process crashes, postmaster tries to restart it about 60
 seconds later. My preference is for this interval to be configurable as an
 initial registration parameter; that includes a "never restart again" option.
 Probably, some extensions don't want to be restarted after an unexpected crash.

I changed this.

 Stopping is another significant event for a bgworker, not only start-up.
 This interface has no way to specify when we want to stop a bgworker.
 Probably, we can offer two options. Currently, bgworkers are terminated
 simultaneously with the other regular backends. In addition, I want an option
 to make sure a bgworker is terminated only after all the regular backends
 exit. It is a critical issue for me, because I am trying to implement a
 parallel calculation server with bgworker; thus, it should not be terminated
 earlier than the regular backends.

I am not really sure about this.  Each new behavior we want to propose
requires careful attention, because we need to change postmaster's
shutdown sequence in pmdie(), and also reaper() and
PostmasterStateMachine().  After looking into it, I am hesitant to
change this too much unless we can reach a very detailed agreement of
exactly what we want to happen.

Also note that there are two types of workers: those that require a
database connection, and those that do not.  The former are signalled
via SignalSomeChildren(BACKEND_TYPE_BGWORKER); the latter are signalled
via SignalUnconnectedWorker().  Would this distinction be enough for
you?

Delivering more precise shutdown signals is tricky; we would have to
abandon SignalSomeChildren() and code something new to scan the workers
list checking the stop time for each one.  (Except in emergency
situations, of course, in which we would continue to rely on
SignalSomeChildren).
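
For the record, the kind of scan I mean would look roughly like this (a
sketch only; the struct and helper names approximate the ones in the patch
and postmaster code, and worker_should_stop_now() is a hypothetical
stop-time policy check):

    /*
     * Sketch: instead of SignalSomeChildren(), walk the registered
     * workers and decide per worker whether it should be signalled now.
     */
    slist_iter  iter;

    slist_foreach(iter, &BackgroundWorkerList)
    {
        RegisteredBgWorker *rw;

        rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
        if (rw->rw_pid == 0)
            continue;                   /* not currently running */
        if (worker_should_stop_now(rw)) /* hypothetical policy check */
            signal_child(rw->rw_pid, SIGTERM);
    }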

 How about moving the bgw_name field to the tail of the BackgroundWorker
 structure? It simplifies the logic in RegisterBackgroundWorker(), if it is
 located as:
 typedef struct BackgroundWorker
 {
 int bgw_flags;
 BgWorkerStartTime bgw_start_time;
 bgworker_main_type  bgw_main;
 void   *bgw_main_arg;
 bgworker_sighdlr_type bgw_sighup;
 bgworker_sighdlr_type bgw_sigterm;
 char                bgw_name[1];   <== (*)
 } BackgroundWorker;

This doesn't work; or rather, it works and it makes
RegisterBackgroundWorker() code somewhat nicer, but according to Larry
Wall's Conservation of Cruft Principle, the end result is that the
module code to register a new worker is a lot messier; it can no longer
use a struct in the stack, but requires malloc() or similar.  I don't
see that this is a win overall.
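
For comparison, with the current layout a module can register itself with
just a local variable, e.g. (sketch only; it assumes the char *bgw_name
field and the flag/start-time values from the current patch, and
my_worker_main is a hypothetical worker body):

    static void my_worker_main(void *arg);  /* hypothetical worker entry point */

    void
    _PG_init(void)
    {
        BackgroundWorker worker;

        /* fill a stack struct and hand its address to the postmaster */
        memset(&worker, 0, sizeof(worker));
        worker.bgw_name = "example worker";
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
        worker.bgw_start_time = BgWorkerStart_ConsistentState;
        worker.bgw_main = my_worker_main;
        worker.bgw_main_arg = NULL;

        RegisterBackgroundWorker(&worker);
    }

With a flexible array member at the end, the same module code would instead
have to compute the struct size and malloc() it before filling it in.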

 StartOneBackgroundWorker always scans the BackgroundWorkerList from
 the head. Isn't it possible to save the current position in a static variable?
 If someone tries to manage thousands of bgworkers, it makes a busy loop. :(

Seems messy; we would have to get into the guts of slist_foreach (unroll
the macro and make the iterator static).  I prefer not to go that path,
at least not for now.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views WIP patch

2012-11-19 Thread Kevin Grittner
Tom Lane wrote:
 Kevin Grittner kgri...@mail.com writes:
 Josh Berkus wrote:
 What use would a temporary matview be?
 
 It would be essentially like a temporary table, with all the same
 persistence options. I'm not really sure how often it will be more
 useful than a temporary table before we have incremental
 maintenance of materialized views; once we have that, though, it
 seems likely that there could be reasonable use cases.
 
 One of the principal attributes of a temp table is that its
 contents aren't (reliably) accessible from anywhere except the
 owning backend. Not sure where you're going to hide the incremental
 maintenance in that scenario.

The more I think about that, the less sensible temporary MVs seem.
Unless I can figure out some reasonable use case, I'll disable that in
the next version of the patch.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-11-19 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Kohei KaiGai wrote:
 StartOneBackgroundWorker always scans the BackgroundWorkerList from
 the head. Isn't it possible to save the current position in a static variable?
 If someone tries to manage thousands of bgworkers, it makes a busy loop. :(

 Seems messy; we would have to get into the guts of slist_foreach (unroll
 the macro and make the iterator static).  I prefer not to go that path,
 at least not for now.

Thousands of bgworkers seems like a pretty unsupportable scenario anyway
--- it'd presumably have most of the same problems as thousands of
backends.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-19 Thread Jeff Davis
On Mon, 2012-11-19 at 10:35 -0800, Jeff Davis wrote:
 Yes, the blocks written *after* the checkpoint might have a bad checksum
 that will be fixed during recovery. But the blocks written *before* the
 checkpoint should have a valid checksum, but if they don't, then
 recovery doesn't know about them.
 
 So, we can't verify the checksums in the base backup because it's
 expected that some blocks will fail the check, and they can be fixed
 during recovery. That gives us no protection for blocks that were truly
 corrupted and written long before the last checkpoint.
 
 I suppose if we could somehow differentiate the blocks, that might work.
 Maybe look at the LSN and only validate blocks written before the
 checkpoint? But of course, that's a problem because a corrupt block
 might have the wrong LSN (in fact, it's likely, because garbage is more
 likely to make the LSN too high than too low).

It might be good enough here to simply retry the checksum verification
if it fails for any block. Postgres shouldn't be issuing write()s for
the same block very frequently, and they shouldn't take very long, so
the chances of failing several times seem vanishingly small unless it's
a real failure.
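
In rough C, the retry would be something like this (a sketch of what an
external verification tool could do; checksum_matches() is a stand-in for
the real checksum check, and the retry count and sleep are arbitrary):

    #include <sys/types.h>
    #include <unistd.h>         /* pread, usleep */
    #include <stdbool.h>

    #define BLCKSZ 8192         /* assumed block size */

    extern bool checksum_matches(const char *page);    /* hypothetical helper */

    /*
     * Sketch: only report corruption after several re-reads still fail,
     * so a block caught in the middle of a concurrent write() is not
     * flagged as corrupt.
     */
    static bool
    block_is_corrupt(int fd, off_t offset)
    {
        char    buf[BLCKSZ];
        int     attempts;

        for (attempts = 0; attempts < 3; attempts++)
        {
            if (pread(fd, buf, BLCKSZ, offset) != (ssize_t) BLCKSZ)
                return true;        /* short read: treat as suspect */
            if (checksum_matches(buf))
                return false;       /* checksum ok, block is fine */
            usleep(10000);          /* let an in-progress write() finish */
        }
        return true;
    }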

Through a suitably complex mechanism, I think we can be more sure. The
external program could wait for a checkpoint (or force one manually),
and then recalculate the checksum for that page. If checksum is the same
as the last time, then we know the block is bad (because the checkpoint
would have waited for any writes in progress). If the checksum does
change, then we assume postgres must have modified it since the backup
started, so we can assume that we have a full page image to fix it. (A
checkpoint is a blunt tool here, because all we need to do is wait for
the write() call to finish, but it suffices.)

That complexity is probably not required, and simply retrying a few
times is probably much more practical. But it still bothers me a little
to think that the external tool could falsely indicate a checksum
failure, however remote that chance.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-11-19 Thread Alexander Korotkov
Hi!

New version of patch is attached. Changes are following:
1) Right way to convert from pg_wchar to multibyte.
2) Optimization of producing CFNA-like graph on trigrams (produce smaller,
but equivalent, graphs in less time).
3) Comments and refactoring.

--
With best regards,
Alexander Korotkov.


trgm-regexp-0.2.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.3] OAT_POST_ALTER object access hooks

2012-11-19 Thread Alvaro Herrera
Kohei KaiGai wrote:
 Sorry, I missed the attached version.
 Please use this revision.

All those direct uses of object_access_hook make me think that the
InvokeObjectAccessHook() macro we have is insufficient.  Maybe we could
have InvokeObjectAccessHookArg1() so that instead of

+   if (object_access_hook)
+   {
+   ObjectAccessPostCreate  pc_arg;
+
+   memset(&pc_arg, 0, sizeof(ObjectAccessPostCreate));
+   pc_arg.is_internal = is_internal;
+   (*object_access_hook)(OAT_POST_CREATE, AttrDefaultRelationId,
+ RelationGetRelid(rel), attnum, (void *) &pc_arg);
+   }

we can have

InvokeObjectAccessHookWithArgs1(OAT_POST_CREATE, AttrDefaultRelationId,
RelationGetRelid(rel), attnum,
ObjectAccessPostCreate, is_internal, 
is_internal)

which would expand into the above.  (Since ObjectAccessPostCreate has
two members, we presumably need InvokeObjectAccessHookWithArgs2 as
well.  But that doesn't seem to be used anywhere, so maybe that struct
member is not necessary?)
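
Such a macro could be defined roughly like this (only a sketch of the
expansion shown above, not a tested definition):

    #define InvokeObjectAccessHookWithArgs1(access, classId, objectId, subId, \
                                            ArgType, field, value) \
        do { \
            if (object_access_hook) \
            { \
                ArgType     __oa_arg; \
                \
                memset(&__oa_arg, 0, sizeof(ArgType)); \
                __oa_arg.field = (value); \
                (*object_access_hook) ((access), (classId), (objectId), \
                                       (subId), (void *) &__oa_arg); \
            } \
        } while (0)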

The second hunk to alter.c does not apply anymore; please rebase.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-11-19 Thread Erik Rijkers
On Mon, November 19, 2012 22:58, Alexander Korotkov wrote:

 New version of patch is attached.

Hi Alexander,

I get some compile-errors:

(Centos 6.3, Linux 2.6.32-279.14.1.el6.x86_64 GNU/Linux, gcc (GCC) 4.7.2)

make contrib
trgm_regexp.c:73:2: error: unknown type name ‘TrgmStateKey’
make[1]: *** [trgm_regexp.o] Error 1
make: *** [all-pg_trgm-recurse] Error 2
trgm_regexp.c:73:2: error: unknown type name ‘TrgmStateKey’
make[1]: *** [trgm_regexp.o] Error 1
make: *** [install-pg_trgm-recurse] Error 2


Did I forget something?

Thanks,

Erik Rijkers







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-11-19 Thread Tomas Vondra
On 19.11.2012 22:58, Alexander Korotkov wrote:
 Hi!
 
 New version of patch is attached. Changes are following:
 1) Right way to convert from pg_wchar to multibyte.
 2) Optimization of producing CFNA-like graph on trigrams (produce
 smaller, but equivalent, graphs in less time).
 3) Comments and refactoring.

Hi,

thanks for the updated message-id. I've done the initial review:

1) Patch applies fine to the master.

2) It's common to use upper-case names for macros, but trgm.h defines
   macro iswordchr - I see it's moved from trgm_op.c but maybe we
   could make it a bit more correct?

3) I see there are two '#ifdef KEEPONLYALNUM' blocks right next to each
   other in trgm.h - maybe it'd be a good idea to join them?

4) The two new method prototypes at the end of trgm.h use different
   indentation than the rest (spaces only instead of tabs).

5) There are no regression tests / updated docs (yet).

6) It does not compile - I do get a bunch of errors like this

trgm_regexp.c:73:2: error: expected specifier-qualifier-list before
‘TrgmStateKey’
trgm_regexp.c: In function ‘addKeys’:
trgm_regexp.c:291:24: error: ‘TrgmState’ has no member named ‘keys’
trgm_regexp.c:304:10: error: ‘TrgmState’ has no member named ‘keys’
...

It seems this is caused by the order of typedefs in trgm_regexp.c, namely
TrgmState referencing TrgmStateKey before it's defined. Moving the
TrgmStateKey typedef before TrgmState fixed the issue (I'm using gcc-4.5.4,
but I think it's not compiler-dependent.)
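
In other words, an ordering like this compiles (sketch; the member lists are
placeholders, not the real fields from the patch):

    /* define the key type first ... */
    typedef struct TrgmStateKey
    {
        int         placeholder;
    } TrgmStateKey;

    /* ... so that TrgmState can embed it as a complete type */
    typedef struct TrgmState
    {
        TrgmStateKey keys;          /* placeholder member */
        int         placeholder;
    } TrgmState;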

7) Once fixed, it seems to work

CREATE EXTENSION pg_trgm ;
CREATE TABLE TEST (val TEXT);
INSERT INTO test
   SELECT md5(i::text) FROM generate_series(1,100) s(i);
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;

EXPLAIN SELECT * FROM test WHERE val ~ '.*qqq.*';


                              QUERY PLAN
----------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=16.77..385.16 rows=100 width=33)
   Recheck Cond: (val ~ '.*qqq.*'::text)
   ->  Bitmap Index Scan on trgm_idx  (cost=0.00..16.75 rows=100 width=0)
         Index Cond: (val ~ '.*qqq.*'::text)
(4 rows)

but I do get a bunch of NOTICE messages with debugging info (no matter
if the GIN index is used or not, so it's somewhere in the common regexp
code). But I guess that's due to WIP status.

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-11-19 Thread Alexander Korotkov
Some quick comments.

On Tue, Nov 20, 2012 at 3:02 AM, Tomas Vondra t...@fuzzy.cz wrote:

 6) It does not compile - I do get a bunch of errors like this

Fixed.

7) Once fixed, it seems to work

 CREATE EXTENSION pg_trgm ;
 CREATE TABLE TEST (val TEXT);
 INSERT INTO test
SELECT md5(i::text) FROM generate_series(1,100) s(i);
 CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
 ANALYZE test;

 EXPLAIN SELECT * FROM test WHERE val ~ '.*qqq.*';


                               QUERY PLAN
 ----------------------------------------------------------------------------
  Bitmap Heap Scan on test  (cost=16.77..385.16 rows=100 width=33)
    Recheck Cond: (val ~ '.*qqq.*'::text)
    ->  Bitmap Index Scan on trgm_idx  (cost=0.00..16.75 rows=100 width=0)
          Index Cond: (val ~ '.*qqq.*'::text)
 (4 rows)

 but I do get a bunch of NOTICE messages with debugging info (no matter
 if the GIN index is used or not, so it's somewhere in the common regexp
 code). But I guess that's due to WIP status.

It's due to the TRGM_REGEXP_DEBUG macro. I disabled it by default. But I think
the pieces of code hidden by that macro could be useful for debugging even
after the WIP status.
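
i.e. something like this stays in the source, compiled out unless the macro
is defined (sketch; the message text and counters are placeholders):

    /* #define TRGM_REGEXP_DEBUG */

    #ifdef TRGM_REGEXP_DEBUG
        elog(NOTICE, "trigram automaton: %d states, %d arcs",
             nstates, narcs);       /* placeholder counters */
    #endif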

--
With best regards,
Alexander Korotkov.


trgm-regexp-0.3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-11-19 Thread Peter Eisentraut
On Tue, 2012-11-06 at 10:57 -0500, Robert Haas wrote:
 But, with the attached patch:
 
 rhaas=# create function xyz(smallint) returns smallint as $$select
 $1$$ language sql;
 CREATE FUNCTION
 rhaas=# select xyz(5);
  xyz
 -----
     5
 (1 row)
 
 rhaas=# create table abc (a int);
 CREATE TABLE
 rhaas=# select lpad(a, 5, '0') from abc;
 lpad
------
(0 rows)

I continue to be of the opinion that allowing this second case to work
is not desirable.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v3

2012-11-19 Thread Michael Paquier
On Mon, Nov 19, 2012 at 5:50 PM, Andres Freund and...@2ndquadrant.com wrote:

 Hi Michael,


 On 2012-11-19 16:28:55 +0900, Michael Paquier wrote:
  I have been able to fetch your code (thanks Andrea!) and run some of it. For the
  time being I am spending some time reading the code and understanding the
  whole set of features you are trying to implement inside core, even if I
  got some background from what you presented at PGCon and from the hackers
  ML.

 Cool.

  Btw, as a first approach, I tried to run the logical log receiver plugged
  on a postgres server, and I am not able to make it work.

  Well, I am using settings similar to yours.
  # Run master
  rm -r ~/bin/pgsql/master/
  initdb -D ~/bin/pgsql/master/
  echo local replication $USER trust  ~/bin/pgsql/master/pg_hba.conf
  postgres -D ~/bin/pgsql/master \
-c wal_level=logical \
-c max_wal_senders=10 \
-c max_logical_slots=10 \
-c wal_keep_segments=100 \
-c log_line_prefix=[%p %x] 
  # Logical log receiver
  pg_receivellog -f $HOME/output.txt -d postgres -v
 
  After launching some SQLs, the logical receiver is stuck just after
 sending
  INIT_LOGICAL_REPLICATION, please see bt of process waiting:

 It's waiting till it sees an initial xl_running_xacts record. The
 easiest way to do that is to manually issue a checkpoint. Sorry, I should
 have included that in the description.
 Otherwise you can wait till the next routine checkpoint comes around...

 I plan to cause more xl_running_xacts records to be logged in the
 future. I think the timing of those currently is non-optimal; you have
 the same problem as here in normal streaming replication as well :(

  I am just looking at this patch and will provide some comments.
  By the way, you forgot the installation part of pg_receivellog, please
 see
  patch attached.

 That actually was somewhat intended, I thought people wouldn't like the
 name and I didn't want a binary that's going to be replaced anyway lying
 around ;)

OK no problem. For sure this is going to happen, I was wondering myself if
it could be possible to merge pg_receivexlog and pg_receivellog into a
single utility with multiple modes :)

Btw, here are some extra comments based on my progress, hope it will be
useful for other people playing around with your patches.
1) It is necessary to install the contrib module test_decoding on the server
side or the test case will not work.
2) I got the following logs on the server:
LOG:  forced to assume catalog changes for xid 1370 because it was running
to early
WARNING:  ABORT 1370
Actually I saw that there are many warnings like this.
3) Assertion failure while running pgbench, I was  just curious to see how
it reacted when logical replication was put under a little bit of load.
TRAP: FailedAssertion("!(((xid) >= ((TransactionId) 3)) &&
((snapstate->xmin_running) >= ((TransactionId) 3)))", File: "snapbuild.c",
Line: 877)
= pgbench -i postgres; pgbench -T 500 -c 8 postgres
4) As mentioned by Andres above, logical replication begins only once there is
an xl_running_xacts record. I just forced a checkpoint manually.

With all those things done, I have been able to set up the system, for
example those queries:
postgres=# create table ac (a int);
CREATE TABLE
postgres=# insert into ac values (1);
INSERT 0 1
created the expected output:
BEGIN 32135
COMMIT 32135
BEGIN 32136
table ac: INSERT: a[int4]:1
COMMIT 32136

Now it is time to dig into the code...
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] logical changeset generation v3 - Source for Slony

2012-11-19 Thread Steve Singer

On 12-11-18 11:07 AM, Andres Freund wrote:

Hi Steve!


I think we should provide some glue code to do this, otherwise people
will start replicating all the bugs I hacked into this... More
seriously: I think we should have support code here, no user will want
to learn the intricacies of feedback messages and such. Where that would
live? No idea.


libpglogicalrep.so ?


I wholeheartedly aggree. It should also be cleaned up a fair bit before
others copy it should we not go for having some client side library.

Imo the library could very roughly be something like:

state = SetupStreamingLLog("replication-slot", ...);
while((message = StreamingLLogNextMessage(state))
{
  write(outfd, message->data, message->length);
  if (received_100_messages)
  {
   fsync(outfd);
   StreamingLLogConfirm(message);
  }
}

Although I guess that's not good enough because StreamingLLogNextMessage
would be blocking, but that shouldn't be too hard to work around.



How about we pass a timeout value to StreamingLLogNextMessage(...) so that
it returns if no data is available after the timeout, to give the caller
a chance to do something else.
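
Something along those lines would work for us (a rough sketch, reusing the
names from Andres' pseudo-code above; the timeout argument is the only new
part, and do_periodic_work() stands in for whatever the caller needs to do):

    /* sketch: same loop as above, but with a poll-style timeout in ms */
    state = SetupStreamingLLog("replication-slot", ...);
    while (true)
    {
        message = StreamingLLogNextMessage(state, 1000 /* ms */);
        if (message == NULL)
        {
            /* timeout expired: caller gets a chance to do other work */
            do_periodic_work();         /* hypothetical */
            continue;
        }
        write(outfd, message->data, message->length);
        if (++received_messages >= 100)
        {
            fsync(outfd);
            StreamingLLogConfirm(message);
            received_messages = 0;
        }
    }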



This is basically the Slony 2.2 sl_log format minus a few columns we no
longer need (txid, actionseq).
command_args is a postgresql text array of column=value pairs.  Ie [
{id=1},{name='steve'},{project='slony'}]

It seems to me that that makes escaping unnecessarily complicated, but
given you already have all the code... ;)


When I look at the actual code/representation we picked it is closer to 
{column1,value1,column2,value2...}





I don't think our output plugin will be much more complicated than the
test_decoding plugin.

Good. Thats the idea ;). Are you ok with the interface as it is now or
would you like to change something?


I'm going to think about this some more and maybe try to write an 
example plugin before I can say anything with confidence.




Yes. We will also need something like that. If you remember the first
prototype we sent to the list, it included the concept of an
'origin_node' in wal record. I think you actually reviewed that one ;)

That was exactly aimed at something like this...

Since then my thoughts about how the origin_id looks like have changed a
bit:
- origin id is internally still represented as an uint32/Oid
   - never visible outside of wal/system catalogs
- externally visible it gets
   - assigned an uuid
   - optionally assigned a user defined name
- user settable (permissions?) origin when executing sql:
   - SET change_origin_uuid = 'uuid';
   - SET change_origin_name = 'user-settable-name';
   - defaults to the local node
- decoding callbacks get passed the origin of a change
   - txn-{origin_uuid, origin_name, origin_internal?}
- the init decoding callback can setup an array of interesting origins,
   so the others don't even get the ReorderBuffer treatment

I have to thank the discussion on -hackers and a march through Prague
with Marko here...
So would the uuid and optional name assignment be done in the output
plugin or somewhere else?
When/how does the uuid get generated, and where do we store it so the
same uuid gets returned when postgres restarts?  Slony today stores all
this type of stuff in user-level tables and user-level functions
(because it has no other choice).  What is the connection between
these values and the 'slot-id' in your proposal for the init arguments?
Does the slot-id need to be the external uuid of the other end, or is
there no direct connection?


Today slony allows us to replicate between two databases in the same
postgresql cluster (I use this for testing all the time).
Slony also allows for two different 'slony clusters' to be set up in the
same database (or so I'm told; I don't think I have ever tried this myself).


plugin functions that let me query the local database and then return 
the  uuid and origin_name would work in this model.


+1 on being able to mark the 'change origin' in a SET command when the 
replication process is pushing data into the replica.



Exactly how we do this filtering is an open question,  I think the output
plugin will at a minimum need to know:

a) What the slony node id is of the node it is running on.  This is easy to
figure out if the output plugin is able/allowed to query its database.  Will
this be possible? I would expect to be able to query the database as it
 exists now (at plugin invocation time), not as it existed in the past when the
WAL was generated.   In addition to the node ID I can see us wanting to be
able to query other slony tables (sl_table,sl_set etc...)

Hm. There is no fundamental reason not to allow normal database access
to the current database but it won't be all that cheap, so doing it
frequently is not a good idea.
The reason it's not cheap is that you basically need to tear down the
postgres internal caches if you switch the timestream in which you are
working.

Would go something like:

TransactionContext = 

[HACKERS] removing some dead keep compiler quiet code

2012-11-19 Thread Peter Eisentraut
Since ereport() can now communicate to the compiler whether it returns
or not, a fair amount of "keep compiler quiet" code is dead.  Since the
method that ereport() uses is not dependent on any compiler-specific
attributes, I think this code can just be removed.  I propose the
attached patch.
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index ceec6ff..b4fd50c 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2438,11 +2438,6 @@ static bool is_valid_dblink_option(const PQconninfoOption *options,
 
 		return NULL;
 	}
-
-	/*
-	 * never reached, but keep compiler quiet
-	 */
-	return NULL;
 }
 
 /*
diff --git a/contrib/spi/moddatetime.c b/contrib/spi/moddatetime.c
index 2ec96540..1d10b7a 100644
--- a/contrib/spi/moddatetime.c
+++ b/contrib/spi/moddatetime.c
@@ -110,13 +110,10 @@
 	ObjectIdGetDatum(InvalidOid),
 	Int32GetDatum(-1));
 	else
-	{
 		ereport(ERROR,
 (errcode(ERRCODE_TRIGGERED_ACTION_EXCEPTION),
 errmsg("attribute \"%s\" of \"%s\" must be type TIMESTAMP or TIMESTAMPTZ",
 		args[0], relname)));
-		newdt = (Datum) 0;		/* keep compiler quiet */
-	}
 
 /* 1 is the number of items in the arrays attnum and newdt.
 	attnum is the positional number of the field to be updated.
diff --git a/contrib/tsearch2/tsearch2.c b/contrib/tsearch2/tsearch2.c
index 968bd80..ee7a05b 100644
--- a/contrib/tsearch2/tsearch2.c
+++ b/contrib/tsearch2/tsearch2.c
@@ -54,8 +54,6 @@
  errmsg("function %s is no longer supported", \
 		format_procedure(fcinfo->flinfo->fn_oid)), \
  errhint("Switch to new tsearch functionality."))); \
-		/* keep compiler quiet */		\
-		PG_RETURN_NULL();\
 	}	\
 	PG_FUNCTION_INFO_V1(name)
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1faf666..60bf08b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5365,7 +5365,6 @@ static bool read_backup_label(XLogRecPtr *checkPointLoc,
 			ereport(FATAL,
 	(errmsg("could not locate required checkpoint record"),
 	 errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
-			wasShutdown = false;	/* keep compiler quiet */
 		}
 		/* set flag to delete it later */
 		haveBackupLabel = true;
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 79b71b3..255b928 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -76,7 +76,6 @@
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 			 errmsg("invalid fork name"),
 			 errhint("Valid fork names are \"main\", \"fsm\", and \"vm\".")));
-	return InvalidForkNumber;	/* keep compiler quiet */
 }
 
 /*
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index b9cfee2..4084c17 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -772,7 +772,6 @@ static bool stack_address_present_add_flags(const ObjectAddress *object,
 		(errcode(ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST),
 		 errmsg("cannot drop %s because it is required by the database system",
 				getObjectDescription(object))));
-subflags = 0;	/* keep compiler quiet */
 break;
 			default:
 elog(ERROR, "unrecognized dependency type '%c' for %s",
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 8ac8373..2cf01d5 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -75,13 +75,10 @@
 	else if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
 		new_row = trigdata->tg_newtuple;
 	else
-	{
 		ereport(ERROR,
 (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
  errmsg("function \"%s\" must be fired for INSERT or UPDATE",
 		funcname)));
-		new_row = NULL;			/* keep compiler quiet */
-	}
 
 	/*
 	 * If the new_row is now dead (ie, inserted and then deleted within our
diff --git a/src/backend/commands/define.c b/src/backend/commands/define.c
index e0b0fc3..87051ac 100644
--- a/src/backend/commands/define.c
+++ b/src/backend/commands/define.c
@@ -106,7 +106,6 @@
 	 errmsg("%s requires a numeric value",
 			def->defname)));
 	}
-	return 0;	/* keep compiler quiet */
 }
 
 /*
@@ -161,7 +160,6 @@
 			(errcode(ERRCODE_SYNTAX_ERROR),
 			 errmsg("%s requires a Boolean value",
 	def->defname)));
-	return false;/* keep compiler quiet */
 }
 
 /*
@@ -194,7 +192,6 @@
 	 errmsg("%s requires a numeric value",
 			def->defname)));
 	}
-	return 0;	/* keep compiler quiet */
 }
 
 /*
@@ -223,7 +220,6 @@
 	 errmsg("argument of %s must be a name",
 			def->defname)));
 	}
-	return NIL;	/* keep compiler quiet */
 }
 
 /*
@@ -253,7 +249,6 @@
 	 errmsg("argument of %s must be a type name",
 			def->defname)));
 	}
-	return NULL;/* keep compiler quiet */
 }
 
 /*
@@ -298,7 +293,6 @@
 			(errcode(ERRCODE_SYNTAX_ERROR),
 			 errmsg("invalid argument for %s: \"%s\"",
 	def->defname, defGetString(def))));
-	return 0;	/* keep compiler quiet */
 

Re: [HACKERS] removing some dead keep compiler quiet code

2012-11-19 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 Since ereport() can now communicate to the compiler whether it returns
 or not, a fair amount of "keep compiler quiet" code is dead.  Since the
 method that ereport() uses is not dependent on any compiler-specific
 attributes, I think this code can just be removed.  I propose the
 attached patch.

Meh.  This will only work if the compiler understands that abort()
doesn't return --- compilers that don't know that will start bleating
about uninitialized variables again.  And the ones that do know it
will be able to throw away the dead code anyway.

I doubt this is a good idea.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] too much pgbench init output

2012-11-19 Thread Jeevan Chalke
Hi,


On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 19.11.2012 11:59, Jeevan Chalke wrote:
  Hi,
 
  I gone through the discussion for this patch and here is my review:
 
  The main aim of this patch is to reduce the number of log lines. It is
   also suggested to use an option to provide the interval, but a few of us
   did not quite agree on it. So the final discussion ended at keeping a 5 sec
   interval between log lines.
 
  However, I see, there are two types of users here:
  1. Who likes these log lines, so that they can troubleshoot some
  slowness and all
  2. Who do not like these log lines.

 Who likes these lines / needs them for something useful?


No idea. I fall in second category.

But from the discussion, I believe some people may need detailed (or a lot
more) output.



   So keeping these in mind, I would rather go for an option which will control
   this. People falling in category one can set this option very low,
   whereas users falling under the second category can keep it high.

 So what option(s) would you expect? Something that tunes the interval
 length or something else?


Interval length.



 A switch that'd choose between the old and new behavior might be a good
 idea, but I'd strongly vote against automagic heuristics. It makes the
 behavior very difficult to predict and I really don't want to force the
 users to wonder whether the long delay is due to general slowness of the
 machine or some clever logic that causes long delays between log
 messages.

 That's why I choose a very simple approach with constant time interval.
 It does what I was aiming for (less logs) and it's easy to predict.
 Sure, we could choose different interval (or make it an option).


I would prefer an option for choosing an interval, say from 1 second to 10
seconds.

BTW, what if we put one log message every 10% (or 5%), with the time taken
(time taken for the last 10% (or 5%) and cumulative), over 5 seconds?
This will have only 10 (or 20) lines per pgbench initialisation.
And since we are showing the time taken for each block, if any slowness
happens, one can easily find the block by looking at the timings and
troubleshoot it.
Though 10% or 5% is again a debatable number, keeping it constant would
eliminate the requirement of an option.
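
Roughly, inside pgbench's init() row-loading loop, that could look like this
(just a sketch; i is assumed to be the row counter, start/diff the instr_time
pair from the patch, and last_elapsed_sec a new local variable):

    /*
     * Sketch: emit one progress line per 10% of rows, with cumulative
     * elapsed time and the time spent on the last 10% chunk.
     */
    if ((i + 1) % (((int64) naccounts * scale) / 10) == 0)
    {
        INSTR_TIME_SET_CURRENT(diff);
        INSTR_TIME_SUBTRACT(diff, start);
        elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
        fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, last chunk %.2f s)\n",
                i + 1, naccounts * scale,
                (int) (((int64) (i + 1)) * 100 / ((int64) naccounts * scale)),
                elapsed_sec, elapsed_sec - last_elapsed_sec);
        last_elapsed_sec = elapsed_sec;
    }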




  However, assuming we settled on 5 sec delay, here are few comments on
  that patch attached:
 
  Comments:
  =
 
  Patch gets applied cleanly with some whitespace errors. make and make
  install too went smooth.
   make check was smooth. Rather, it should be smooth since there are NO
   changes in other parts of the code other than pgbench.c, and we do
   not have any test case either.
 
  However, here are few comments on changes in pgbench.c
 
  1.
   Since the final discussion ended at a 5 second interval being
   good enough, the author used a global int variable for that.
   Given that it's just a constant, #define would be a better choice.

 Good point. Although if we add an option to supply different values, a
 variable is a better match.


Exactly, if we ended up with an option then it looks good. But in your
current patch it was constant, so #define should be preferred.



  2.
  +/* let's not call the timing for each row, but only each 100
  rows */
  Why only 100 rows ? Have you done any testing to come up with number 100
  ? To me it seems very low. It will be good to test with 1K or even 10K.
  On my machine (2.4 GHz Intel core 2 duo Macbook PRO, running Ubuntu in
  VM with 4GB RAM, 1067 DDR3), in 5 Sec, approx 1M rows were inserted. So
  checking every 100 rows looks overkill.

 Well, the 100 is clearly a magical constant. The goal was to choose a
 value large enough to amortize the getlocaltime() cost, but small enough
 to print info even in cases when the performance sucks for some reason.

 I've seen issues where the speed suddenly dropped to a fraction of the
 expected value, e.g. 100 rows/second, and in those cases you'd have to
 wait for a very long time to actually get the next log message (with the
 suggested 10k step).

 So 100 seems like a good compromise to me ...


Hmm... inserting just 100 rows / second is really slow.
Since you have already seen such behaviour, I have no objection to keeping it at 100.



  3.
  Please indent following block as per the indentation just above that
 
  /* used to track elapsed time and estimate of the remaining time */
  instr_time      start, diff;
  double elapsed_sec, remaining_sec;
  int log_interval = 1;

 OK

  4.
  +/* have ve reached the next interval? */
  Do you mean have WE reached...

 OK

 
  5.
  While applying a patch, I got few white-space errors. But I think every
  patch goes through pgindent which might take care of this.

 OK

 
  Thanks

 Thanks for the review. I'll wait for a bit more discussion about the
 choices before submitting another version of the patch.


Sure



 Tomas


 --
 Sent via pgsql-hackers mailing list 

[HACKERS] cannot disable hashSetOp

2012-11-19 Thread Pavel Stehule
hello

I looked into an issue with memory consumption by this query:

HashSetOp Except  (cost=0.00..297100.69 rows=594044 width=30)
  ->  Append  (cost=0.00..234950.32 rows=8286716 width=30)
        ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..168074.62 rows=5940431 width=29)
              ->  Seq Scan on ac  (cost=0.00..108670.31 rows=5940431 width=29)
        ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..66875.70 rows=2346285 width=32)
              ->  Seq Scan on artist_credit  (cost=0.00..43412.85 rows=2346285 width=32)


I was surprised, because I didn't find any way to disable hashing.

Is there some way?

Pavel


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the number of pending entries in GIN index with FASTUPDATE=on

2012-11-19 Thread Kyotaro HORIGUCHI
Hello, 

 Agreed. Attached patch introduces the pgstatginindex() which now reports
 GIN version number, number of pages in the pending list and number of
 tuples in the pending list, as information about a GIN index.

It seems fine on the whole, and I have some suggestions.

1. This patch applies to current git head cleanly, but installation
  ends up failing because of the lack of
  pgstattuple--1.0--1.1.sql, which is added in the Makefile.

2. I feel somewhat uneasy with the sizes passed to the palloc's (they are
  too long), and with BuildTupleFromCStrings being used instead of
  heap_form_tuple.  But fixing that would lead to additional modifications
  of existing similar code.

  You can leave that if you prefer to keep this patch smaller,
  but it looks to me preferable to construct the result
  tuple not via C strings, in some aspects. (*1)

3. pgstatginindex shows only version, pending_pages, and
  pending_tuples. Why don't you show the other properties, such as
  entry pages, data pages, entries, and total pages, as
  pgstatindex does?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


(*1) Sample

diff --git a/contrib/pgstattuple/pgstatindex.c 
b/contrib/pgstattuple/pgstatindex.c
index 8a2ae85..71c2023 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -29,2 +29,3 @@
 
+#include "access/htup_details.h"
 #include "access/gin_private.h"
@@ -39,3 +40,2 @@
 
-
 extern Datum pgstatindex(PG_FUNCTION_ARGS);
@@ -330,4 +330,5 @@ pgstatginindex(PG_FUNCTION_ARGS)
int j;
-   char   *values[3];
+   Datum   values[3];
Datum   result;
+   bool nulls[3] = {false, false, false};
 
@@ -376,11 +377,6 @@ pgstatginindex(PG_FUNCTION_ARGS)
j = 0;
-   values[j] = palloc(32);
-   snprintf(values[j++], 32, "%d", stats.version);
-   values[j] = palloc(32);
-   snprintf(values[j++], 32, "%u", stats.pending_pages);
-   values[j] = palloc(64);
-   snprintf(values[j++], 64, INT64_FORMAT, stats.pending_tuples);
-
-   tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
-  values);
+   values[j++] = Int32GetDatum(stats.version);
+   values[j++] = UInt32GetDatum(stats.pending_pages);
+   values[j++] = Int64GetDatumFast(stats.pending_tuples);
+   tuple = heap_form_tuple(tupleDesc, values, nulls);



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers