Re: [HACKERS] Online base backup from the hot-standby

2011-10-17 Thread Fujii Masao
2011/10/15 Jun Ishiduka :
>
>> >     if (!shutdown && XLogStandbyInfoActive())
>> > +   {
>> >             LogStandbySnapshot(&checkPoint.oldestActiveXid, 
>> > &checkPoint.nextXid);
>> > +           XLogReportParameters(REPORT_ON_BACKEND);
>> > +   }
>> >
>> > Why doesn't the change of FPW need to be WAL-logged when
>> > shutdown checkpoint is performed? It's helpful to add the comment
>> > explaining why.
>>
>> Sure. I update the patch soon.
>
> Done.

+   /*
+    * The backend writes WAL of FPW at checkpoint. However, the backend
+    * does not need to write WAL of FPW at checkpoint shutdown because it
+    * performs when startup finishes.
+    */
+   XLogReportParameters(REPORT_ON_BACKEND);

I'm still unclear on why that WAL doesn't need to be written at a shutdown
checkpoint.
Anyway, the first sentence in the above comment is not right: it is not a
backend but the bgwriter that writes that WAL at checkpoint.

The second sentence also seems not to be right. It implies that a shutdown
checkpoint is performed only at the end of startup, but one may also be done
when a smart or fast shutdown is requested.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-10-17 Thread Jeff Davis
On Sun, 2011-10-16 at 14:43 -0700, Jeff Davis wrote:
> On Fri, 2011-10-07 at 12:54 +0400, Alexander Korotkov wrote:
> 
> > The first thing that caught my eye in the existing GiST code is the idea
> > of subtype_float. float8 has limited precision and can't represent, for
> > example, varlena values well enough. Even if we have a large int8 value
> > we can lose lower bits, but the data distribution can be such that these
> > bits are valuable. Wouldn't it be better to have a function like
> > subtype_diff_float which returns the difference between two values of the
> > subtype as a float? Using such a function could make the penalty more
> > sensitive to even small differences between values, and accordingly more
> > relevant.
> > 
> I started implementing subtype_diff, and I noticed that it requires
> defining an extra function for each range type. Previously, the numeric
> types could just use a cast, which was convenient for user-defined range
> types.
> 
> If you have any other ideas to make that cleaner, please let me know.
> Otherwise I'll just finish implementing subtype_diff.

I'm beginning to think that we should just allow the user to specify
their own gist_penalty function. Specifying just the subtype_diff
doesn't save much time, and it can only be limiting. Additionally, it's
harder for users to understand the purpose of the function.
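For context, a range type supplying its own subtype_diff function might look
something like the sketch below. This is hypothetical: the option spelling,
the type name "floatrange", and the function "float8_diff" are assumptions,
since the syntax was still under design in this thread.

```sql
-- Hypothetical sketch: a float8 range type with a user-supplied
-- subtype_diff, which the GiST penalty code could use to measure the
-- distance between two bounds as a float.
CREATE FUNCTION float8_diff(a float8, b float8) RETURNS float8
    AS 'SELECT $1 - $2' LANGUAGE sql IMMUTABLE STRICT;

CREATE TYPE floatrange AS RANGE (
    subtype = float8,
    subtype_diff = float8_diff
);
```

The point under discussion is whether requiring such an extra function per
range type is worth it, versus letting users override the GiST penalty
function directly.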

Regards,
Jeff Davis





Re: [HACKERS] [REVIEW] Patch for cursor calling with named parameters

2011-10-17 Thread Yeb Havinga

On 2011-10-15 07:41, Tom Lane wrote:

Yeb Havinga  writes:

Hello Royce,
Thanks again for testing.

I looked this patch over but concluded that it's not ready to apply,
mainly because there are too many weird behaviors around error
reporting.


Tom, thanks for reviewing. Getting the syntax errors to point at the exact 
location was indeed something I thought would be near impossible, but the 
whitespace suggestion, together with the others you made, seems like a good 
path forward. Thanks for taking the time to write your comments; they will 
be a great help in making an improved version.


regards,
Yeb Havinga




Re: [HACKERS] proposal: set GUC variables for single query

2011-10-17 Thread Jan Urbański
On 17/10/11 02:53, Robert Haas wrote:
> On Sun, Oct 16, 2011 at 4:58 PM, Tom Lane  wrote:
>> Dimitri Fontaine  writes:
>>> Now that you mention it, the following might actually already work:
>>
>>>  WITH settings AS (
>>>SELECT set_config('timezone', 'Europe/Amsterdam', t),
>>>   set_config('work_mem', '1 GB', t)
>>>  ),
>>>   foo AS (
>>>SELECT …
>>>  )
>>>  INSERT INTO bar SELECT * FROM foo;
>>
>> Only for small values of "work" ... you won't be able to affect planner
>> settings that way, nor can you assume that that WITH item is executed
>> before all else.  See recent thread pointing out that setting values
>> mid-query is unsafe.
> 
> I previously floated the idea of using a new keyword, possibly LET,
> for this, like this:
> 
> LET var = value [, ...] IN query

LET was something I thought about, although you'd have to use something
like parentheses around the GUC assignments because "value" can contain
commas, leading to shift/reduce conflicts (which sucks, unfortunately).

But before whipping out the paint bucket I wanted to see if there's
enough buy-in to justify rehashing the syntax details.
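For readers following along, set_config() takes a boolean third argument
(is_local) marking the setting as transaction-local, so a standalone version
of the idea quoted above would look something like this (a sketch only,
given the caveats Tom raised about mid-query settings; the '64MB' value is
made up):

```sql
-- Set work_mem for the current transaction only; with is_local = true
-- the setting reverts automatically at commit or abort.
BEGIN;
SELECT set_config('work_mem', '64MB', true);
-- ... run the query here ...
COMMIT;
```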

Cheers,
Jan



Re: [HACKERS] Underspecified window queries in regression tests

2011-10-17 Thread Florian Pflug
On Oct17, 2011, at 01:09 , Tom Lane wrote:
> Florian Pflug  writes:
>> ... reading those parts again, I realize that it says "When ORDER BY is
>> omitted the *default* frame consists ...", and that the second quote is
>> followed by a footnote which says
> 
>>  There are options to define the window frame in other ways, but this
>>  tutorial does not cover them. See Section 4.2.8 for details. [3.5, Window
>>  Functions]
> 
>> So it was just me being thick. Sorry for the noise.
> 
> Hmm.  Maybe the use of a  there is too subtle, and we should
> instead have that text in-line (probably in parentheses)?  Or we could
> use a , but that's probably too much emphasis.

Inline and in parentheses sounds fine.

In addition, I think we should reword the explanation in 4.2.8 (The SQL Language
/ SQL Syntax / Value Expressions / Window Functions). Instead of that rather
long (and IMHO hard to read) paragraph about possible frame clauses and their
behaviour in the presence or absence of an ORDER BY clause, we should go with
a more algorithmic explanation I think.

Something along these lines maybe:

--
.) PARTITION BY splits the rows into disjoint partitions. All further
   processing happens only inside a single partition.

.) In RANGE mode, ORDER BY then splits each partition into an ordered list of
   sub-partitions, each containing rows which the ORDER BY considers to be
   equivalent.

.) In ROWS mode, OTOH, each sub-partition contains only a single row. Thus, if
   there are rows which are considered to be equivalent by the ORDER BY, the
   ordering of the sub-partitions isn't fully determined.

.) Each row's frame then consists of some consecutive range of sub-partitions.

.) In RANGE mode, that consecutive range can only start at either the first
   sub-partition or the current row's sub-partition, and can only end at either
   the current row's sub-partition or the last sub-partition.

.) In ROWS mode, the consecutive range may additionally start a given number
   of sub-partitions (or rows, it's the same thing here) before the current
   row, and may additionally end a given number of sub-partitions/rows after
   the current row.

From that, it follows that even with an underspecified sort order, the
contents of each frame are still fully determined in RANGE mode. The ordering
of rows within a frame is not determined, though. So overall, in RANGE mode,
a query's result is only non-deterministic if the window function is
sensitive to the ordering of rows within a frame.

In ROWS mode, OTOH, the contents of each frame themselves are not fully
determined, so even an ordering-agnostic window function may produce
non-deterministic results.
--
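The RANGE-vs-ROWS determinism point above can be illustrated with a small
query (illustrative only; the column name "x" and the VALUES list are made
up):

```sql
-- Values 1, 2, 2, 3: the relative order of the two tied 2s is unspecified.
SELECT x,
       sum(x) OVER (ORDER BY x
                    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS range_sum,
       sum(x) OVER (ORDER BY x
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS rows_sum
FROM (VALUES (1), (2), (2), (3)) AS t(x);
-- range_sum is deterministic: peer rows share one frame, so both 2s get 5.
-- rows_sum for the two tied rows (3 and 5, in some order) depends on their
-- physical ordering, so the result is non-deterministic.
```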

If you think that something along these lines would be an improvement, I can try
to come up with a patch.

best regards,
Florian Pflug




[HACKERS] termination of backend waiting for sync rep generates a junk log message

2011-10-17 Thread Fujii Masao
Hi,

In 9.1, the following procedure writes a junk message into the server log;

$ initdb -D data
$ cat << EOF >> data/postgresql.conf
wal_level = hot_standby
max_wal_senders = 1
synchronous_standby_names = '*'
EOF
$ pg_ctl -D data start
$ psql -c "create table t (i int)" &
$ pg_ctl -D data -m fast stop

WARNING:  canceling the wait for synchronous replication and
terminating connection due to administrator command
DETAIL:  The transaction has already committed locally, but might not
have been replicated to the standby.
backend> FATAL:  terminating connection due to administrator command

The above are the server log messages that I got when I followed the
procedure. The "backend> " prompt is junk. If a backend is terminated while
it's waiting for synchronous replication, whereToSendOutput is set to
DestNone. Then whereToSendOutput = DestNone makes ReadCommand() call
InteractiveBackend(), which outputs "backend> ".

This junk message might confuse server log monitoring tools. I think
it should be removed.

The simple fix is to change InteractiveBackend() so that it calls
CHECK_FOR_INTERRUPTS() before it outputs "backend> ". Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Pushing ScalarArrayOpExpr support into the btree index AM

2011-10-17 Thread Florian Pflug
On Oct16, 2011, at 20:26 , Tom Lane wrote:
> Florian Pflug  writes:
>> On Oct16, 2011, at 19:09 , Tom Lane wrote:
>>> That doesn't seem like the same thing at all, because the indexed column
>>> is on different sides of the expression in the two cases.  The situation
>>> that I'm worried about is "indexedcolumn op ANY(arrayconstant)", and
>>> what you're bringing up is "constant op ANY(indexedarraycolumn)".
> 
>> Couldn't we teach the main executor to push a ScalarArrayOpExpr down
>> into the index AM if the operator belongs to the index's opclass, one
>> side is indexed, and the other is constant?
> 
> Well, no, unless you're proposing to somehow implement the "constant op
> ANY(indexedarraycolumn)" case in all the AMs.  I don't see any practical
> way to do it in btree, for one.

Hm, true. Also, the "operator belongs to the index's opclass" part of the
push-down condition I proposed was wrong anyway, because the "=" operator
in e.g.

  3 = ANY(indexed_int_array_column)

isn't in the opclass int_array_ops. From that, it seems that what would be
needed to make the planner use a GIN index to evaluate the qual above is a
way for it to realize that there's a connection between some GIN opclasses
(like *_array_ops) and btree opclasses on the GIN index's storage type. Which
would be nice, but I see now that it has little to do with your proposal,
which is only concerned with operators from the index's opclass.
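To make the qual in question concrete (table and index names here are made
up): with a GIN index built on an array column, only operators in the GIN
opclass are indexable, and "3 = ANY (arr)" uses the btree "=" operator
instead.

```sql
CREATE TABLE t (arr int4[]);
CREATE INDEX t_arr_gin ON t USING gin (arr);

-- Indexable: the array containment operator belongs to the GIN opclass.
SELECT * FROM t WHERE arr @> ARRAY[3];

-- Not indexable: "=" here is the btree int4 operator, which the planner
-- has no recognized connection to the GIN array opclass for.
SELECT * FROM t WHERE 3 = ANY (arr);
```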

best regards,
Florian Pflug






Re: [HACKERS] proposal: set GUC variables for single query

2011-10-17 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?=  writes:
> On 17/10/11 02:53, Robert Haas wrote:
>> I previously floated the idea of using a new keyword, possibly LET,
>> for this, like this:
>> 
>> LET var = value [, ...] IN query

> LET was something I thought about, although you'd have to use something
> like parenthesis around the GUC assignements because "value" can contain
> commas, leading to shift/reduce conflicts (that sucks, unfortunately).

Probably better to forbid commas --- people can always put such values
inside a quoted literal.

regards, tom lane



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-17 Thread Erik Rijkers
On Mon, October 17, 2011 00:09, Jeff Davis wrote:

To facilitate would-be testers, let me mention that to apply this patch one
needs to remove the catversion changes from it.

(as discussed before:
http://archives.postgresql.org/pgsql-hackers/2011-02/msg00817.php )


Erik Rijkers






Re: [HACKERS] Underspecified window queries in regression tests

2011-10-17 Thread Tom Lane
Florian Pflug  writes:
> In addition, I think we should reword the explanation in 4.2.8 (The SQL
> Language / SQL Syntax / Value Expressions / Window Functions). Instead of
> that rather long (and IMHO hard to read) paragraph about possible frame
> clauses and their behaviour in the presence or absence of an ORDER BY
> clause, we should go with a more algorithmic explanation I think.

Hm.  I do not think what you're proposing is a substitute for the text
that's now in 4.2.8, because that's mostly providing syntactic details.
It might make a good addition though.

> If you think that something along these lines would be an improvement, I
> can try to come up with a patch.

Sure.

regards, tom lane



Re: [HACKERS] WIP: Collecting statistics on CSV file data

2011-10-17 Thread Shigeru Hanada
(2011/10/07 18:09), Etsuro Fujita wrote:
> Thank you for the review and the helpful information.
> I rebased. Please find attached a patch. I'll add the patch to the next CF.
> 
> Changes:
> 
>* cleanups and fixes
>* addition of the following to ALTER FOREIGN TABLE
>ALTER [COLUMN] column SET STATISTICS integer
>ALTER [COLUMN] column SET ( n_distinct = val ) (n_distinct only)
>ALTER [COLUMN] column RESET ( n_distinct )
>* reflection of the force_not_null info in acquiring sample rows
>* documentation

The new patch applies with some offsets.  Regression tests of core and
file_fdw pass cleanly.  Though I've run only simple tests, ANALYZE works
on file_fdw foreign tables, and the estimates of costs and selectivity
seem appropriate.
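Spelling out the new ALTER FOREIGN TABLE forms listed in the patch notes
above (the table name "csv_data" and column "id" are hypothetical):

```sql
-- Per-column statistics target for the foreign table:
ALTER FOREIGN TABLE csv_data ALTER COLUMN id SET STATISTICS 500;

-- Override / reset the n_distinct estimate (n_distinct only, per the patch):
ALTER FOREIGN TABLE csv_data ALTER COLUMN id SET (n_distinct = 100);
ALTER FOREIGN TABLE csv_data ALTER COLUMN id RESET (n_distinct);
```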

New API AnalyzeForeignTable
===
In your design, the new handler function is called instead of
do_analyze_rel.  IMO this hook point would be good for FDWs which can
provide statistics in an optimized way.  For instance, pgsql_fdw could
simply copy statistics from the remote PostgreSQL server if they are
compatible.  Another possible idea is to replace acquire_sample_rows so
that other FDWs can reuse most of fileAnalyzeForeignTable and
file_fdw_do_analyze_rel.

And I think that AnalyzeForeignTable should be optional; it would be
very useful if a default handler were provided.  Probably a default
handler could use the basic FDW APIs to acquire sample rows from the result
of "SELECT * FROM foreign_table", skipping rows periodically.  It wouldn't
be efficient, but I think it's not unreasonable.

Other issues

There are some other comments about non-critical issues.
- When there is no analyzable column, vac_update_relstats is not called.
  Is this behavior intentional?
- psql can't tab-complete a foreign table name after ANALYZE.
- A new parameter has been added to vac_update_relstats in a recent
  commit.  Perhaps 0 is OK for that parameter.
- ANALYZE without a relation name ignores foreign tables because
  get_rel_oids doesn't list foreign tables.
- IMO logging "analyzing foo.bar" should not be done in the
  AnalyzeForeignTable handler of each FDW, because some FDW might forget to
  do it.  Maybe it should be pulled up to analyze_rel or somewhere in core.
- It should be mentioned in the documentation that foreign tables are not
  analyzed automatically because they are read-only.

Regards,
-- 
Shigeru Hanada



[HACKERS] HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts

2011-10-17 Thread Alvaro Herrera
I just noticed that HeapTupleHeaderAdvanceLatestRemovedXid is comparing Xmax as 
a TransactionId without verifying whether it is a multixact or not.  Since they 
advance separately, this could lead to bogus answers.  This probably needs to 
be fixed.  I didn't look into past releases to see if there's a live released 
bug here or not.

I think the fix is simply to ignore the Xmax if the HEAP_XMAX_IS_MULTI bit is 
set.

Additionally I think it should check HEAP_XMAX_INVALID before reading the Xmax 
at all.

-- 
Álvaro Herrera (from some crappy webmail)



Re: [HACKERS] HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts

2011-10-17 Thread Tom Lane
Alvaro Herrera  writes:
> I just noticed that HeapTupleHeaderAdvanceLatestRemovedXid is comparing Xmax 
> as a TransactionId without verifying whether it is a multixact or not.  Since 
> they advance separately, this could lead to bogus answers.  This probably 
> needs to be fixed.  I didn't look into past releases to see if there's a live 
> released bug here or not.

> I think the fix is simply to ignore the Xmax if the HEAP_XMAX_IS_MULTI bit is 
> set.

> Additionally I think it should check HEAP_XMAX_INVALID before reading the 
> Xmax at all.

If it's failing to even check XMAX_INVALID, surely it's completely
broken?  Perhaps it assumes its caller has checked all this?

regards, tom lane



Re: [HACKERS] HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts

2011-10-17 Thread Simon Riggs
On Mon, Oct 17, 2011 at 8:03 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> I just noticed that HeapTupleHeaderAdvanceLatestRemovedXid is comparing Xmax 
>> as a TransactionId without verifying whether it is a multixact or not.  
>> Since they advance separately, this could lead to bogus answers.  This 
>> probably needs to be fixed.  I didn't look into past releases to see if 
>> there's a live released bug here or not.
>
>> I think the fix is simply to ignore the Xmax if the HEAP_XMAX_IS_MULTI bit 
>> is set.
>
>> Additionally I think it should check HEAP_XMAX_INVALID before reading the 
>> Xmax at all.
>
> If it's failing to even check XMAX_INVALID, surely it's completely
> broken?  Perhaps it assumes its caller has checked all this?

Looking at it now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts

2011-10-17 Thread Simon Riggs
On Mon, Oct 17, 2011 at 8:03 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> I just noticed that HeapTupleHeaderAdvanceLatestRemovedXid is comparing Xmax 
>> as a TransactionId without verifying whether it is a multixact or not.  
>> Since they advance separately, this could lead to bogus answers.  This 
>> probably needs to be fixed.  I didn't look into past releases to see if 
>> there's a live released bug here or not.
>
>> I think the fix is simply to ignore the Xmax if the HEAP_XMAX_IS_MULTI bit 
>> is set.
>
>> Additionally I think it should check HEAP_XMAX_INVALID before reading the 
>> Xmax at all.
>
> If it's failing to even check XMAX_INVALID, surely it's completely
> broken?  Perhaps it assumes its caller has checked all this?

HeapTupleHeaderAdvanceLatestRemovedXid() is only ever called when
HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD, which only happens
when HEAP_XMAX_IS_MULTI is not set.

I'll add an assert to check this and a comment to explain.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-17 Thread Chris Redekop
I can confirm that both the pg_clog and pg_subtrans errors occur when
using pg_basebackup instead of rsync.  The data itself seems to be fine
because, using the exact same data, I can start up a warm standby with no
problem; it is just the hot standby that will not start up.


On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop  wrote:

> > > Linas, could you capture the output of pg_controldata *and* increase the
> > > log level to DEBUG1 on the standby? We should then see nextXid value of
> > > the checkpoint the recovery is starting from.
> >
> > I'll try to do that whenever I'm in that territory again... Incidentally,
> > recently there was a lot of unrelated-to-this-post work to polish things
> > up for a talk being given at PGWest 2011 Today :)
> >
> > > I also checked what rsync does when a file vanishes after rsync computed
> > > the file list, but before it is sent. rsync 3.0.7 on OSX, at least,
> > > complains loudly, and doesn't sync the file. It BTW also exits non-zero,
> > > with a special exit code for precisely that failure case.
> >
> > To be precise, my script has logic to accept the exit code 24, just as
> > stated in the PG manual:
> >
> > Docs> For example, some versions of rsync return a separate exit code for
> > Docs> "vanished source files", and you can write a driver script to accept
> > Docs> this exit code as a non-error case.
>
> I also am running into this issue and can reproduce it very reliably.  For
> me, however, it happens even when doing the "fast backup" like so:
> pg_start_backup('whatever', true)... my traffic is more write-heavy than
> linas's tho, so that might have something to do with it.  Yesterday it
> reliably errored out on pg_clog every time, but today it is failing
> sporadically on pg_subtrans (which seems to be past where the pg_clog error
> was)... the only thing that has changed is that I've changed the log level
> to debug1... I wouldn't think that could be related though.  I've linked
> the requested pg_controldata and debug1 logs for both errors.  Both links
> contain the output from pg_start_backup, rsync, pg_stop_backup,
> pg_controldata, and then the postgres debug1 log produced from a subsequent
> startup attempt.
>
> pg_clog: http://pastebin.com/mTfdcjwH
> pg_subtrans: http://pastebin.com/qAXEHAQt
>
> Any workarounds would be very appreciated... would copying clog+subtrans
> before or after the rest of the data directory (or something like that)
> make any difference?
>
> Thanks!
>


Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-17 Thread Chris Redekop
Well, on the other hand, maybe there is something wrong with the data.
Here are the steps I just did -
1. I do the pg_basebackup when the master is under load; the hot slave now
will not start up but the warm slave will.
2. I start a warm slave and let it catch up to current
3. On the slave I change 'hot_standby=on' and do a 'service postgresql
restart'
4. Postgres fails to restart with the same error.
5. I turn hot_standby back off and postgres starts back up fine as a warm
slave
6. I then turn off the load; the slave is all caught up, and master and
slave are both sitting idle
7. I, again, change 'hot_standby=on' and do a service restart
8. Again it fails, with the same error, even though there is no longer any
load.
9. I repeat this warmstart/hotstart cycle a couple more times until, to my
surprise, instead of failing, it successfully starts up as a hot standby
(this is after maybe 5 minutes or so of sitting idle)

So... given that it continued to fail even after the load had been turned
off, that makes me believe that the data which was copied over was invalid
in some way.  And when a checkpoint/log rotation/something else occurred
when not under load, it cleared itself up... I'm shooting in the dark here.

Anyone have any suggestions/ideas/things to try?


On Mon, Oct 17, 2011 at 2:13 PM, Chris Redekop  wrote:

> I can confirm that both the pg_clog and pg_subtrans errors do occur when
> using pg_basebackup instead of rsync.  The data itself seems to be fine
> because using the exact same data I can start up a warm standby no problem,
> it is just the hot standby that will not start up.
>
>
> On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop  wrote:
>
>> > > Linas, could you capture the output of pg_controldata *and* increase
>> > > the log level to DEBUG1 on the standby? We should then see nextXid
>> > > value of the checkpoint the recovery is starting from.
>> >
>> > I'll try to do that whenever I'm in that territory again...
>> > Incidentally, recently there was a lot of unrelated-to-this-post work
>> > to polish things up for a talk being given at PGWest 2011 Today :)
>> >
>> > > I also checked what rsync does when a file vanishes after rsync
>> > > computed the file list, but before it is sent. rsync 3.0.7 on OSX, at
>> > > least, complains loudly, and doesn't sync the file. It BTW also exits
>> > > non-zero, with a special exit code for precisely that failure case.
>> >
>> > To be precise, my script has logic to accept the exit code 24, just as
>> > stated in the PG manual:
>> >
>> > Docs> For example, some versions of rsync return a separate exit code
>> > Docs> for "vanished source files", and you can write a driver script to
>> > Docs> accept this exit code as a non-error case.
>>
>> I also am running into this issue and can reproduce it very reliably.  For
>> me, however, it happens even when doing the "fast backup" like so:
>> pg_start_backup('whatever', true)... my traffic is more write-heavy than
>> linas's tho, so that might have something to do with it.  Yesterday it
>> reliably errored out on pg_clog every time, but today it is failing
>> sporadically on pg_subtrans (which seems to be past where the pg_clog
>> error was)... the only thing that has changed is that I've changed the
>> log level to debug1... I wouldn't think that could be related though.
>> I've linked the requested pg_controldata and debug1 logs for both errors.
>> Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
>> pg_controldata, and then the postgres debug1 log produced from a
>> subsequent startup attempt.
>>
>> pg_clog: http://pastebin.com/mTfdcjwH
>> pg_subtrans: http://pastebin.com/qAXEHAQt
>>
>> Any workarounds would be very appreciated... would copying clog+subtrans
>> before or after the rest of the data directory (or something like that)
>> make any difference?
>>
>> Thanks!
>>
>
>


[HACKERS] BUG or strange behaviour of update on primary key

2011-10-17 Thread desmodemone
Hello there,
 two guys of our developer team ( Lorenzo and Federico )
have seen a strange behaviour  (in 8.4 and 9.1.1 ) on update, and I think is
a bug or something really strange or I not understand correctly this
behavior .

I explain now  ( begin transaction or auto commit is the same):

create table testup ( a int ) ;

alter table testup add primary key (a ) ;

insert into testup values (1);

insert into testup values (2);

 update  testup set a=a+1 ;
ERROR:  duplicate key value violates unique constraint "testup_pkey"
DETAIL:  Key (a)=(2) already exists.


By the way:

test=# update  testup set a=a-1 ;
UPDATE 2
SUCCESSFUL

-- REVERSE ORDER --

Now create the same table with rows in  reverse physical order:

 create table testup2  ( a int ) ;

 alter table testup2  add primary key (a ) ;

insert into testup2  values (2) ;

 insert into testup2  values (1);

 update  testup2  set a=a+1 ;
UPDATE 2
SUCCESSFUL

By the way:

test=# update  testup2  set a=a-1 ;
ERROR:  duplicate key value violates unique constraint "testup2_pkey"
DETAIL:  Key (a)=(1) already exists.


I have tested on Oracle 11gR1 and 11gR2 and did not see the same behaviour:

Oracle :


SQL> create table a ( b number ) ;

Table created.

SQL> alter table a add primary key   (b) ;

Table altered.

SQL> insert into a values (1 ) ;

1 row created.

SQL>  insert into a values (2) ;

1 row created.

SQL> commit ;

Commit complete.

SQL> update a set b=b+1 ;

2 rows updated.

SQL> commit ;

Commit complete.

SQL> update a set b=b-1;

2 rows updated.

SQL> commit;

Commit complete.

In MySQL 5.1.58 with InnoDB the behaviour is even stranger (it always fails
for +1, independent of the reverse order O_o):


mysql> create table testup ( a int ) engine innodb ;
Query OK, 0 rows affected (0.21 sec)

mysql> alter table testup add primary key (a) ;
Query OK, 0 rows affected (0.34 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into testup values (1) ;
Query OK, 1 row affected (0.12 sec)

mysql> insert into testup values (2) ;
Query OK, 1 row affected (0.15 sec)

mysql> commit ;
Query OK, 0 rows affected (0.00 sec)

mysql> update  testup set a=a+1 ;
ERROR 1062 (23000): Duplicate entry '2' for key 'PRIMARY'  (-- like Postgres!)
mysql> update  testup set a=a-1 ;
Query OK, 2 rows affected (0.16 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> commit ;
Query OK, 0 rows affected (0.00 sec)

mysql> update  testup set a=a+1 ;
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'
mysql> select * from testup ;
+---+
| a |
+---+
| 0 |
| 1 |
+---+
2 rows in set (0.00 sec)

-- REVERSE ORDER --

mysql> truncate table testup ;
Query OK, 0 rows affected (0.11 sec)

mysql>  insert into testup values (2) ;
Query OK, 1 row affected (0.12 sec)

mysql> insert into testup values (1) ;
Query OK, 1 row affected (0.17 sec)

mysql>  update  testup set a=a+1 ;  (-- O_O totally different from Postgres!)
ERROR 1062 (23000): Duplicate entry '2' for key 'PRIMARY'
mysql> update  testup set a=a-1 ;
Query OK, 2 rows affected (0.16 sec)
Rows matched: 2  Changed: 2  Warnings: 0

In MySQL with MyISAM it is totally different, and similar to Oracle:

mysql> create table testup_myisam  ( a int ) engine myisam ;
Query OK, 0 rows affected (0.17 sec)

mysql> insert into testup_myisam values (2) ;
Query OK, 1 row affected (0.00 sec)

mysql>  insert into testup_myisam values (1) ;
Query OK, 1 row affected (0.00 sec)

mysql> update testup_myisam  set a=a+1 ;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql>  update testup_myisam  set a=a-1 ;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0

-- REVERSE ORDER --

mysql> truncate table testup_myisam ;
Query OK, 0 rows affected (0.00 sec)

mysql>  insert into testup_myisam values (1) ;
Query OK, 1 row affected (0.00 sec)

mysql>  insert into testup_myisam values (2) ;
Query OK, 1 row affected (0.00 sec)

mysql> update testup_myisam  set a=a+1 ;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql>  update testup_myisam  set a=a-1 ;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0


The problem for us  is when we develop or migrate  applications between
different databases.
By the way I think is not  right that an update on the same set of rows will
be successful or failed if the rows are ordered or not, no?
I  think it is something in correlation with visibility of rows in MVCC
(update=>insert + delete tuple).

What do you think?

See you soon

Regards, Mat


Re: [HACKERS] BUG or strange behaviour of update on primary key

2011-10-17 Thread Tom Lane
desmodemone  writes:
> create table testup ( a int ) ;

> alter table testup add primary key (a ) ;

> insert into testup values (1);

> insert into testup values (2);

>  update  testup set a=a+1 ;
> ERROR:  duplicate key value violates unique constraint "testup_pkey"
DETAIL: Key (a)=(2) already exists.

If you want that to work reliably, you need to mark the primary key
constraint as deferred.  By default, uniqueness is checked immediately
when a row is inserted or updated --- and here, when you update 1 to 2,
it's not unique because the second row hasn't been visited yet.

regards, tom lane



Re: [HACKERS] BUG or strange behaviour of update on primary key

2011-10-17 Thread desmodemone
Hello there
   Thanks Tom!
By the way, I found something very funny:

Oracle 11gR2 :

SQL> create table testup ( a number ) ;

Table created.

SQL> alter table testup add primary key (a) NOT DEFERRABLE INITIALLY
IMMEDIATE ;

Table altered.

SQL> insert into testup values (1 ) ;

1 row created.

SQL>  insert into testup values (2 ) ;

1 row created.

SQL> commit ;

Commit complete.

SQL> update testup set a=a+1 ;

2 rows updated.   -->>> Oracle Bug ??

SQL> commit ;

Commit complete.


Postgresql :


create table  testup  ( a int ) ;

alter table testup add primary key(a) NOT DEFERRABLE INITIALLY IMMEDIATE ;

 insert into testup values (1);

insert into testup values (2);

update testup set a=a+1 ;
ERROR:  duplicate key value violates unique constraint "testup_pkey"




As Tom correctly says:

alter table  testup  DROP   CONSTRAINT testup_pkey ;

alter table testup add primary key(a)  DEFERRABLE INITIALLY IMMEDIATE ;

 update testup set a=a+1 ;
UPDATE 2

commit;


Seems an Oracle bug not Postgresql one!

Regards, Mat


2011/10/18 Tom Lane 

> desmodemone  writes:
> > create table testup ( a int ) ;
>
> > alter table testup add primary key (a ) ;
>
> > insert into testup values (1);
>
> > insert into testup values (2);
>
> >  update  testup set a=a+1 ;
> > ERROR:  duplicate key value violates unique constraint "testup_pkey"
> > DETAIL: Key (a)=(2) already exists.
>
> If you want that to work reliably, you need to mark the primary key
> constraint as deferred.  By default, uniqueness is checked immediately
> when a row is inserted or updated --- and here, when you update 1 to 2,
> it's not unique because the second row hasn't been visited yet.
>
>regards, tom lane
>


Re: [HACKERS] BUG or strange behaviour of update on primary key

2011-10-17 Thread Robert Haas
On Mon, Oct 17, 2011 at 7:30 PM, desmodemone  wrote:
> Seems an Oracle bug not Postgresql one!

I don't think it's a bug for it to work.  It'd probably work in
PostgreSQL too, if you inserted (2) first and then (1).  It's just
that, as Tom says, if you want it to be certain to work (rather than
depending on the order in which the rows are inserted), you need the
checks to be deferred.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts

2011-10-17 Thread Alvaro Herrera


On Monday, 10/17/2011 at 4:38 pm, Simon Riggs wrote:
> On Mon, Oct 17, 2011 at 8:03 PM, Tom Lane  wrote:
> > Alvaro Herrera  writes:
> >> I just noticed that HeapTupleHeaderAdvanceLatestRemovedXid is comparing 
> >> Xmax as a TransactionId without verifying whether it is a multixact or 
> >> not.  Since they advance separately, this could lead to bogus answers.  
> >> This probably needs to be fixed.  I didn't look into past releases to see 
> >> if there's a live released bug here or not.
> >
> >> I think the fix is simply to ignore the Xmax if the HEAP_XMAX_IS_MULTI bit 
> >> is set.
> >
> >> Additionally I think it should check HEAP_XMAX_INVALID before reading the 
> >> Xmax at all.
> >
> > If it's failing to even check XMAX_INVALID, surely it's completely
> > broken?  Perhaps it assumes its caller has checked all this?
> 
> HeapTupleHeaderAdvanceLatestRemovedXid() is only ever called when
> HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD, which only happens
> when HEAP_XMAX_IS_MULTI is not set.

Hmkay.

> I'll add an assert to check this and a comment to explain.

This means I'll have to hack it up further in my FK locks patch.  No problem 
with that.



Re: [HACKERS] BUG or strange behaviour of update on primary key

2011-10-17 Thread Royce Ausburn

On 18/10/2011, at 1:00 PM, Robert Haas wrote:

> On Mon, Oct 17, 2011 at 7:30 PM, desmodemone  wrote:
>> Seems an Oracle bug not Postgresql one!
> 
> I don't think it's a bug for it to work.  It'd probably work in
> PostgreSQL too, if you inserted (2) first and then (1).  It's just
> that, as Tom says, if you want it to be certain to work (rather than
> depending on the order in which the rows are inserted), you need the
> checks to be deferred.

Do deferred checks such as this have a memory impact for bulk updates?



Re: [HACKERS] BUG or strange behaviour of update on primary key

2011-10-17 Thread Tom Lane
Royce Ausburn  writes:
> On 18/10/2011, at 1:00 PM, Robert Haas wrote:
>> I don't think it's a bug for it to work.  It'd probably work in
>> PostgreSQL too, if you inserted (2) first and then (1).  It's just
>> that, as Tom says, if you want it to be certain to work (rather than
>> depending on the order in which the rows are inserted), you need the
>> checks to be deferred.

> Do deferred checks such as this have a memory impact for bulk updates?

Yes indeed.  That's why immediate check is the default.

regards, tom lane



Re: [HACKERS] BUG or strange behaviour of update on primary key

2011-10-17 Thread Peter Eisentraut
On Tue, 2011-10-18 at 01:30 +0200, desmodemone wrote:
> 
> alter table  testup  DROP   CONSTRAINT testup_pkey ;
> 
> alter table testup add primary key(a)  DEFERRABLE INITIALLY
> IMMEDIATE ;
> 
>  update testup set a=a+1 ;
> UPDATE 2
> 
> commit;
> 
> 
> Seems an Oracle bug not Postgresql one! 

Oracle's behavior is OK.  PostgreSQL's default behavior is wrong in the
sense that it checks the constraint even in invisible states *during*
the statement, rather than only after.  Marking the constraint
DEFERRABLE (which means deferrable to the end of the *transaction*, not
statement) is a red herring to get the system to do it right(er),
because there is no separate syntax to say deferrable to end of
statement.

Basically, this is maintaining historical buggy behavior for
performance.  If you want the correct but slower behavior instead, you have
to ask for it explicitly.




Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-17 Thread Robert Haas
On Wed, Oct 12, 2011 at 11:45 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, the work is finished.
>
>  Version 4 of the patch is attached to this message.

I went through this in a bit more detail tonight and am cleaning it
up.  But I'm a bit confused, looking at pg_utf8_increment() in detail:

- Why does the second byte need special handling for 0xED and 0xF4?
AFAICT, UTF-8 requires all legal strings to have a second byte between
0x80 and 0xBF, just as in byte positions 3 and 4, so these bytes would
be invalid in this position anyway.
- In the first byte, we don't increment if the current value for that
byte is 0x7F, 0xDF, 0xEF, or 0xF4.  But why isn't it 0xF7 rather than
0xF4?  I see there's a comparable restriction in pg_utf8_islegal(),
but I don't understand why.
- Perhaps on the same point, the comments claim that we will fail for
code points U+0007F, U+007FF, U+0FFFF, and U+10FFFF.  But IIUC, a
4-byte unicode character can encode values up to U+1FFFFF, so why is
it U+10FFFF rather than U+1FFFFF?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-17 Thread Tom Lane
Robert Haas  writes:
> - Why does the second byte need special handling for 0xED and 0xF4?

http://www.faqs.org/rfcs/rfc3629.html

See section 4 in particular.  The underlying requirement is to disallow
multiple representations of the same Unicode code point.

regards, tom lane



Re: [HACKERS] [v9.2] Object access hooks with arguments support (v1)

2011-10-17 Thread Robert Haas
On Thu, Oct 13, 2011 at 6:48 AM, Kohei KaiGai  wrote:
>    struct ObjectAccessInfoData {
>        ObjectAccessType   oa_type;
>        ObjectAddress         oa_address;
>        union {
>            struct {
>                HeapTuple       newtuple;
>                TupleDesc       tupdesc;  /* only if create a new relation */
>                    :
>            } post_create;      /* additional info for OAT_POST_CREATE event */
>        }
>    }

That's possibly an improvement over just passing everything opaquely,
but it still doesn't seem very good.  I mean, this is C, not Perl or
Python.  Anything where you pass a bunch of arguments of indeterminate
type and hope that the person you're calling can figure it out is
inherently pretty dangerous.  I had it in mind that the object access
hook mechanism could serve as a simple and generic way of letting an
external module know that an event of a certain type has occurred on a
certain object, and to let that module gain control.  But if we have
to pass a raft of arguments in, then it's not generic any more - it's
just a bunch of things that probably really need to be separate
hooks shoved into a single function.

>> Moreover, if
>> you did document it, I think it would boil down to "this is what
>> sepgsql happens to need", and I don't think that's an acceptable
>> answer.  We have repeatedly refused to adopt that approach in the
>> past.
>>
> Unfortunately, I still don't have a clear answer for this point.
> However, in general terms, it is impossible to design any interface without
> knowledge of its usage more or less.

Well, that's true.  But the hook also has to be maintainable.  ISTM
that there's no reasonable way for someone making a modification to
the code to know whether the additional local variable that they just
added to the function should be passed to the hook, or not.  Here, at
least as far as I can see, there's no guiding principle that would
enable future contributors to make an intelligent decision about that.
 And if someone wanted to write another client for the hook, it's not
very obvious whether the particular things you've chosen to pass here
would be relevant or not.  I think if you look over the code you'll
find that there's nowhere else that we have a hook that looks anything
like what you're proposing here.

> At least, this proposition enables modules being informed using
> commonly used data type (such as HeapTuple, TupleDesc), compared
> to the past approach that tried to push all the necessary information
> into argument list individually.

That does seem like a good direction to go in, but you're still
passing a lot of other stuff in there.  I guess my feeling is that if
we can't boil down the argument list to a short list of things that
are more-or-less common to all the call sites, we probably need to
rethink the whole design.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-17 Thread Robert Haas
On Mon, Oct 17, 2011 at 11:54 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> - Why does the second byte need special handling for 0xED and 0xF4?
>
> http://www.faqs.org/rfcs/rfc3629.html
>
> See section 4 in particular.  The underlying requirement is to disallow
> multiple representations of the same Unicode code point.

I'm still confused.  The input string is already known to be valid
UTF-8, so the second byte (if there is one) must be between 0x80 and
0xBF.  Therefore it will be neither 0xED nor 0xF4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] spinlocks on HP-UX

2011-10-17 Thread Tatsuo Ishii
>> That would be great.  What I've been using as a test case is pgbench
>> -S -c $NUM_CPU_CORES -j $NUM_CPU_CORES with scale factor 100 and
>> shared_buffers=8GB.
>> 
>> I think what you'd want to compare is the performance of unpatched
>> master, vs. the performance with this line added to s_lock.h for your
>> architecture:
>> 
>> #define TAS_SPIN(lock)  (*(lock) ? 1 : TAS(lock))
>> 
>> We've now added that line for ia64 (the line is present in two
>> different places in the file, one for GCC and the other for HP's
>> compiler).  So the question is whether we need it for any other
>> architectures.
> 
> Ok. Let me talk to IBM guys...

With help from IBM Japan Ltd. we did some tests on a larger IBM
machine than Tom Lane has used for his
test(http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us).
In his case it was an IBM 8406-71Y, which has 8 physical cores with
4-way SMT (32 threads). Ours is an IBM Power 750 Express, which has 32
physical cores with 4-way SMT (128 threads) and 256GB RAM.

The test method was the same as in the article above. The
differences are the OS (RHEL 6.1), gcc version (4.4.5), and shared buffer
size (8GB).

We tested 3 methods to reduce spin lock contention:

1) Add "hint" parameter to the lwarx op, which is usable on POWER6 or later
   architectures.

2) Add non-locked test in TAS()

3) #1 + #2

We saw a small performance enhancement with #1, a larger one with #2, and
even better results with #1+#2.

Stock git head:

pgbench -c 1 -j 1 -S -T 300 bench   tps = 10356.306513 (including ...
pgbench -c 2 -j 1 -S -T 300 bench   tps = 21841.10285 (including ...
pgbench -c 8 -j 4 -S -T 300 bench   tps = 63800.868529 (including ...
pgbench -c 16 -j 8 -S -T 300 bench  tps = 144872.64726 (including ...
pgbench -c 32 -j 16 -S -T 300 bench tps = 120943.238461 (including ...
pgbench -c 64 -j 32 -S -T 300 bench tps = 108144.933981 (including ...
pgbench -c 128 -j 64 -S -T 300 bench  tps = 92202.782791 (including ...

With hint (method #1):

pgbench -c 1 -j 1 -S -T 300 bench   tps = 11198.1872 (including ...
pgbench -c 2 -j 1 -S -T 300 bench   tps = 21390.592014 (including ...
pgbench -c 8 -j 4 -S -T 300 bench   tps = 74423.488089 (including ...
pgbench -c 16 -j 8 -S -T 300 bench  tps = 153766.351105 (including ...
pgbench -c 32 -j 16 -S -T 300 bench tps = 134313.758113 (including ...
pgbench -c 64 -j 32 -S -T 300 bench tps = 129392.154047 (including ...
pgbench -c 128 -j 64 -S -T 300 bench  tps = 105506.948058 (including ...

Non-locked test in TAS() (method #2):

pgbench -c 1 -j 1 -S -T 300 bench   tps = 10537.893154 (including ...
pgbench -c 2 -j 1 -S -T 300 bench   tps = 22019.388666 (including ...
pgbench -c 8 -j 4 -S -T 300 bench   tps = 78763.930379 (including ...
pgbench -c 16 -j 8 -S -T 300 bench  tps = 142791.99724 (including ...
pgbench -c 32 -j 16 -S -T 300 bench tps = 222008.903675 (including ...
pgbench -c 64 -j 32 -S -T 300 bench tps = 209912.691058 (including ...
pgbench -c 128 -j 64 -S -T 300 bench  tps = 199437.23965 (including ...

With hint and non-locked test in TAS (#1 + #2)

pgbench -c 1 -j 1 -S -T 300 bench   tps = 11419.881375 (including ...
pgbench -c 2 -j 1 -S -T 300 bench   tps = 21919.530209 (including ...
pgbench -c 8 -j 4 -S -T 300 bench   tps = 74788.242876 (including ...
pgbench -c 16 -j 8 -S -T 300 bench  tps = 156354.988564 (including ...
pgbench -c 32 -j 16 -S -T 300 bench tps = 240521.495 (including ...
pgbench -c 64 -j 32 -S -T 300 bench tps = 235709.272642 (including ...
pgbench -c 128 -j 64 -S -T 300 bench  tps = 220135.729663 (including ...

Since each core's usage is around 50% in the benchmark, there is room for
further performance improvement by eliminating other contention, tuning
compiler options, etc.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-17 Thread Tom Lane
Robert Haas  writes:
> On Mon, Oct 17, 2011 at 11:54 PM, Tom Lane  wrote:
>> http://www.faqs.org/rfcs/rfc3629.html

> I'm still confused.  The input string is already known to be valid
> UTF-8, so the second byte (if there is one) must be between 0x80 and
> 0xBF.  Therefore it will be neither 0xED nor 0xF4.

I haven't read the patch lately, but ED and F4 are special as *first*
bytes.  Maybe the logic isn't quite right, or you read it wrong?

regards, tom lane



Re: [HACKERS] spinlocks on HP-UX

2011-10-17 Thread Robert Haas
On Tue, Oct 18, 2011 at 12:11 AM, Tatsuo Ishii  wrote:
>>> That would be great.  What I've been using as a test case is pgbench
>>> -S -c $NUM_CPU_CORES -j $NUM_CPU_CORES with scale factor 100 and
>>> shared_buffers=8GB.
>>>
>>> I think what you'd want to compare is the performance of unpatched
>>> master, vs. the performance with this line added to s_lock.h for your
>>> architecture:
>>>
>>> #define TAS_SPIN(lock)  (*(lock) ? 1 : TAS(lock))
>>>
>>> We've now added that line for ia64 (the line is present in two
>>> different places in the file, one for GCC and the other for HP's
>>> compiler).  So the question is whether we need it for any other
>>> architectures.
>>
>> Ok. Let me talk to IBM guys...
>
> With help from IBM Japan Ltd. we did some tests on a larger IBM
> machine than Tom Lane has used for his
> test(http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us).
> In his case it was IBM 8406-71Y, which has 8 physical cores and
> 4SMT(32 threadings). Ours is IBM Power 750 Express, which has 32
> physical cores and 4SMT(128 threadings), 256GB RAM.
>
> The test method was same as the one in the article above. The
> differences are OS(RHEL 6.1), gcc version (4.4.5) and shared buffer
> size(8GB).
>
> We tested 3 methods to enhance spin lock contention:
>
> 1) Add "hint" parameter to lwarx op which is usable POWER6 or later
>   architecure.
>
> 2) Add non-locked test in TAS()
>
> 3) #1 + #2
>
> We saw small performance enhancement with #1, larger one with #2 and
> even better with #1+#2.

Hmm, so you added the non-locked test in TAS()?  Did you try adding it
just to TAS_SPIN()?  On Itanium, I found that it was slightly better
to do it only in TAS_SPIN() - i.e. in the contended case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] spinlocks on HP-UX

2011-10-17 Thread Tatsuo Ishii
>> With help from IBM Japan Ltd. we did some tests on a larger IBM
>> machine than Tom Lane has used for his
>> test(http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us).
>> In his case it was IBM 8406-71Y, which has 8 physical cores and
>> 4SMT(32 threadings). Ours is IBM Power 750 Express, which has 32
>> physical cores and 4SMT(128 threadings), 256GB RAM.
>>
>> The test method was same as the one in the article above. The
>> differences are OS(RHEL 6.1), gcc version (4.4.5) and shared buffer
>> size(8GB).
>>
>> We tested 3 methods to enhance spin lock contention:
>>
>> 1) Add "hint" parameter to lwarx op which is usable POWER6 or later
>>   architecure.
>>
>> 2) Add non-locked test in TAS()
>>
>> 3) #1 + #2
>>
>> We saw small performance enhancement with #1, larger one with #2 and
>> even better with #1+#2.
> 
> Hmm, so you added the non-locked test in TAS()?  Did you try adding it
> just to TAS_SPIN()?  On Itanium, I found that it was slightly better
> to do it only in TAS_SPIN() - i.e. in the contended case.

The actual test was performed by one of our engineers in my company
(Toshihiro Kitagawa). I think the answer to your question is yes, but
let me talk to him to make sure.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] WIP: Collecting statistics on CSV file data

2011-10-17 Thread Etsuro Fujita
(2011/10/18 2:27), Shigeru Hanada wrote:
> The new patch could be applied with some shifts.  Regression tests of
> core and file_fdw have passed cleanly.  Though I've tested only simple
> tests, ANALYZE works for foreign tables for file_fdw, and estimation of
> costs and selectivity seem appropriate.

Thank you for your testing.

> New API AnalyzeForeignTable
> ===

> And I think that AnalyzeForeignTable should be optional, and it would be
> very useful if a default handler is provided.  Probably a default
> handler can use basic FDW APIs to acquire sample rows from the result of
> "SELECT * FROM foreign_table" with skipping periodically.  It won't be
> efficient but I think it's not so unreasonable.

I agree with you. However, I think that it is difficult to support such
a default handler in a unified way because there exist external data
sources for which we cannot execute "SELECT * FROM foreign_table", e.g.,
web-accessible DBs limiting full access to the contents.

> Other issues
> 
> There are some other comments about non-critical issues.
> - When there is no analyzable column, vac_update_relstats is not called.
>   Is this behavior intentional?
> - psql can't complete foreign table name after ANALYZE.
> - A new parameter has been added to vac_update_relstats in a recent
> commit.  Perhaps 0 is OK for that parameter.

I'll check.

> - ANALYZE without relation name ignores foreign tables because
> get_rel_oids doesn't list foreign tables.

I think that it might be better to ignore foreign tables by default
because analyzing such tables may take a long time depending on the FDW.

> - IMO logging "analyzing foo.bar" should not be done in
> AnalyzeForeignTable handler of each FDW because some FDW might forget to
> do it.  Maybe it should be pulled up to analyze_rel or somewhere in core.
> - It should be mentioned in a document that foreign tables are not
> analyzed automatically because they are read-only.

OK. I'll revise.

> Regards,

Best regards,
Etsuro Fujita



Re: [HACKERS] spinlocks on HP-UX

2011-10-17 Thread Pavan Deolasee
On Tue, Oct 18, 2011 at 10:04 AM, Robert Haas  wrote:

> Hmm, so you added the non-locked test in TAS()?  Did you try adding it
> just to TAS_SPIN()?  On Itanium, I found that it was slightly better
> to do it only in TAS_SPIN() - i.e. in the contended case.
>

Would it be a good change for S_LOCK() to use TAS_SPIN() as well?

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] Online base backup from the hot-standby

2011-10-17 Thread Jun Ishiduka

> + /*
> +  * The backend writes WAL of FPW at checkpoint. However, The 
> backend do
> +  * not need to write WAL of FPW at checkpoint shutdown because 
> it
> +  * performs when startup finishes.
> +  */
> + XLogReportParameters(REPORT_ON_BACKEND);
> 
> I'm still unclear why that WAL doesn't need to be written at shutdown
> checkpoint.
> Anyway, the first sentence in the above comments is not right. Not a backend 
> but
> a bgwriter writes that WAL at checkpoint.
> 
> The second also seems not to be right. It implies that a shutdown checkpoint 
> is
> performed only at end of startup. But it may be done when smart or fast 
> shutdown
> is requested.


Okay.
I changed the comment to the following:

/*
 * The bgwriter writes the WAL record for FPW at checkpoint, but not at
 * a shutdown checkpoint: since XLogReportParameters() is always called
 * at the end of the startup process, it does not need to be called
 * again at shutdown.
 */


In addition, I change macro name.

REPORT_ON_BACKEND -> REPORT_ON_BGWRITER


Regards.


Jun Ishizuka
NTT Software Corporation
TEL:045-317-7018
E-Mail: ishizuka@po.ntts.co.jp



