Re: [HACKERS] proposal: additional error fields

2012-05-02 Thread Peter Eisentraut
On Tue, 2012-05-01 at 20:13 -0400, Tom Lane wrote:
 I don't deny that we probably need to reclassify a few error cases,
 and fix some elogs that should be ereports, before this approach would
 be really workable.  My point is that it's *close*, whereas let's
 invent some new error severities is not close to reality and will
 break all sorts of stuff.

We might hit a road block because some of these sqlstates are defined by
the SQL standard.  But at least we should try this first, and if it
doesn't work make another field that contains the admin/server-level
severity instead of the client-side/flow-control severity level.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?

2012-05-02 Thread Hannu Krosing
On Tue, 2012-05-01 at 21:22 -0400, David Johnston wrote:
 On May 1, 2012, at 20:41, Hannu Krosing ha...@2ndquadrant.com wrote:
  
  Most people don't work in strongly-typed environment, and thus would
  work around such restriction if they need a simple JSON value at the
  other end of the interchange.
  
  
  My personal take is to have it fail, since any arbitrary decision to cast 
  to JSON Text

For arrays and records the JSON text and JSON value are exactly the
same; it is just that JSON representations of simple types are
officially not JSON texts.

  is going to make someone unhappy and supposedly they can 
  modify their query so that the result generates whatever format they 
  desire.
  
  Do you actually have such an experience, or is it just a wild guess?
  
  
 
 So even given the semantic differences between an object and a scalar 
 I am better understanding where interpreting JSON as a JSON Value makes 
 sense.  However, if I convert a record or array to JSON I expect to get 
 a JSON Text even if there is only a single column or value in the input.  

Of course you will, and you will get a JSON Text even for an empty
object or array.

[1], 1 and {"one":1} are all different and will stay so.

 I guess my take is that record -> JSON text, while anything else is JSON 
 value.  Whether maintaining the special case for record is worthwhile 
 I really do not know, but the semantic difference does exist; and record 
 output is a significant aspect of PostgreSQL output.

I have never suggested that we special-case a 1-element record or list
and start returning only the contained value for those.

 I get the ease-of-use aspect but also recognize that sometimes being slightly 
 harder to use is worthwhile if you eliminate ambiguities or limit the 
 possibility to make mistakes.

There are no ambiguities in what is returned for a record or array.

But not being able to return JSON values via a cast to json for some
types, or not using such casts, will make extending JSON support to new
types much harder for users. And nonstandard.

Using a simple cast to json is a very PostgreSQL-ish way to add JSON
support for any type.



-- 
---
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/




Re: [HACKERS] JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?

2012-05-02 Thread Hannu Krosing
On Tue, 2012-05-01 at 19:11 -0400, Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
  On Tue, May 1, 2012 at 9:56 AM, Joey Adams joeyadams3.14...@gmail.com wrote:
  No, the RFC says (emphasis mine):
  
  A JSON *text* is a serialized object or array.
  
  If we let the JSON type correspond to a *value* instead, this
  restriction does not apply, and the JSON type has a useful recursive
  definition.
 
  I think you're playing with words. But in any case, the RFC says this
  regarding generators:
  5. Generators
 A JSON generator produces JSON text.  The resulting text MUST
 strictly conform to the JSON grammar.
 
 I read over the RFC, and I think the only reason why they restricted
 JSON texts to represent just a subset of JSON values is this cute
 little hack in section 3 (Encoding):
 
Since the first two characters of a JSON text will always be ASCII
characters [RFC0020], it is possible to determine whether an octet
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
at the pattern of nulls in the first four octets.
00 00 00 xx  UTF-32BE
00 xx 00 xx  UTF-16BE
xx 00 00 00  UTF-32LE
xx 00 xx 00  UTF-16LE
xx xx xx xx  UTF-8
 
 They need a guaranteed 2 ASCII characters to make that work, and
 they won't necessarily get that many with a bare string literal.
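The detection rule quoted above from the RFC can be sketched in a few lines. This is an illustrative Python sketch of the RFC's null-byte table only, not PostgreSQL code; the function name is mine, and it assumes the input really is a JSON text (object or array), so the first two characters are ASCII:

```python
def sniff_json_encoding(octets: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first four
    octets, per the null-byte pattern table in RFC 4627 section 3."""
    if len(octets) < 4:
        return "UTF-8"  # too short to apply the pattern; default per RFC
    nulls = tuple(b == 0 for b in octets[:4])
    if nulls == (True, True, True, False):
        return "UTF-32BE"   # 00 00 00 xx
    if nulls == (True, False, True, False):
        return "UTF-16BE"   # 00 xx 00 xx
    if nulls == (False, True, True, True):
        return "UTF-32LE"   # xx 00 00 00
    if nulls == (False, True, False, True):
        return "UTF-16LE"   # xx 00 xx 00
    return "UTF-8"          # xx xx xx xx

print(sniff_json_encoding('["a"]'.encode("utf-32-be")))  # UTF-32BE
```

Note that the trick collapses as soon as a bare string literal like "x" is allowed, since its first octets need not contain the nulls the table relies on.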
 
 Since for our purposes there is not, and never will be, any need to
 figure out whether a JSON input string is encoded in UTF16 or UTF32,
 I find myself agreeing with the camp that says we might as well consider
 that our JSON type corresponds to JSON values not JSON texts.  I also
 notice that json_in() seems to believe that already.
 
 However, that doesn't mean I'm sold on the idea of getting rid of
 array_to_json and row_to_json in favor of a universal to_json()
 function.  In particular, both of those have optional pretty_bool
 arguments that don't fit nicely at all in a generic conversion
 function.  The meaning of that flag is very closely tied to the
 input being an array or record respectively.

The flags probably should not be tied to a specific type: JSON is
recursive, and the current one-top-level-element-per-row behavior is
quite a limited form of pretty-printing.

I have a table with a field the type of which is an array of type of
another table, and what I currently get with pretty=true is

hannu=# select row_to_json(test3, true) from test3;
-[ RECORD 1 ]---------------------------------------------------------
row_to_json | {"id":1,
            |  "data3":[{"id":1,"data2":{"id":1,"data":0.262814193032682,"tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},{"id":2,"data2":{"id":2,"data":0.157406373415142,"tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}],
            |  "tstamp":"2012-04-16 14:40:15.795947"}

What I would like to get is what Python's pprint does for the same JSON:

>>> pprint(row)
{'id': 1,
 'data3': [{'data2': {'data': '0.262814193032682',
  'id': 1,
  'tstamp': '2012-04-05 13:21:03.235204'},
'id': 1,
'tstamp': '2012-04-05 13:25:03.644497'},
   {'data2': {'data': '0.157406373415142',
  'id': 2,
  'tstamp': '2012-04-05 13:21:05.2033'},
'id': 2,
'tstamp': '2012-04-05 13:25:03.644497'}],
 'tstamp': '2012-04-16 14:40:15.795947'}

If we have a pretty flag, why not make it work all the way down the
structure?

 I'm inclined to leave these functions as they are, and consider
 adding a universal to_json(anyelement) (with no options) later.

To achieve recursive pretty-printing, the better way is to have a
universal to_json(anyelement) with a pretty-printing option:

to_json(datum anyelement, indent int)

with the behavior that if indent is NULL or a negative integer, no
pretty-printing is done; if it is 0, printing starts at the left margin;
and if it is a positive integer, that number of spaces is added to the
left for each row (except the first one) of the JSON representation.

And it would be overridable for specific types, so that hstore could
provide its own

to_json(datum hstore, indent int)

which would do the correct pretty-printing for the
hstore-as-JSON-object representation.
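Since the proposed to_json(datum, indent) does not exist, the intended semantics can be sketched in Python (the function name mirrors the proposal; the mapping of NULL to None and the use of json.dumps are my assumptions):

```python
import json

def to_json(datum, indent=None):
    """Sketch of the proposed to_json(datum, indent) semantics:
    NULL (None) or a negative indent means no pretty-printing;
    0 starts at the left margin; a positive integer indents each
    nested level by that many spaces, recursively."""
    if indent is None or indent < 0:
        return json.dumps(datum, separators=(",", ":"))
    return json.dumps(datum, indent=indent)

row = {"id": 1, "tstamp": "2012-04-16 14:40:15.795947"}
print(to_json(row))      # compact, one line
print(to_json(row, 2))   # recursively pretty-printed
```

The point of the sketch is that the indent option applies all the way down the structure, unlike the one-top-level-element-per-row output shown above.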

 Because it would not have options, it would not be meant to cover
 cases where there's value in formatting or conversion options;
 so it wouldn't render the existing functions entirely obsolete,
 nor would it mean there would be no need for other specialized
 conversion functions.

I don't object to row_to_json() and array_to_json() functions being
there as a convenience and as the two official functions guaranteed to
return JSON text.

   regards, tom lane


Re: [HACKERS] Analyzing foreign tables memory problems

2012-05-02 Thread Albe Laurenz
I wrote:
 Noah Misch wrote:
 During ANALYZE, in analyze.c, the functions compute_minimal_stats
 and compute_scalar_stats do not use values whose length exceeds
 WIDTH_THRESHOLD (= 1024) for calculating statistics, other than
 counting them as "too wide" rows and assuming they are all different.

 This works fine with regular tables;

 With foreign tables the situation is different.  Even though
 values exceeding WIDTH_THRESHOLD won't get used, the complete
 rows will be fetched from the foreign table.  This can easily
 exhaust maintenance_work_mem.

 I can think of two remedies:
 1) Expose WIDTH_THRESHOLD in commands/vacuum.h and add documentation
so that the authors of foreign data wrappers are aware of the
problem and can avoid it on their side.
This would be quite simple.

 Seems reasonable.  How would the FDW return an indication that a value
 was non-NULL but removed due to excess width?
 
 The FDW would return a value of length WIDTH_THRESHOLD+1 that is
 long enough to be recognized as too long, but not long enough to
 cause a problem.
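The proposed FDW-side remedy is simple enough to sketch. This is an illustrative Python sketch of the idea, not the actual C patch; the function name is mine, and WIDTH_THRESHOLD mirrors the constant from analyze.c:

```python
WIDTH_THRESHOLD = 1024  # value from analyze.c, proposed for commands/vacuum.h

def clip_for_analyze(value: bytes) -> bytes:
    """Sketch of the proposed remedy: an FDW returns over-wide values
    truncated to WIDTH_THRESHOLD + 1 bytes -- long enough for ANALYZE
    to classify them as 'too wide', but short enough not to exhaust
    maintenance_work_mem when many rows are sampled."""
    if len(value) > WIDTH_THRESHOLD:
        return value[:WIDTH_THRESHOLD + 1]
    return value

print(len(clip_for_analyze(b"x" * 5000)))  # 1025
```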

Here is a simple patch for that.

Yours,
Laurenz Albe


analyze.patch
Description: analyze.patch



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Jeroen Vermeulen

On 2012-05-01 22:06, Robert Haas wrote:


It might also be interesting to provide a mechanism to pre-extend a
relation to a certain number of blocks, though if we did that we'd
have to make sure that autovac got the memo not to truncate those
pages away again.


Good point.  And just to check before skipping over it, do we know that 
autovacuum not leaving enough slack space is not a significant cause of 
the bottlenecks in the first place?



Jeroen



[HACKERS] How hard would it be to support LIKE in return declaration of generic record function calls ?

2012-05-02 Thread Hannu Krosing
Hi Hackers

How hard would it be to add support for LIKE syntax, similar to that in
table definitions, in the field-list declaration of generic record
functions?

What I'd like to be able to do is to have a generic json_to_record
function

CREATE OR REPLACE FUNCTION json_to_record(json) RETURNS RECORD AS $$
...
$$ LANGUAGE ... ;

and then be able to call it like this

insert into test2 
select * from json_to_record(jrec json) as (like test2);

ERROR:  syntax error at or near "like"

instead of explicitly spelling out the structure of table test2 in the
AS part.

insert into test2 
select * from json_to_record(jrec json)
 as (id int, data2 test, tstamp timestamp);
INSERT 0 1


PS.

As a pie-in-the-sky wish I'd prefer of course even simpler syntax of

insert into test2 json_to_record(jrec json);

or at least  

insert into test2 json_to_record(jrec json)::test2;

:)
-- 
---
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/




Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Peter Geoghegan
On 2 May 2012 04:24, Robert Haas robertmh...@gmail.com wrote:
 I think Tom's question of whether the parser or lexer is the problem
 is something that ought to be investigated.  Personally, I suspect
 that our tendency to use lists everywhere, for everything, an artifact
 of our proud LISP heritage, may be costing us dearly in terms of parse
 time.  However, there's a difference between suspicion and proof, and
 I certainly have none of the latter.

It's funny that you should say that, because I actually had a
discussion with Greg Stark over drinks about this recently. John
Bentley has described an experiment that undermines many traditional
beliefs about the trade-offs represented by using a linked list rather
than an array. The test is, using a modern computer, generate N
integers at random and insert them in order into a sequence. Then,
randomly remove integers from the sequence, one at a time. What is the
performance at different sizes of N when the sequence is a
doubly-linked list, and when it is an array? If you graph the two, the
results are perhaps rather surprising. I think his graph went up to
100,000. The initial result shows a line representing an array down
near the bottom of the graph. The list line looks exponential. Even if
you use memory pooling so the list doesn't have to allocate memory as
needed, the array still roundly beats the list, albeit not quite so
convincingly and without the list hitting the roof near 100,000. I
don't think that I need to point out that this is for inserting and
deleting, and that's when you're supposed to use lists.

The point is that on modern architectures, with many layers of cache,
the cost of the linear search to get the insertion point completely
dominates - this is without the array availing of a binary search, in
the interest of fairness. CPU caches happen to do a very good job of
moving over on-average half of the array for inserting elements at
random points. The list is much larger than the array, with the two
extra pointers per element (yeah, I know that we use singly linked
lists, but we have other disadvantages compared to typical C lists),
which matters. Predictable usage patterns - that is, temporal and
spatial locality, resulting in good usage of the memory hierarchy
matters a lot. We're not talking about a small difference, either. I
think the difference in the published test was something like the
array was 50 - 100 times faster. The list results in far more cache
misses than the array. So, I'm right there with you - using lists
everywhere is bad news.

As for the question of Flex/Quex, I'm not in an immediate position to
sink any more time into it, but it remains on my list of things to
pursue for 9.3, though it's only about number 3 on that list right
now.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 7:16 AM, Jeroen Vermeulen j...@xs4all.nl wrote:
 On 2012-05-01 22:06, Robert Haas wrote:
 It might also be interesting to provide a mechanism to pre-extend a
 relation to a certain number of blocks, though if we did that we'd
 have to make sure that autovac got the memo not to truncate those
 pages away again.

 Good point.  And just to check before skipping over it, do we know that
 autovacuum not leaving enough slack space is not a significant cause of the
 bottlenecks in the first place?

I'm not sure exactly what you mean by this: autovacuum doesn't need
any slack space.  Regular DML operations can certainly benefit from
slack space, both within each page and overall within the relation.
But there's no evidence that vacuum is doing too good a job cleaning
up the mess, forcing the relation to be re-extended.  Rather, the
usual (and frequent) complaint is that vacuum is leaving way too much
slack space - i.e. bloat.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-05-02 Thread Robert Haas
On Thu, Apr 26, 2012 at 8:03 PM, Robert Haas robertmh...@gmail.com wrote:
 So, as a first step, I've committed a patch that just throws a hard
 conflict.  I think we probably want to optimize this further, and I'm
 going to work investigate that next.  But it seemed productive to get
 this much out of the way first, so I did.

I've been thinking about this some more.  What's worrying me is that a
visibility conflict, however we implement it, could be *worse* from
the user's point of view than just killing the query.  After all,
there's a reasonable likelihood that a single visibility map page
covers the whole relation (or all the blocks that the user is
interested in), so any sort of conflict is basically going to turn the
index-only scan into an index-scan plus some extra overhead.  And if
the planner had known that the user was going to get an index-only
scan rather than just a plain index scan, it might well have picked
some other plan in the first place.

Another problem is that, if we add a test for visibility conflicts
into visibilitymap_test(), I'm afraid we're going to drive up the cost
of that function very significantly.  Previous testing suggests that
the efficiency, or lack thereof, of that function is already a
performance problem for index-only scans, which kinda makes me not
that excited about adding another branch in there somewhere (and even
less excited about any proposed implementation that would add an
lwlock acquire/release or similar).

So on further reflection I'm thinking it may be best just to stick
with a hard conflict for now and see what feedback we get from beta
testers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Robert Haas
On Tue, May 1, 2012 at 6:02 PM, Alexander Korotkov aekorot...@gmail.com wrote:
 Right. When the number of trigrams is big, it is slow to scan the posting
 lists of all of them. The solution in this case is to exclude the most
 frequent trigrams from the index scan.  But that requires some kind of
 statistics on trigram frequencies, which we don't have. We could estimate
 frequencies using some hard-coded assumptions about natural languages, or
 we could exclude arbitrary trigrams, but I don't like either of those
 ideas. This problem is also relevant for LIKE/ILIKE searches using
 trigram indexes.

I was thinking you could perhaps do it just based on the *number* of
trigrams, not necessarily their frequency.

 Probably you have some comments on idea of conversion from pg_wchar to
 multibyte? Is it acceptable at all?

Well, I'm not an expert on encodings, but it seems like a logical
extension of what we're doing right now, so I don't really see why
not.  I'm confused by the diff hunks in pg_mule2wchar_with_len,
though.  Presumably either the old code is right (in which case, don't
change it) or the new code is right (in which case, there's a bug fix
needed here that ought to be discussed and committed separately from
the rest of the patch).  Maybe I am missing something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] proposal: additional error fields

2012-05-02 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Tue, 2012-05-01 at 20:13 -0400, Tom Lane wrote:
 I don't deny that we probably need to reclassify a few error cases,
 and fix some elogs that should be ereports, before this approach would
 be really workable.  My point is that it's *close*, whereas let's
 invent some new error severities is not close to reality and will
 break all sorts of stuff.

 We might hit a road block because some of these sqlstates are defined by
 the SQL standard.

My guess is that all the ones defined in the SQL standard are expected
errors, more or less by definition, and thus not interesting according
to Peter G's criteria.

regards, tom lane



Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Alexander Korotkov
On Wed, May 2, 2012 at 4:50 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, May 1, 2012 at 6:02 PM, Alexander Korotkov aekorot...@gmail.com wrote:
  Right. When number of trigrams is big, it is slow to scan posting list of
  all of them. The solution is this case is to exclude most frequent trigrams
  from index scan.  But, it require some kind of statistics of trigrams
  frequencies which we don't have. We could estimate frequencies using some
  hard-coded assumptions about natural languages. Or we could exclude
  arbitrary trigrams. But I don't like both these ideas. This problem is also
  relevant for LIKE/ILIKE search using trigram indexes.

 I was thinking you could perhaps do it just based on the *number* of
 trigrams, not necessarily their frequency.


Imagine we have two queries:
1) SELECT * FROM tbl WHERE col LIKE '%abcd%';
2) SELECT * FROM tbl WHERE col LIKE '%abcdefghijk%';

The first query requires reading the posting lists of the trigrams "abc"
and "bcd". The second requires reading the posting lists of the trigrams
"abc", "bcd", "cde", "def", "efg", "fgh", "ghi", "hij" and "ijk".
We could decide to use an index scan for the first query and a
sequential scan for the second because the number of posting lists to
read is high. But that is unreasonable, because the second query is
actually narrower than the first one: we can use the same index scan for
it, and the recheck will remove all false positives. When the number of
trigrams is high we can instead exclude some of them from the index
scan, which would be better than falling back to a sequential scan. But
the question is which trigrams to exclude: ideally we would keep the
rarest trigrams, to make the index scan cheaper.
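The trigram counts in the example above come straight from the pattern length, which can be sketched in one line (illustrative Python; pg_trgm additionally pads the string with blanks, which is omitted here):

```python
def trigrams(s: str) -> list:
    """All consecutive three-character substrings of s, as used in the
    '%abcd%' vs '%abcdefghijk%' example (pg_trgm's blank padding of
    word boundaries is omitted for simplicity)."""
    return [s[i:i + 3] for i in range(len(s) - 2)]

print(trigrams("abcd"))              # ['abc', 'bcd']
print(len(trigrams("abcdefghijk")))  # 9
```

So an extractable substring of length n yields n - 2 trigrams, which is why the longer (narrower) pattern touches many more posting lists despite selecting fewer rows.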


  Probably you have some comments on idea of conversion from pg_wchar to
  multibyte? Is it acceptable at all?

 Well, I'm not an expert on encodings, but it seems like a logical
 extension of what we're doing right now, so I don't really see why
 not.  I'm confused by the diff hunks in pg_mule2wchar_with_len,
 though.  Presumably either the old code is right (in which case, don't
 change it) or the new code is right (in which case, there's a bug fix
 needed here that ought to be discussed and committed separately from
 the rest of the patch).  Maybe I am missing something.


Unfortunately I didn't understand the original logic of
pg_mule2wchar_with_len. I just made a proposal about how it could be.
I hope somebody more familiar with this code will clarify the
situation.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Peter Geoghegan
On 2 May 2012 04:57, Tom Lane t...@sss.pgh.pa.us wrote:
 FWIW, I think only developers not packagers would really be taking such
 a hit.  I assume we'd continue to ship prebuilt lexer output in
 tarballs, so there'd seldom be a reason for a packager to need to run
 the tool.  Given the extremely slow rate of churn of the lexer, it might
 not be necessary for most developers to have the tool installed, either,
 if we were willing to put the derived file into git.

Incidentally, I had an unrelated conversation with someone (I think it
might have been Heikki) a while back, where it was suggested that Flex
and Bison could be run through web services. This might actually make
hacking Postgres on windows far easier, because the last time I tried
to do that the hard way, I gave up, suspecting that there must be some
kind of Winflex bug that selectively manifests itself - certainly, the
population of windows hackers is small enough that it could go
unnoticed for quite a while. Such an approach could be part of the
solution to this problem (although, incidentally, Quex maintains
visual studio support quite well, and even has graphical instructions
here: http://quex.sourceforge.net/doc/html/intro/visual_studio_trouble.html
).

It might be the case that some kind of virtualisation and/or
authentication (Postgres community account required) could make this
approach practical. It just isn't the path of least resistance right
now. Visual studio builds would be far easier if we did this, which
might encourage more hackers to venture into Windows land.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 9:35 AM, Alexander Korotkov aekorot...@gmail.com wrote:
 I was thinking you could perhaps do it just based on the *number* of
 trigrams, not necessarily their frequency.

 Imagine we've two queries:
 1) SELECT * FROM tbl WHERE col LIKE '%abcd%';
 2) SELECT * FROM tbl WHERE col LIKE '%abcdefghijk%';

 The first query require reading posting lists of trigrams abc and bcd.
 The second query require reading posting lists of trigrams abc, bcd,
 cde, def, efg, fgh, ghi, hij and ijk.
 We could decide to use index scan for first query and sequential scan for
 second query because number of posting list to read is high. But it is
 unreasonable because actually second query is narrower than the first one.
 We can use same index scan for it, recheck will remove all false positives.
 When number of trigrams is high we can just exclude some of them from index
 scan. It would be better than just decide to do sequential scan. But the
 question is what trigrams to exclude? Ideally we would leave most rare
 trigrams to make index scan cheaper.

True.  I guess I was thinking more of the case where you've got
abc|def|ghi|jkl|mno|pqr|stu|vwx|yza|  There's probably some point
at which it becomes silly to think about using the index.

  Probably you have some comments on idea of conversion from pg_wchar to
  multibyte? Is it acceptable at all?

 Well, I'm not an expert on encodings, but it seems like a logical
 extension of what we're doing right now, so I don't really see why
 not.  I'm confused by the diff hunks in pg_mule2wchar_with_len,
 though.  Presumably either the old code is right (in which case, don't
 change it) or the new code is right (in which case, there's a bug fix
 needed here that ought to be discussed and committed separately from
 the rest of the patch).  Maybe I am missing something.

 Unfortunately I didn't understand original logic of pg_mule2wchar_with_len.
 I just did proposal about how it could be. I hope somebody more familiar
 with this code would clarify this situation.

Well, do you think the current code is buggy, or not?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Alexander Korotkov
On Wed, May 2, 2012 at 5:48 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, May 2, 2012 at 9:35 AM, Alexander Korotkov aekorot...@gmail.com
 wrote:
  Imagine we've two queries:
  1) SELECT * FROM tbl WHERE col LIKE '%abcd%';
  2) SELECT * FROM tbl WHERE col LIKE '%abcdefghijk%';

  The first query require reading posting lists of trigrams abc and bcd.
  The second query require reading posting lists of trigrams abc, bcd,
  cde, def, efg, fgh, ghi, hij and ijk.
  We could decide to use index scan for first query and sequential scan for
  second query because number of posting list to read is high. But it is
  unreasonable because actually second query is narrower than the first one.
  We can use same index scan for it, recheck will remove all false positives.
  When number of trigrams is high we can just exclude some of them from index
  scan. It would be better than just decide to do sequential scan. But the
  question is what trigrams to exclude? Ideally we would leave most rare
  trigrams to make index scan cheaper.

 True.  I guess I was thinking more of the case where you've got
 abc|def|ghi|jkl|mno|pqr|stu|vwx|yza|  There's probably some point
 at which it becomes silly to think about using the index.


Yes, such situations are also possible.

  Well, I'm not an expert on encodings, but it seems like a logical
  extension of what we're doing right now, so I don't really see why
  not.  I'm confused by the diff hunks in pg_mule2wchar_with_len,
  though.  Presumably either the old code is right (in which case, don't
  change it) or the new code is right (in which case, there's a bug fix
  needed here that ought to be discussed and committed separately from
  the rest of the patch).  Maybe I am missing something.
 
  Unfortunately I didn't understand original logic of pg_mule2wchar_with_len.
  I just did proposal about how it could be. I hope somebody more familiar
  with this code would clarify this situation.

 Well, do you think the current code is buggy, or not?


Probably, but I'm not sure. The conversion seems lossy to me unless I'm
missing something about mule encoding.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] proposal: additional error fields

2012-05-02 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 My guess is that all the ones defined in the SQL standard are
 expected errors, more or less by definition, and thus not
 interesting according to Peter G's criteria.
 
On a scan through the list, I didn't see any exceptions to that,
except for the F0 class.  To restate what the standard reserves
for standard SQLSTATE values in the form of a regular expression, it
looks like:
 
'^[0-4A-H][0-9A-Z][0-4A-H][0-9A-Z][0-9A-Z]$'
 
Eyeballing the errcode page in the docs, it looks like there are
PostgreSQL-assigned values that start with '5', 'P', and 'X'.  That
F0 class looks suspicious; are those really defined by standard or
did we encroach on standard naming space with PostgreSQL-specific
values?

We also have PostgreSQL-specific values in standard classes where we
use 'P' for the third character, which is fine.
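Kevin's regular expression makes the suspicion about the F0 class easy to check mechanically (illustrative Python; the function name is mine):

```python
import re

# The SQLSTATE space reserved by the SQL standard, as restated above:
# first and third characters limited to 0-4 and A-H.
STANDARD_RESERVED = re.compile(r"^[0-4A-H][0-9A-Z][0-4A-H][0-9A-Z][0-9A-Z]$")

def is_standard_reserved(sqlstate: str) -> bool:
    """True if this five-character SQLSTATE falls in the space the
    standard reserves for itself (implementations should stay out)."""
    return STANDARD_RESERVED.match(sqlstate) is not None

print(is_standard_reserved("F0001"))  # True  -- hence the suspicion about F0
print(is_standard_reserved("P0001"))  # False -- 'P' is implementation space
```

Codes beginning with '5', 'P' or 'X', or using 'P' as the third character, fall outside the reserved pattern, which is why those PostgreSQL-assigned values are fine while the F0 class is not.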
 
-Kevin



Re: [HACKERS] proposal: additional error fields

2012-05-02 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 That F0 class looks suspicious; are those really defined by standard or
 did we encroach on standard naming space with PostgreSQL-specific
 values?

I think we screwed up on that :-(.  So we ought to renumber those
codes anyway.  Perhaps use PF instead of F0?

regards, tom lane



Re: [HACKERS] proposal: additional error fields

2012-05-02 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 That F0 class looks suspicious; are those really defined by
 standard or did we encroach on standard naming space with
 PostgreSQL-specific values?
 
 I think we screwed up on that :-(.  So we ought to renumber those
 codes anyway.  Perhaps use PF instead of F0?
 
Sounds good to me.
 
-Kevin



[HACKERS] clog double-dip in heap_hot_search_buffer

2012-05-02 Thread Robert Haas
heap_hot_search_buffer() does this:

valid = HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer);

If it turns out that the tuple isn't valid (i.e. visible to our scan)
and we haven't yet found any live tuples in the current HOT chain,
then we check whether it's visible to anyone at all:

if (all_dead && *all_dead &&
    HeapTupleSatisfiesVacuum(heapTuple->t_data, RecentGlobalXmin,
                             buffer) != HEAPTUPLE_DEAD)
    *all_dead = false;

This is obviously an important optimization for accelerating index
cleanup, but it has an unfortunate side-effect: it considerably
increases the frequency of CLOG access.  Normally,
HeapTupleSatisfiesVisibility() will set hint bits on the tuple, but
sometimes it can't, either because the inserter has not yet committed
or the inserter's commit record hasn't been flushed or the deleter
hasn't committed or the deleter's commit record hasn't been flushed.
When that happens, HeapTupleSatisfiesVacuum() gets called a moment
later and repeats the same CLOG lookups.  It is of course possible for
a state change to happen in the interim, but that's not really a
reason to repeat the lookups; asking the same question twice in a row
just in case you should happen to get an answer you like better the
second time is not generally a good practice, even if it occasionally
works.

The attached patch adds a new function HeapTupleIsSurelyDead(), a
cut-down version of HeapTupleSatisfiesVacuum().  It assumes that,
first, we only care about distinguishing between dead and anything
else, and, second, that any transaction for which hint bits aren't yet
set is still running.  This allows it to be a whole lot simpler than
HeapTupleSatisfiesVacuum() and to get away without doing any CLOG
access.  It also changes heap_hot_search_buffer() to use this new
function in lieu of HeapTupleSatisfiesVacuum().
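The hint-bits-only logic can be sketched in miniature as follows. This is a toy, self-contained model, not the patch itself: the flag names mirror PostgreSQL's but are redefined here, the tuple header is faked, and real code must also handle multixact xmax values and XID wraparound (TransactionIdPrecedes), which this sketch ignores:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's hint bits and tuple header. */
typedef uint32_t TransactionId;

#define HEAP_XMIN_COMMITTED  0x0100
#define HEAP_XMIN_INVALID    0x0200
#define HEAP_XMAX_COMMITTED  0x0400
#define HEAP_XMAX_INVALID    0x0800

typedef struct
{
    uint16_t      t_infomask;
    TransactionId t_xmax;
} FakeTupleHeader;

/*
 * Return true only when the tuple is certainly dead without consulting
 * CLOG: trust hint bits alone, and treat any transaction whose outcome
 * isn't hinted yet as still in progress (i.e. possibly alive).
 */
static bool
tuple_is_surely_dead(const FakeTupleHeader *tup, TransactionId OldestXmin)
{
    /* Inserter aborted and that fact is already hinted: dead. */
    if (tup->t_infomask & HEAP_XMIN_INVALID)
        return true;

    /* Inserter's commit not hinted yet: can't be sure, assume alive. */
    if (!(tup->t_infomask & HEAP_XMIN_COMMITTED))
        return false;

    /* No deleter, or deleter aborted: alive. */
    if (tup->t_infomask & HEAP_XMAX_INVALID)
        return false;

    /* Deleter's commit not hinted yet: assume alive. */
    if (!(tup->t_infomask & HEAP_XMAX_COMMITTED))
        return false;

    /* Deleter committed: dead if its XID precedes every open snapshot.
     * (Real code uses TransactionIdPrecedes to cope with wraparound.) */
    return tup->t_xmax < OldestXmin;
}
```

The key design point is that every answer here comes from bits already set on the tuple, so the function can never trigger a CLOG read or the context switches that come with contending for CLOG buffers.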

I found this problem by using 'perf record -e cs -g' and 'perf report
-g' to find out where context switches were happening.  It turns out
that this is a very significant contributor to CLOG-related context
switches.  Retesting with those same tools shows that the patch does
in fact make those context switches go away.  On a long pgbench test,
the effects of WALInsertLock contention, ProcArrayLock contention,
checkpoint-related latency, etc. will probably swamp the effect of the
patch.  On a short test, however, the effects are visible; and in
general anything that optimizes away access to heavily contended
shared memory data structures is probably a good thing.  Permanent
tables, scale factor 100, 30-second tests:

master:
tps = 22175.025992 (including connections establishing)
tps = 22072.166338 (including connections establishing)
tps = 22653.876341 (including connections establishing)

with patch:
tps = 26586.623556 (including connections establishing)
tps = 25564.098898 (including connections establishing)
tps = 25756.036647 (including connections establishing)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


surely-dead-v1.patch
Description: Binary data



Re: [HACKERS] Temporary tables under hot standby

2012-05-02 Thread Bruce Momjian
On Thu, Apr 26, 2012 at 08:56:40AM +0100, Simon Riggs wrote:
 A full GTT implementation is not required and the design differed from
 that. I don't think "hideously complicated" is accurate; that's just
 your way of saying "I disagree". Either route is pretty complex
 and not much to choose between them, apart from the usefulness of the
 end product - GTTs are not that beneficial as a feature in themselves.
 
 The current problems of our temp table approach are
 1. Catalog bloat
 2. Consumption of permanent XIDs for DML on temp tables.  This increases
   COMMIT cost on the master and is a non-starter under hot standby.
 3. autovacuum cannot VACUUM temporary tables, but their relfrozenxid values
   delay pg_clog truncation and can trigger a wraparound-prevention shutdown.
 4. sinval traffic from every CREATE TEMP TABLE et al.
 5. CREATE TEMP TABLE records AccessExclusiveLock in the global lock table.
 6. We don't automatically drop temporary tables that existed at the point of a
   crash, because they look much like permanent tables.
 (Also: cross-backend access/security.)
 7. Temp tables don't work on HS
 8. No Global Temp tables
 
 Implementing GTTs solves (8) and provides some useful tools to solve
 other points. Note that GTTs do not themselves solve 1-7 in full,
 hence my point that GTTs are an endpoint not a way station. The way
 forwards is not to concentrate on GTTs but to provide a set of
 facilities that allow all the more basic points 1-6 to be addressed,
 in full and then solve (7) and (8).  If we pretend (8) solves (7) as
 well, we will make mistakes in implementation that will waste time and
 deliver reduced value.
 
 In passing I note that GTTs are required to allow PostgresXC to
 support temp tables, since they need a mechanism to makes a single
 temp table definition work on multiple nodes with different data in
 each.
 
 Simply put, I don't think we should be emphasising things that are
 needed for PostgresXC and EDB AS, but not that important for
 PostgreSQL users.

I think if we implemented global temporary tables only for hot standby
use (#7), they might be of limited usefulness, but the ability to avoid
system table churn (#1) means global temporary tables would be widely
useful, even without hot standby.  The idea of sharing optimizer
statistics also has a lot of merit.

FYI, global temp tables have been on the TODO list long before EDB
employment or PostgresXC.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] Latch for the WAL writer - further reducing idle wake-ups.

2012-05-02 Thread Peter Geoghegan
Attached patch latches up the WAL Writer, reducing wake-ups and thus
saving electricity in a way that is more-or-less analogous to my work
on the BGWriter:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6d90eaaa89a007e0d365f49d6436f35d2392cfeb

I am hoping this gets into 9.2. I am conscious of the fact that this
is quite late, but the patch addresses an open item, the concluding
part of a much wider feature. In any case, it is a useful patch that
ought to be committed at some point. I should point out:

1. This functionality was covered by the group commit patch that I
worked on back in January, which was submitted in advance of the
commitfest deadline. However, an alternative implementation was
ultimately committed that did not consider WAL Writer wake-ups.

2. The WAL writer is the most important auxiliary process to latch-up.
Though it is tied with the BGWriter at 5 wake-ups per second by
default, I consider the WAL Writer to be more important than the
BGWriter because I find it much more plausible that the WAL Writer
really won't need to be around for much of the time, as with a
read-mostly work load. Cloud type deployments often have read-mostly
workloads, so we can still save some power even if the DB is actually
servicing lots of read queries. That being the case, it would be a
shame if we didn't get this last one in, as it adds a lot more value
than any of the other patches.

3. This is a fairly simple patch; as I've said, it works in a way that
is quite analogous to the BGWriter patch, applying lessons learned
there.

With this patch, my instrumentation shows that wake-ups when Postgres
reaches a fully idle state are just 2.7 per second for the entire
postgres process group, quite an improvement on the 7.6 per second in
HEAD. This is exactly what you'd expect from a reduction of 5 wake-ups
per second to 0.1 per second on average for the WAL Writer.

I have determined this with PowerTOP 1.13 on my Fedora 16 laptop. Here
is an example session, began after the cluster reached a fully idle
state, with this patch applied (if, alternatively, I want to see
things at per-process granularity, I can get that from PowerTOP 1.98
beta 1, which is available from my system's package manager):

[peter@peterlaptop powertop-1.13]$ sudo ./powertop -d --time=300
[sudo] password for peter:
PowerTOP 1.13   (C) 2007 - 2010 Intel Corporation

Collecting data for 300 seconds


Cn                Avg residency
C0 (cpu running)        ( 2.8%)
polling           0.0ms ( 0.0%)
C1 mwait          0.5ms ( 1.0%)
C2 mwait          0.9ms ( 0.6%)
C3 mwait          1.4ms ( 0.1%)
C4 mwait          6.7ms (95.4%)
P-states (frequencies)
  2.61 Ghz     5.7%
  1.80 Ghz     0.1%
  1200 Mhz     0.1%
  1000 Mhz     0.2%
   800 Mhz    93.5%
Wakeups-from-idle per second : 171.3    interval: 300.0s
no ACPI power usage estimate available
Top causes for wakeups:
  23.0% (134.5)   chrome
***SNIP***
   0.5% (  2.7)   postgres
***SNIP***

This is a rather low number, one that will make us really competitive
with other RDBMSs in this area. Recall that we started from 11.5
wake-ups per second for an idle Postgres cluster with a default
configuration.

To put the 2.7 number in context, I measured MySQL's wake-ups at 2.2
last year (mysql-server version 5.1.56, Fedora 14), though I
subsequently saw much higher numbers (over 20 per second) for version
5.5.19 on Fedora 16, so you should probably take that with a grain of
salt - I don't know anything about MySQL, and so cannot really be sure
that I'm making an objective comparison in comparing that number with
the number of wake-ups Postgres has with a stock postgresql.conf.

I've employed the same trick used when a buffer is dirtied for the
BGWriter - most of the time, the SetLatch() calls will check a single
flag, and find it already set. We are careful to only arm the latch
with a call to ResetLatch() when it is really needed. Rather than
waiting for the clocksweep to be lapped, we wait for a set number of
iterations of consistent inactivity.
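The arming trick can be modeled in a few lines. This is a toy, single-threaded sketch with invented names, not the patch: in the real code the flag lives in shared memory and SetLatch() may involve a signal or kernel wakeup, which is exactly the cost the cheap flag test on the hot path avoids:

```c
#include <assert.h>
#include <stdbool.h>

static bool latch_armed = false;   /* WAL writer "arms" this before a long sleep */
static int  wakeups = 0;

/* Stand-in for SetLatch(): the expensive operation we want to skip. */
static void set_latch(void) { wakeups++; }

/* Hot path, e.g. after a WAL record is inserted: usually just one flag
 * test, and no wakeup work at all. */
static void nudge_walwriter(void)
{
    if (latch_armed)
    {
        latch_armed = false;   /* disarm, so repeat callers do nothing */
        set_latch();
    }
}

/* Called by the WAL writer (around ResetLatch()) once it has seen enough
 * consecutive idle iterations to justify sleeping for a long time. */
static void walwriter_arm(void)
{
    latch_armed = true;
}
```

The point of the design is that backends pay the SetLatch() cost at most once per idle period; every subsequent call on the hot path finds the flag already cleared and falls through immediately.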

I've made the WAL Writer use its process latch, rather than the latch
that was previously within XLogCtl. This seems much more idiomatic, as
in doing so we reserve the right to register generic signal handlers.
With a non-process latch, we'd have to worry about signal invalidation
issues on an ongoing basis, since the handler wouldn't be calling
SetLatch() against the latch we waited on. I have also added a comment
in latch.h generally advising against ad-hoc shared latches where a
process latch can be used instead.

I took initial steps to quantify the performance hit from this patch.
A simple insert.sql pgbench-tools benchmark on my laptop, with a
generic configuration showed no problems, though I do not assume that
this conclusively proves the case. Results:

http://walwriterlatch.staticloud.com/

My choice of XLogInsert() as an additional site at which to call
SetLatch() was one that wasn't taken easily, and frankly I'm not
entirely confident that I couldn't have been just as effective while

Re: [HACKERS] Temporary tables under hot standby

2012-05-02 Thread Michael Nolan
What is the use case for temporary tables on a hot standby server?

Perhaps this is a noobie question, but it seems to me that a hot standby
server's use by *applications* or *users* should be limited to transactions
that don't alter the database in any form.

However, I can see where temporary tables might be needed at the system
level (if not already available) in order to prepare more efficient plans
for some complex read-only queries.
--
Mike Nolan


Re: [HACKERS] Temporary tables under hot standby

2012-05-02 Thread Christopher Browne
On Wed, May 2, 2012 at 11:39 AM, Michael Nolan htf...@gmail.com wrote:
 What is the use case for temporary tables on a hot standby server?

Simple...

We required a hot standby server in order to get improved reliability.

But we don't want it to sit there chewing power + money, unused.

We want to *use* it to support our reporting applications.

And the developers used temporary tables to marshal results used in
some of those reports.

There are conflicting senses of read-only here...
  - In one strict sense, to generate tuples in temp tables means it's
not read only access.
  - But since the users running reports aren't allowed to modify the
data in the application tables that they are querying, how is that
*not* fairly characterized as read only access???
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?



Re: [HACKERS] clog double-dip in heap_hot_search_buffer

2012-05-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 + /* Deleter committed, so tuple is alive if the XID is old enough. */
 + return TransactionIdPrecedes(HeapTupleHeaderGetXmax(tuple), OldestXmin);

s/alive/dead/ in that comment?  Otherwise this seems like a good idea.

regards, tom lane



Re: [HACKERS] clog double-dip in heap_hot_search_buffer

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 12:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 +     /* Deleter committed, so tuple is alive if the XID is old enough. */
 +     return TransactionIdPrecedes(HeapTupleHeaderGetXmax(tuple), 
 OldestXmin);

 s/alive/dead/ in that comment?

Yep, good catch, thanks.

 Otherwise this seems like a good idea.

Do you think I should apply this to 9.2, or wait until 9.3?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Temporary tables under hot standby

2012-05-02 Thread Simon Riggs
On Wed, May 2, 2012 at 4:14 PM, Bruce Momjian br...@momjian.us wrote:

 I think if implementing global temporary tables only for hot standby
 user (#7), it might be of limited usefulness, but the ability to avoid
 system table churn (#1) means global temporary tables would have a wide
 usefulness, even without hot standby use.

Yes, many parts of Noah's proposal would be useful for normal running.
And as Jim points out, they are SQL Standard, as well as supported by
Oracle and DB2, so again GTTs would hit that TODO item.

And Noah's proposal does take us more than 50%, maybe 80% of the way
to what I think would be most useful for HS.

So overall, I do encourage the proposal.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] clog double-dip in heap_hot_search_buffer

2012-05-02 Thread Simon Riggs
On Wed, May 2, 2012 at 5:07 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, May 2, 2012 at 12:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 +     /* Deleter committed, so tuple is alive if the XID is old enough. */
 +     return TransactionIdPrecedes(HeapTupleHeaderGetXmax(tuple), 
 OldestXmin);

 s/alive/dead/ in that comment?

 Yep, good catch, thanks.

 Otherwise this seems like a good idea.

 Do you think I should apply this to 9.2, or wait until 9.3?

Now please. Well done.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] clog double-dip in heap_hot_search_buffer

2012-05-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, May 2, 2012 at 12:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Otherwise this seems like a good idea.

 Do you think I should apply this to 9.2, or wait until 9.3?

We're not at beta yet, and it seems pretty safe/self-contained, so
I have no objection to committing now.

regards, tom lane



Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Jeff Janes
On Tue, May 1, 2012 at 5:53 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Quite apart from the practical difficulties that we have with Flex
 (the fact that the authors are non-responsive and possibly retired,
 that annoying compiler warning, and the fact that we are forced to
 maintain our own Windows binaries of 2.5.35), it has some notable
 disadvantages. I am aware that the MySQL people use their own
 hand-coded lexical analyzer named sql_lex.cc, which provides a yacc
 interface, while avoiding using any lexical analyzer generator
 whatsoever. They can't have done this just for fun, and no doubt this
 goes some way to explaining their continued performance advantage for
 very simple queries. I have heard people complain about Postgres
 parser overhead for pgbench -S style use-cases where it is
 unreasonably high, and I've heard them do so more than once.

For -S -M simple, the time spent planning is 5 times more than the
time spent parsing.  It may be worthwhile to reduce the time spent
parsing, but if the goal is parity with MySQL it probably isn't the
place to start.

(If you use a bottom-up profiler, the time spent in planning is
scattered over so many different functions that none of them look very
important individually)

Cheers,

Jeff



Re: [HACKERS] smart shutdown at end of transaction (was: Default mode for shutdown)

2012-05-02 Thread Bruce Momjian
On Sun, Apr 29, 2012 at 10:19:38AM +0100, Simon Riggs wrote:
 Maybe we don't need to do this over multiple releases, but we do need
 to give warning of possible incompatibilities. It would be good to see
 a specific post on hackers called Planned Incompatibilities in 9.2,
 or collect such things on the open items wiki, so that people
 listening can see what might happen and get a chance to object. Or if
 changes do go ahead, at least we give them a few months warning to
 change the downstream software. Otherwise all that happens is our new
 release comes out and fewer people use it because it takes ages to
 actually realign the software stack enough for our software to be
 used.

The release notes would certainly feature this as an incompatibility,
and would be present even before beta started.  Unless they skip reading
the release notes, it would be hard for them to miss this change.  I
also blog when major release notes are available for viewing.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mié may 02 08:14:36 -0400 2012:
 On Wed, May 2, 2012 at 7:16 AM, Jeroen Vermeulen j...@xs4all.nl wrote:
  On 2012-05-01 22:06, Robert Haas wrote:
  It might also be interesting to provide a mechanism to pre-extend a
  relation to a certain number of blocks, though if we did that we'd
  have to make sure that autovac got the memo not to truncate those
  pages away again.
 
  Good point.  And just to check before skipping over it, do we know that
  autovacuum not leaving enough slack space is not a significant cause of the
  bottlenecks in the first place?
 
 I'm not sure exactly what you mean by this: autovacuum doesn't need
 any slack space.  Regular DML operations can certainly benefit from
 slack space, both within each page and overall within the relation.
 But there's no evidence that vacuum is doing too good a job cleaning
 up the mess, forcing the relation to be re-extended.  Rather, the
 usual (and frequent) complaint is that vacuum is leaving way too much
 slack space - i.e. bloat.

Hm.  I see those two things as different -- to me, bloat is unremoved
dead tuples, whereas slack space would be free space that can be reused
by new tuples.  Slack space is useful as it avoids relation extension;
bloat is not.

I wonder, though, if we should set a less-than-100 fillfactor for heap
relations.  Just like default_statistics_target, it seems that the
default value should be a conservative tradeoff between two extremes.
This doesn't help extension for bulk insertions a lot, of course, but
it'd be useful for tables where HOT updates happen with some regularity.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Peter Geoghegan
On 2 May 2012 17:20, Jeff Janes jeff.ja...@gmail.com wrote:
 For -S -M simple, the time spent planning is 5 times more than the
 time spent parsing.  It may be worthwhile to reduce the time spent
 parsing, but if the goal is parity with MySQL it probably isn't the
 place to start.

Could you please share your figures and methodology? I've heard of far
larger proportions than that.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



[HACKERS] eqjoinsel_semi still sucks ...

2012-05-02 Thread Tom Lane
I looked into Maxim Boguk's complaint of bad estimation of antijoin size:
http://archives.postgresql.org/pgsql-general/2012-05/msg00033.php

I can reproduce what I think the problem is in the regression database.
We do okay with this:

regression=# explain analyze select * from tenk1 a where not exists(select 1
from tenk1 b where a.thousand = b.unique2);
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=395.26..1003.26 rows=1 width=244) (actual time=264.324..264.324 rows=0 loops=1)
   Hash Cond: (a.thousand = b.unique2)
   ->  Seq Scan on tenk1 a  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.050..47.798 rows=10000 loops=1)
   ->  Hash  (cost=270.26..270.26 rows=10000 width=4) (actual time=129.420..129.420 rows=10000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 274kB
         ->  Index Only Scan using tenk1_unique2 on tenk1 b  (cost=0.00..270.26 rows=10000 width=4) (actual time=0.422..65.480 rows=10000 loops=1)
               Heap Fetches: 0
 Total runtime: 267.732 ms
(8 rows)

but not so okay when a filter condition is added inside the sub-select:

regression=# explain analyze select * from tenk1 a where not exists(select 1
from tenk1 b where a.thousand = b.unique2 and b.two = 0);
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=545.50..1091.00 rows=1 width=244) (actual time=123.713..265.185 rows=5090 loops=1)
   Hash Cond: (a.thousand = b.unique2)
   ->  Seq Scan on tenk1 a  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.048..46.685 rows=10000 loops=1)
   ->  Hash  (cost=483.00..483.00 rows=5000 width=4) (actual time=123.483..123.483 rows=5000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 137kB
         ->  Seq Scan on tenk1 b  (cost=0.00..483.00 rows=5000 width=4) (actual time=0.059..91.405 rows=5000 loops=1)
               Filter: (two = 0)
               Rows Removed by Filter: 5000
 Total runtime: 284.889 ms
(9 rows)

Now, eqjoinsel_semi is correctly estimating that the condition
a.thousand = b.unique2 is unselective in itself: all values of
a.thousand will have join partners in the first case.  The problem comes
in trying to account for the additional filter condition.  The heuristic
we're currently using is to reduce the number of distinct values assumed
for the inner variable according to the selectivity of the additional
conditions.  In this case, though, that results in reducing ndistinct
for b.unique2 from 10000 to 5000, which is still more than ndistinct for
a.thousand (i.e., 1000), so the final selectivity estimate doesn't
change at all.  Oops.

On reflection I think that the idea of clamping ndistinct beforehand is
just wrong, and what we ought to do instead is apply a multiplier to the
selectivity estimate afterwards.  In the case of a base rel we could
just multiply by the selectivity of its baserestrictinfo list.  For join
rels it's a bit harder to guess how much a given input relation might
have been decimated, but if the join's estimated size is smaller than
the output size of the base rel the correlation var came from, we could
multiply by that ratio (on top of whatever correction came from the base
rel's restriction clauses).
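On the numbers in the example above, the after-the-fact multiplier behaves sensibly. Here is a toy model of the proposal (all names invented; the real change would live in eqjoinsel_semi and its callers):

```c
#include <assert.h>
#include <math.h>

/* Toy model of the proposed fix: estimate the fraction of outer values
 * having inner join partners from the *unfiltered* ndistinct numbers,
 * then scale by the selectivity of the inner rel's restriction clauses,
 * rather than clamping ndistinct beforehand. */
static double
semi_match_fraction(double outer_ndistinct,
                    double inner_ndistinct,
                    double inner_restrict_sel)
{
    double base = inner_ndistinct / outer_ndistinct;

    if (base > 1.0)
        base = 1.0;             /* every outer value can find a partner */

    return base * inner_restrict_sel;   /* apply the filter as a multiplier */
}
```

With a.thousand having 1000 distinct values, b.unique2 having 10000, and the b.two = 0 filter passing half the rows, the match fraction drops from 1.0 to 0.5, so the anti-join would keep about half of the 10000 outer rows -- close to the observed 5090 -- whereas the current ndistinct-clamping heuristic leaves the estimate unchanged.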

Thoughts?

regards, tom lane



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 12:26 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of mié may 02 08:14:36 -0400 2012:
 On Wed, May 2, 2012 at 7:16 AM, Jeroen Vermeulen j...@xs4all.nl wrote:
  On 2012-05-01 22:06, Robert Haas wrote:
  It might also be interesting to provide a mechanism to pre-extend a
  relation to a certain number of blocks, though if we did that we'd
  have to make sure that autovac got the memo not to truncate those
  pages away again.
 
  Good point.  And just to check before skipping over it, do we know that
  autovacuum not leaving enough slack space is not a significant cause of the
  bottlenecks in the first place?

 I'm not sure exactly what you mean by this: autovacuum doesn't need
 any slack space.  Regular DML operations can certainly benefit from
 slack space, both within each page and overall within the relation.
 But there's no evidence that vacuum is doing too good a job cleaning
 up the mess, forcing the relation to be re-extended.  Rather, the
 usual (and frequent) complaint is that vacuum is leaving way too much
 slack space - i.e. bloat.

 Hm.  I see those two things as different -- to me, bloat is unremoved
 dead tuples, whereas slack space would be free space that can be reused
 by new tuples.  Slack space is useful as it avoids relation extension;
 bloat is not.

I guess I think of bloat as including both unremoved dead tuples and
unwanted internal free space.  If you create a giant table, delete 9
out of every 10 tuples, and vacuum, the table is still bloated, IMV.

 I wonder, though, if we should set a less-than-100 fillfactor for heap
 relations.  Just like default_statistic_target, it seems that the
 default value should be a conservative tradeoff between two extremes.
 This doesn't help extension for bulk insertions a lot, of course, but
 it'd be useful for tables where HOT updates happen with some regularity.

Perhaps, but in theory that should be self-correcting: the data should
spread itself onto the number of pages where HOT pruning is able to
prevent further relation extension.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Request to add options to tools/git_changelog

2012-05-02 Thread Bruce Momjian
On Sun, Apr 29, 2012 at 02:06:48PM -0400, Jay Levitt wrote:
 Bruce Momjian wrote:
 I am again requesting the addition of options to tools/git_changelog so
 I can more easily produce the release notes.  I asked for this during
 9.1 development and it was rejected.  I am currently using my own
 custom version of the tool, but have to merge community improvements
 into the tool every year before I use it.
 
 FYI in the general case of I have to maintain a patch set: Now
 that PG is on git, there's a tool called Stacked Git that lets you
 use git's excellent merge capabilities to maintain patches.
 
 http://www.procode.org/stgit/

I am unclear what stgit does that can't be done with git branches?  It
mentions pushing and popping patches --- is that it?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, May 2, 2012 at 12:26 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 Hm.  I see those two things as different -- to me, bloat is unremoved
 dead tuples, whereas slack space would be free space that can be reused
 by new tuples.  Slack space is useful as it avoids relation extension;
 bloat is not.

 I guess I think of bloat as including both unremoved dead tuples and
 unwanted internal free space.  If you create a giant table, delete 9
 out of every 10 tuples, and vacuum, the table is still bloated, IMV.

The difficulty is to tell the difference between useless free space and
useful free space.  If there's a reasonable probability of putting new
data into a given chunk of free space in the near future, it's not
bloat.

regards, tom lane



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mié may 02 12:37:35 -0400 2012:
 
 On Wed, May 2, 2012 at 12:26 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Robert Haas's message of mié may 02 08:14:36 -0400 2012:
  On Wed, May 2, 2012 at 7:16 AM, Jeroen Vermeulen j...@xs4all.nl wrote:
   On 2012-05-01 22:06, Robert Haas wrote:
   It might also be interesting to provide a mechanism to pre-extend a
   relation to a certain number of blocks, though if we did that we'd
   have to make sure that autovac got the memo not to truncate those
   pages away again.
  
   Good point.  And just to check before skipping over it, do we know that
   autovacuum not leaving enough slack space is not a significant cause of 
   the
   bottlenecks in the first place?
 
  I'm not sure exactly what you mean by this: autovacuum doesn't need
  any slack space.  Regular DML operations can certainly benefit from
  slack space, both within each page and overall within the relation.
  But there's no evidence that vacuum is doing too good a job cleaning
  up the mess, forcing the relation to be re-extended.  Rather, the
  usual (and frequent) complaint is that vacuum is leaving way too much
  slack space - i.e. bloat.
 
  Hm.  I see those two things as different -- to me, bloat is unremoved
  dead tuples, whereas slack space would be free space that can be reused
  by new tuples.  Slack space is useful as it avoids relation extension;
  bloat is not.
 
 I guess I think of bloat as including both unremoved dead tuples and
 unwanted internal free space.  If you create a giant table, delete 9
 out of every 10 tuples, and vacuum, the table is still bloated, IMV.

Agreed.  Perhaps to solve this issue what we need is a way to migrate
tuples from later pages into earlier ones.  (This was one of the points,
never resolved, that we discussed during the VACUUM FULL rework.)

  I wonder, though, if we should set a less-than-100 fillfactor for heap
  relations.  Just like default_statistic_target, it seems that the
  default value should be a conservative tradeoff between two extremes.
  This doesn't help extension for bulk insertions a lot, of course, but
  it'd be useful for tables where HOT updates happen with some regularity.
 
 Perhaps, but in theory that should be self-correcting: the data should
 spread itself onto the number of pages where HOT pruning is able to
 prevent further relation extension.

True.  I gather you consider that the cases where it doesn't happen due
to particular conditions are the ones that need manual tweaking.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 12:46 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Agreed.  Perhaps to solve this issue what we need is a way to migrate
 tuples from later pages into earlier ones.  (This was one of the points,
 never resolved, that we discussed during the VACUUM FULL rework.)

Yeah, I agree.  And frankly, we need to find a way to make it work
without taking AccessExclusiveLock on the relation.  Having to run
VACUUM FULL is killing actual users and scaring off potential ones.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] clog double-dip in heap_hot_search_buffer

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 12:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, May 2, 2012 at 12:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Otherwise this seems like a good idea.

 Do you think I should apply this to 9.2, or wait until 9.3?

 We're not at beta yet, and it seems pretty safe/self-contained, so
 I have no objection to committing now.

OK, done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] How hard would it be to support LIKE in return declaration of generic record function calls ?

2012-05-02 Thread Peter Eisentraut
On ons, 2012-05-02 at 13:40 +0200, Hannu Krosing wrote:
 How hard would it be to add support for LIKE syntax, similar to table
 def in field list declaration for generic record functions
 
 What I'd like to be able to do is to have a generic json_to_record
 function 
 
 CREATE OR REPLACE FUNCTION json_to_record(json) RETURNS RECORD AS $$
 ...
 $$ LANGUAGE ... ;
 
 and then be able to call it like this
 
 insert into test2 
 select * from json_to_record(jrec json) as (like test2);

That would be very useful, and shouldn't be too hard to implement.  (I
had to look about three times to understand what this was supposed to
achieve, but I think the syntax is the right one after all.)





Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mié may 02 12:55:17 -0400 2012:
 
 On Wed, May 2, 2012 at 12:46 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Agreed.  Perhaps to solve this issue what we need is a way to migrate
  tuples from later pages into earlier ones.  (This was one of the points,
  never resolved, that we discussed during the VACUUM FULL rework.)
 
 Yeah, I agree.  And frankly, we need to find a way to make it work
 without taking AccessExclusiveLock on the relation.  Having to run
 VACUUM FULL is killing actual users and scaring off potential ones.

And ideally without bloating the indexes while at it.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [COMMITTERS] pgsql: Remove dead ports

2012-05-02 Thread Peter Eisentraut
On ons, 2012-05-02 at 00:31 -0400, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  Remove dead ports
 
 Checking this patch, I noticed that config.guess and config.sub harbor
 most of the remaining references to those platforms, which reminded me:
 don't we usually update those files from autoconf upstream before beta?

Yes, once we know when beta is, we can move on that. ;-)

Btw., I had intentionally kept the uses in ps_status.c and getrusage.c,
because they remain useful in case someone wants to use these files for
reference.  But I guess that's debatable, because we no longer have a
way to prove that those uses actually continue to work.




Re: [HACKERS] [COMMITTERS] pgsql: Remove dead ports

2012-05-02 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On ons, 2012-05-02 at 00:31 -0400, Tom Lane wrote:
 Checking this patch, I noticed that config.guess and config.sub harbor
 most of the remaining references to those platforms, which reminded me:
 don't we usually update those files from autoconf upstream before beta?

 Yes, once we know when beta is, we can move on that. ;-)

Next week, I thought.

regards, tom lane



Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Jeff Janes
On Wed, May 2, 2012 at 9:31 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 2 May 2012 17:20, Jeff Janes jeff.ja...@gmail.com wrote:
 For -S -M simple, the time spent planning is 5 times more than the
 time spent parsing.  It may be worthwhile to reduce the time spent
 parsing, but if the goal is parity with MySQL it probably isn't the
 place to start.

 Could you please share your figures and methodology? I've heard of far
 larger proportions than that.

I used two methods.  One was to hack exec_simple_query so that, under
the control of a new GUC, it would do things such as pg_parse_query
the query 101 times, throwing away the results of the first 100 before
proceeding to use the 101 parse result as normal.  Then I just run
pgbench under both settings, take the difference in 1/TPS between them
and divide by 100 to get the seconds per parse (and multiply by 1e6 to
get usec/parse)

I did the same thing for pg_analyze_and_rewrite, and for
pg_analyze_and_rewrite+pg_plan_queries (pg_plan_queries scribbles on
the structure produced by pg_analyze_and_rewrite, so you have to
repeat both as a unit, and then subtract the
pg_analyze_and_rewrite timings off afterwards to isolate just the
planning).
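
The TPS-to-microseconds conversion described above can be sketched as a small arithmetic helper (the TPS figures below are hypothetical, purely for illustration of the formula, not measured values):

```python
def per_call_usec(tps_normal, tps_repeated, extra_repeats=100):
    # Difference in seconds-per-transaction between the two pgbench runs,
    # attributed to the phase that was repeated extra_repeats extra times.
    delta_sec = 1.0 / tps_repeated - 1.0 / tps_normal
    return delta_sec / extra_repeats * 1e6

# Hypothetical numbers: if doing 100 extra pg_parse_query calls per query
# drops pgbench from 10000 TPS to 5000 TPS, each parse costs ~1 usec.
cost = per_call_usec(10000, 5000)
```

The same helper applies to the analyze/rewrite and planning measurements, subtracting the earlier phases' costs as described.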

On my current laptop and rebased to git HEAD, I got
2usec/pg_parse_query, 2usec/pg_analyze_and_rewrite, and
12usec/pg_plan_queries.   Since my laptop is dual core, I did this
with -c2 -j2.

Back when I originally implemented and tested this on much older
hardware and about one year older pgsql code base, the absolute values
of usec/action were several fold higher, but the ratios of 1:1:6 were
about the same.

This does risk that it will overlook caching effects by repeating the
same thing in a tight loop.  I.e. parses 2 through 101 might be much
faster than parse 1 was.  Also, it risks that I simply don't know what
I'm doing and my attempts to throw away the results of a parse are
misbegotten--I just overwrote the old pointer with the new one and
assumed the memory context would clean up the resulting orphans.

I could try to clean up and post the patch that implements this if you want.

The second method was just to do --enable-profiling on a stock build
and look at the call graph section of gprof output.  It attributed 50%
to pg_plan_queries and children and about 10% to each of
pg_parse_query and pg_analyze_and_rewrite (including their respective
children).  I don't put tremendous faith in gprof's call graph, but
the fact that the results are in agreement with the other method gives
me more confidence in both.

Cheers,

Jeff



Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Magnus Hagander
On Wed, May 2, 2012 at 3:33 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 2 May 2012 04:57, Tom Lane t...@sss.pgh.pa.us wrote:
 FWIW, I think only developers not packagers would really be taking such
 a hit.  I assume we'd continue to ship prebuilt lexer output in
 tarballs, so there'd seldom be a reason for a packager to need to run
 the tool.  Given the extremely slow rate of churn of the lexer, it might
 not be necessary for most developers to have the tool installed, either,
 if we were willing to put the derived file into git.

 Incidentally, I had an unrelated conversation with someone (I think it
 might have been Heikki) a while back, where it was suggested that Flex
 and Bison could be run through web services. This might actually make

Might've been me - I've been doing that for a long time to work around
winflex issues. But I never got around to doing anything like access
control or so; I just ran it on a hidden IP on a random port, with a
simple curl call on the Windows box.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Patch pg_is_in_backup()

2012-05-02 Thread Gabriele Bartolini

Hi Gilles,

   Sorry for the delay.

Il 03/04/12 14:21, Gilles Darold ha scritto:

+1, this is also my point of view.


   I have looked at the patch that contains both pg_is_in_backup() and 
pg_backup_start_time().


   From a functional point of view it looks fine to me. I was thinking 
of adding the BackupInProgress() at the beginning of 
pg_backup_start_time(), but the AllocateFile() function already makes 
sure the file exists.


   I have performed some basic testing of both functions and tried to 
inject invalid characters in the start time field of the backup_label 
file and it is handled (with an exception) by the server. Cool.


   I spotted though some formatting issues, in particular indentation 
and multi-line comments. Some rows are longer than 80 chars.


   Please resubmit with these cosmetic changes and it is fine with me. 
Thank you.


Cheers,
Gabriele

--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it




Re: [HACKERS] plpython crash (PG 92)

2012-05-02 Thread Peter Eisentraut
On lör, 2012-04-28 at 00:32 -0400, Tom Lane wrote:
 I'm inclined to think that the best fix is for
 PLy_spi_execute_fetch_result to copy the tupledesc into
 TopMemoryContext, not the current context.  This is a tad scary from a
 memory leakage standpoint, but I suppose that if python fails to recover
 the PLyResultObject, this isn't the only memory that's going to be
 leaked.
 
 This area appears to be shy a regression test case or two, in any event.

Fixed like that.




Re: [HACKERS] online debloatification (was: extending relations more efficiently)

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 1:06 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of mié may 02 12:55:17 -0400 2012:
 On Wed, May 2, 2012 at 12:46 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Agreed.  Perhaps to solve this issue what we need is a way to migrate
  tuples from later pages into earlier ones.  (This was one of the points,
  never resolved, that we discussed during the VACUUM FULL rework.)

 Yeah, I agree.  And frankly, we need to find a way to make it work
 without taking AccessExclusiveLock on the relation.  Having to run
 VACUUM FULL is killing actual users and scaring off potential ones.

 And ideally without bloating the indexes while at it.

Yeah.

Brainstorming wildly, how about something like this:

1. Insert a new copy of the tuple onto some other heap page.  The new
tuple's xmin will be that of the process doing the tuple move, and
we'll also set a flag indicating that a move is in progress.
2. Set a flag on the old tuple, indicating that a tuple move is in
progress.  Set its TID to the new location of the tuple.  Set xmax to
the tuple mover's XID.  Optionally, truncate away the old tuple data,
leaving just the tuple header.
3. Scan all indexes and replace any references to the old tuple's TID
with references to the new tuple's TID.
4. Commit.
5. Once the XID of the tuple mover is all-visible, nuke the old TID
and clear the flag on the new tuple indicating a move-in-progress
(these two operations must be done together, atomically, with a single
WAL record covering both).

Any scan that encounters the old tuple will decide whether or not it
can see the tuple based on the xmin and xmax in the old tuple's header.
If it decides it can see it, it follows the TID pointer and does its
work using the new tuple instead.  Scans that encounter the new tuple
need no special handling; the existing visibility rules are fine for
that case.  Prune operations must not truncate away tuples that are
being moved into or out of the page, and vacuum must not mark pages
containing such tuples as all-visible.
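
As a toy illustration of the redirect-following idea sketched above (real PostgreSQL TIDs and tuple headers work nothing like this; every name here is invented, and this only demonstrates the control flow a scan would take):

```python
heap = {}  # TID -> {"data": ..., "moved_to": ...}

def heap_insert(tid, data):
    heap[tid] = {"data": data, "moved_to": None}

def tuple_move(old_tid, new_tid):
    # Steps 1-2 above: copy the tuple to a new TID, then turn the old
    # tuple into a redirect pointing at the new location.
    heap[new_tid] = {"data": heap[old_tid]["data"], "moved_to": None}
    heap[old_tid] = {"data": None, "moved_to": new_tid}

def heap_fetch(tid):
    # A scan that lands on the old TID follows the redirect; one that
    # lands on the new TID needs no special handling.
    t = heap[tid]
    if t["moved_to"] is not None:
        return heap_fetch(t["moved_to"])
    return t["data"]

heap_insert((1, 1), "row-A")
tuple_move((1, 1), (7, 3))
```

Step 5 of the protocol would then correspond to deleting the redirect entry once no scan can still be holding the old TID.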

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Bruce Momjian
On Wed, May 02, 2012 at 10:37:58AM -0700, Jeff Janes wrote:
 I could try to clean up and post the patch that implements this if you want.
 
 The second method was just to do --enable-profiling on a stock build
 and look at the call graph section of gprof output.  It attributed 50%
 to pg_plan_queries and children and about 10% to each of
 pg_parse_query and pg_analyze_and_rewrite (including their respective
 children).  I don't put tremendous faith in gprof's call graph, but
 the fact that the results are in agreement with the other method gives
 me more confidence in both.

Those are the ratios I (and I think Tom) expected to see.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] Beta time?

2012-05-02 Thread Bruce Momjian
 On Wed, May 02, 2012 at 01:15:48PM -0400, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On ons, 2012-05-02 at 00:31 -0400, Tom Lane wrote:
  Checking this patch, I noticed that config.guess and config.sub harbor
  most of the remaining references to those platforms, which reminded me:
  don't we usually update those files from autoconf upstream before beta?
 
  Yes, once we know when beta is, we can move on that. ;-)
 
 Next week, I thought.

How are we handling the Monday release with everyone at PGCon?  Was that
resolved?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] PL/Python result set slicing broken in Python 3

2012-05-02 Thread Peter Eisentraut
This doesn't work anymore with Python 3:

rv = plpy.execute(...)
do_something(rv[0:1])

Apparently, they changed the C API for doing slicing, or rather made one
of the two APIs for it silently do nothing.  Details are difficult to
find, but this email message seems to contain something:
http://mail.python.org/pipermail/python-3000/2007-August/009851.html.

I'll try to sort this out sometime, but if someone wants to take a shot
at it, go ahead.




Re: [HACKERS] How hard would it be to support LIKE in return declaration of generic record function calls ?

2012-05-02 Thread Merlin Moncure
On Wed, May 2, 2012 at 12:06 PM, Peter Eisentraut pete...@gmx.net wrote:
 On ons, 2012-05-02 at 13:40 +0200, Hannu Krosing wrote:
 How hard would it be to add support for LIKE syntax, similar to table
 def in field list declaration for generic record functions

 What I'd like to be able to do is to have a generic json_to_record
 function

 CREATE OR REPLACE FUNCTION json_to_record(json) RETURNS RECORD AS $$
 ...
 $$ LANGUAGE ... ;

 and then be able to call it like this

 insert into test2
 select * from json_to_record(jrec json) as (like test2);

 That would be very useful, and shouldn't be too hard to implement.  (I
 had to look about three times to understand what this was supposed to
 achieve, but I think the syntax is the right one after all.)

Although I like the functionality, is this better than the trick used
by hstore/populate_record?  That approach doesn't require syntax
changes and allows you to execute the function without 'FROM'.

merlin



[HACKERS] Modeling consumed shmem sizes, and some thorns

2012-05-02 Thread Daniel Farina
Hello List,

I'd like to share with you some experiences we've had while
investigating what we'd have to do to make very-very tiny databases.

First, the formulae at
http://www.postgresql.org/docs/9.1/static/kernel-resources.html#SHARED-MEMORY-PARAMETERS
(17-2) seem misleading, particularly with regard to the overhead of
supporting a large number of connections: it undercounts by rather a
lot.  I think the estimate of 270 bytes per lock seems far too low on
Linux-amd64.  In addition, this number seems undercounted by 50% or
more because of the addition of predicate locks in 9.1.

Presuming the documented base cost of 1800 bytes per connection is still
right, experimentally it seems to me that the right numbers are closer
to 700 bytes per max_locks_per_transaction, and 650 for each
max_pred_locks_per_transaction, although there appears to be some
non-linear behavior that makes this a hazy projection.
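
A back-of-envelope model using the figures above (the per-slot constants are the hazy experimental estimates from this message, not authoritative numbers, and the real behavior is non-linear):

```python
def lock_shmem_bytes(max_connections, max_locks_per_xact=64,
                     max_pred_locks_per_xact=64):
    per_conn = 1800                            # documented base cost per connection
    per_conn += 700 * max_locks_per_xact       # ~700 bytes per heavyweight lock slot
    per_conn += 650 * max_pred_locks_per_xact  # ~650 bytes per predicate lock slot
    return max_connections * per_conn

# e.g. allowing 500 connections with the default lock settings:
total_mb = lock_shmem_bytes(500) / (1024 * 1024)
```

With the default 64 lock slots of each kind, 500 connections come out around 42 MB, consistent with the nearly-50%-of-a-100MB-budget observation later in this message.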

Besides accuracy, there is a thornier problem here that has to do with
hot standby (although the use case is replication more generally) when
one has heterogeneously sized database resources. As-is, it is
required that locking-related structures -- max_connections,
max_prepared_xacts, and max_locks_per_xact (but not predicate locks,
is that an oversight?) must be a larger number on a standby than on a
primary.

In a heterogeneous environment where one uses WAL-based replication,
that means that to obtain unity and full compatibility among
different-sized systems one must always permit a large number of
connections (specifically, the largest number supported by any
database configuration), and those large number of connections can
occupy a large fraction of the overall memory allotted to a small
database, making the amount of lock-related memory consumption on,
say, a database that is intended to only receive 100MB of shmem
approach nearly 50% of the overall total, and that is rather
unfortunate.  I can see why that'd be hard to fix (maybe, instead, a
more logical replication layer is a better investment of time), but I
thought it an interesting consideration that was worth discussing.

-- 
fdr



Re: [HACKERS] online debloatification (was: extending relations more efficiently)

2012-05-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Brainstorming wildly, how about something like this:

 1. Insert a new copy of the tuple onto some other heap page.  The new
 tuple's xmin will be that of the process doing the tuple move, and
 we'll also set a flag indicating that a move is in progress.
 2. Set a flag on the old tuple, indicating that a tuple move is in
 progress.  Set its TID to the new location of the tuple.  Set xmax to
 the tuple mover's XID.  Optionally, truncate away the old tuple data,
 leaving just the tuple header.
 3. Scan all indexes and replace any references to the old tuple's TID
 with references to the new tuple's TID.
 4. Commit.

What happens when you crash partway through that?  Also, what happens if
somebody wishes to update the tuple before the last step is complete?

In any case, this doesn't address the fundamental problem with unlocked
tuple movement, which is that you can't just arbitrarily change a
tuple's TID when there might be other operations relying on the TID
to hold still.

regards, tom lane



Re: [HACKERS] online debloatification (was: extending relations more efficiently)

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 4:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Brainstorming wildly, how about something like this:

 1. Insert a new copy of the tuple onto some other heap page.  The new
 tuple's xmin will be that of the process doing the tuple move, and
 we'll also set a flag indicating that a move is in progress.
 2. Set a flag on the old tuple, indicating that a tuple move is in
 progress.  Set its TID to the new location of the tuple.  Set xmax to
 the tuple mover's XID.  Optionally, truncate away the old tuple data,
 leaving just the tuple header.
 3. Scan all indexes and replace any references to the old tuple's TID
 with references to the new tuple's TID.
 4. Commit.

 What happens when you crash partway through that?

Well, there are probably a few loose ends here, but the idea is that
if we crash after step 2 is complete, the next vacuum is responsible
for performing steps 3 and 4.  As written, there's probably a problem
if we crash between (1) and (2); I think those would need to be done
atomically, or at least we need to make sure that the moving-in flag
is set on the new tuple if and only if there is actually a redirect
pointing to it.

 Also, what happens if
 somebody wishes to update the tuple before the last step is complete?

Then we let them.  The idea is that they see the redirect tuple at the
old TID, follow it to the new copy of the tuple, and update that
instead.

 In any case, this doesn't address the fundamental problem with unlocked
 tuple movement, which is that you can't just arbitrarily change a
 tuple's TID when there might be other operations relying on the TID
 to hold still.

Well, that's why I invented the redirect tuple, so that anyone who was
relying on the TID to hold still would see the redirect and say, oh, I
need to go look at this other TID instead.  It's entirely possible
there's some reason why that can't work, but at the moment I'm not
seeing it.  I see that there's a problem if the old TID gets freed
while someone's relying on it, but replacing it with a pointer to some
other TID seems like it ought to be workable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] proposal: additional error fields

2012-05-02 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 That F0 class looks suspicious; are those really defined by
 standard or did we encroach on standard naming space with
 PostgreSQL-specific values?

 I think we screwed up on that :-(.  So we ought to renumber those
 codes anyway.  Perhaps use PF instead of F0?
 
 Sounds good to me.

I thought for a few minutes about whether we ought to try to sneak
such a change into 9.2.  But given that we're talking about probably
doing a number of other SQLSTATE reassignments in the future, it
seems likely better to wait and absorb all that pain in a single
release cycle.  It seems moderately unlikely that any client-side
code is dependent on these specific assignments, but still I'd rather
not see a dribble of "we changed some SQLSTATEs" compatibility flags
across several successive releases.

regards, tom lane



Re: [HACKERS] Unnecessary WAL archiving after failover

2012-05-02 Thread Robert Haas
On Fri, Mar 23, 2012 at 10:03 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On second thought, I found other issues about WAL archiving after
 failover. So let me clarify the issues again.

 Just after failover, there can be three kinds of WAL files in new
 master's pg_xlog directory:

 (1) WAL files which were recycled to by restartpoint

 I've already explained upthread the issue which these WAL files cause
 after failover.

Check.

 (2) WAL files which were restored from the archive

 In 9.1 or before, the restored WAL files don't remain after failover
 because they are always restored onto the temporary filename
 RECOVERYXLOG. So the issue which I explain from now doesn't exist
 in 9.1 or before.

 In 9.2dev, as the result of supporting cascade replication,
 an archived WAL file is restored onto correct file name so that
 cascading walsender can send it to another standby. This restored
 WAL file has neither .ready nor .done archive status file. After
 failover, checkpoint checks the archive status file of the restored
 WAL file to attempt to recycle it, finds that it has neither .ready
 nor .done, and creates .ready. Because of the existence of .ready,
 it will be archived again even though it obviously already exists in
 the archival storage :(

 To prevent a restored WAL file from being archived again, I think
 that .done should be created whenever WAL file is successfully
 restored (of course this should happen only when archive_mode is
 enabled). Thought?

 Since this is the oversight of cascade replication, I'm thinking to
 implement the patch for 9.2dev.

Yes, I think we had better fix this in 9.2.  As you say, it's a loose
end from streaming replication.  Do you have a patch?

 (3) WAL files which were streamed from the master

 These WAL files also don't have any archive status, so checkpoint
 creates .ready for them after failover. And then, all or many of
 them will be archived at a time, which would cause I/O spike on
 both WAL and archival storage.

 To avoid this problem, I think that we should change walreceiver
 so that it creates .ready as soon as it completes the WAL file. Also
 we should change the archiver process so that it starts up even in
 standby mode and archives the WAL files.

 If each server has its own archival storage, the above solution would
 work fine. But if all servers share the archival storage, multiple archiver
 processes in those servers might archive the same WAL file to
 the shared area at the same time. Is this OK? If not, to avoid this,
 we might need to separate archive_mode into two: one for normal mode
 (i.e., master), another for standby mode. If the archive is shared,
 we can ensure that only one archiver in the master copies the WAL file
 at the same time by disabling WAL archiving in standby mode but
 enabling it in normal mode. Thought?

Another option would be to run the archiver in both modes and somehow
pass a flag indicating whether it's running in standby mode or normal
running.

 Invoking the archiver process in standby mode is new feature,
 not a bug fix. It's too late to propose new feature for 9.2. So I'll
 propose this for 9.3.

OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Latch for the WAL writer - further reducing idle wake-ups.

2012-05-02 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 Attached patch latches up the WAL Writer, reducing wake-ups and thus
 saving electricity in a way that is more-or-less analogous to my work
 on the BGWriter:
 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6d90eaaa89a007e0d365f49d6436f35d2392cfeb
 I am hoping this gets into 9.2.  I am conscious of the fact that this
 is quite late, but the patch addresses an open item, the concluding
 part of a much wider feature.

It is getting a bit late to be considering such changes for 9.2, but
I'm willing to review and commit this if there's not anybody who feels
strongly that it's too late.  Personally I think it's in the nature of
cleanup and so fair game as long as we haven't formally started beta.
However I will confess to some bias about wanting to get the server's
idle wake-up rate down, because Fedora people have been bugging me
about that for a long time now.  So I'm probably not the best person to
objectively evaluate whether we should hold this for 9.3.  Comments?

Schedule questions aside, I'm disturbed by this bit:

 My choice of XLogInsert() as an additional site at which to call
 SetLatch() was one that wasn't taken easily, and frankly I'm not
 entirely confident that I couldn't have been just as effective while
 placing the SetLatch() call in a less hot, perhaps higher-level
 codepath.

Adding any contention at all to XLogInsert doesn't seem like a smart
idea, even if you failed to measure any problem in the specific tests
you made.  I wonder whether we could not improve matters by adding
an additional bool wal_writer_needs_wakening in the state that's
considered to be protected by WALInsertLock.  XLogInsert would check
this while still holding the lock, and only consider that it needs to do
a SetLatch if the flag was set, whereupon it would clear it before
releasing the lock.  In the normal case this would add one uncontended
fetch followed by boolean-test-and-jump to the work done while holding
the lock, which should be pretty negligible.  Then, the WAL writer would
need to take WALInsertLock to set that flag, but presumably it should
only be doing that when there is no contention for the lock.  (In fact,
we could have it do a ConditionalLockAcquire on WALInsertLock for the
purpose, and consider that failure means it shouldn't go to sleep after
all.)
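
A toy sketch of the flag-under-the-insert-lock protocol proposed above (Python threading stands in for LWLocks, a dict stands in for shared memory, and every name is invented; this only illustrates the control flow, not the real WAL code):

```python
import threading

insert_lock = threading.Lock()   # stand-in for WALInsertLock
state = {"needs_wakening": False, "latch_set": False}

def xlog_insert():
    with insert_lock:
        # ... the real WAL insertion work would happen here ...
        # One fetch plus a test while already holding the lock; the
        # SetLatch cost is only paid when the writer asked for a wakeup.
        if state["needs_wakening"]:
            state["needs_wakening"] = False
            state["latch_set"] = True        # stand-in for SetLatch()

def wal_writer_prepare_to_sleep():
    # ConditionalLockAcquire analogue: if the lock is contended,
    # give up on sleeping rather than block.
    if insert_lock.acquire(blocking=False):
        try:
            state["needs_wakening"] = True
        finally:
            insert_lock.release()
        return True
    return False

wal_writer_prepare_to_sleep()   # writer announces it is about to sleep
xlog_insert()                   # the next insert wakes it exactly once
```

Subsequent inserts pay only the flag test, matching the "uncontended fetch plus test-and-jump" cost estimate above.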

Now this might sound pretty much equivalent to testing the latch's
is_set flag; perhaps it is and I'm worrying over nothing.  But I'm
thinking that the wal_writer_needs_wakening flag would be in a cache
line that an acquirer of WALInsertLock would have to get ownership of
anyway, if it is adjacent to variables that XLogInsert has to manipulate
anyway.  On the other hand, the WAL writer's process latch would be in
some other cache line that would also need to get passed around a lot,
if it's touched during every XLogInsert.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Features of Postgresql and Postgres-xc with MySQL

2012-05-02 Thread Michael Paquier
On Wed, May 2, 2012 at 2:37 PM, Vivek Singh Raghuwanshi 
vivekraghuwan...@gmail.com wrote:

 Please send me the link or white papers from where i can get information
 like.
 3. Feature comparison of PostgreSQL and Postgres-XC
 4. and can we use Postgres-XC in production with mission critical env with
 heavy load

If you are looking for some white papers about Postgres-XC:
- some presentation documents done by people:
http://sourceforge.net/projects/postgres-xc/files/Presentation/
- some publications:
http://sourceforge.net/projects/postgres-xc/files/Publication/
- some presentations I did about the project:
http://michael.otacoo.com/presentations/conferences/

Hope this helps.
Regards,
-- 
Michael Paquier
http://michael.otacoo.com


[HACKERS] Advisory locks seem rather broken

2012-05-02 Thread Tom Lane
According to
http://archives.postgresql.org/pgsql-general/2012-04/msg00374.php
advisory locks now cause problems for prepared transactions, which
ought to ignore them.  It appears to me that this got broken by
commit 62c7bd31c8878dd45c9b9b2429ab7a12103f3590, which marked the
userlock lock method as transactional, which seems just about 100%
misguided to me.  At the very least this would require reconsidering
every single place that tests lock transactionality, and that evidently
did not happen.

If this patch weren't already in a released branch I would be arguing
for reverting it.  As is, I think we're going to have to clean it up.
I don't have time to look at it in detail right now, though.

regards, tom lane



Re: [HACKERS] Latch for the WAL writer - further reducing idle wake-ups.

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 7:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It is getting a bit late to be considering such changes for 9.2, but
 I'm willing to review and commit this if there's not anybody who feels
 strongly that it's too late.  Personally I think it's in the nature of
 cleanup and so fair game as long as we haven't formally started beta.
 However I will confess to some bias about wanting to get the server's
 idle wake-up rate down, because Fedora people have been bugging me
 about that for a long time now.  So I'm probably not the best person to
 objectively evaluate whether we should hold this for 9.3.  Comments?

Well, I feel that one of the weaknesses of our CommitFest process is
that changes like this (which are really pretty small) end up having
the same deadline as patches that are large (command triggers,
checksums, etc.); in fact, they sometimes end up having an earlier
deadline, because the people doing the big stuff end up continuing to
hack on it for another couple months while the door is shut to smaller
improvements.  So I'm not going to object if you feel like slipping
this one in.  I looked it over myself and I think it's broadly
reasonable, although I'm not too sure about the particular criteria
chosen for sending the WAL writer to sleep and waking it up again.
And like you I'd like to see some more improvement in this area.

 Adding any contention at all to XLogInsert doesn't seem like a smart
 idea, even if you failed to measure any problem in the specific tests
 you made.  I wonder whether we could not improve matters by adding
 an additional bool wal_writer_needs_wakening in the state that's
 considered to be protected by WALInsertLock.

I am skeptical about this, although it could be right.  It could also
be better the way Peter did it; a fetch of an uncontended cache line
is pretty cheap.  Another approach - which I think might be better
still - is to not bother kicking the WAL writer and let it wake up
when it wakes up.  Maybe have it hibernate for 3 seconds instead of
10, or something like that.  It seems unlikely to cause any real
problem if WAL writer takes a couple seconds to get with the program
after a long period of inactivity; note that an async commit will kick
it anyway, and a sync commit will probably have to flush WAL whether
the WAL writer wakes up or not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Temporary tables under hot standby

2012-05-02 Thread Josh Berkus
Michael,

 What is the use case for temporary tables on a hot standby server?
 
 Perhaps this is a noobie question, but it seems to me that a hot standby
 server's use by *applications* or *users* should be limited to transactions
 that don't alter the database in any form.

A very common use for asynchronous replicas is to offload long-running
reporting jobs onto the replica so that they don't bog down the master.
 However, long-running reporting jobs often require temporary tables,
especially if they use some 3rd-party vendor's reporting tool.  For
example, the average Microstrategy report involves between 1 and 12
temporary tables.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-05-02 Thread Josh Berkus
On 5/2/12 10:20 AM, Jameison Martin wrote:
 Attached are the following as per various requests:
   * test_results.txt: the performance benchmarking results,
   * TestTrailingNull.java: the performance benchmarking code, with a few additional scenarios as per various requests,
   * hardinfo_report.txt: some information about the hardware and OS of the system on which the benchmarks were run, and
   * postgresql.conf: the postgresql.conf used when running benchmarks. Note that the changes made to the vanilla postgresql.conf can be identified by looking for the string 'jamie' in the file I attached (there aren't that many).

Nice, thanks.  I'll try some of my own tests when I get a chance; I have
a really good use-case for this optimization.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()

2012-05-02 Thread Noah Misch
On Mon, Apr 30, 2012 at 02:35:20PM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  When GIN changes a metapage, we WAL-log its ex-header content and never use 
  a
  backup block.  This reduces WAL volume since the vast majority of the 
  metapage
  is unused.  However, ginRedoUpdateMetapage() only restores the WAL-logged
  content if the metapage LSN predates the WAL record LSN.  If a metapage 
  write
  tore and updated the LSN but not the other content, we would fail to 
  complete
  the update.  Instead, unconditionally reinitialize the metapage similar to 
  how
  _bt_restore_meta() handles the situation.
 
  I found this problem by code reading and did not attempt to build a test 
  case
  illustrating its practical consequences.  It's possible that there's no
  problem in practice on account of some reason I haven't contemplated.
 
 I think there's no problem in practice; the reason is that the
 GinMetaPageData struct isn't large enough to extend past the first
 physical sector of the page.  So it's in the same disk sector as the
 LSN and tearing is impossible.  Still, this might be a good
 future-proofing move, in case GinMetaPageData gets larger.

Can we indeed assume that all support-worthy filesystems align the start of
every file to a physical sector?  I know little about modern filesystem
design, but these references leave me wary of that assumption:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14690.html
http://en.wikipedia.org/wiki/Block_suballocation

If it is a safe assumption, we could exploit it elsewhere.

Thanks,
nm



Re: [HACKERS] Latch for the WAL writer - further reducing idle wake-ups.

2012-05-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... It seems unlikely to cause any real
 problem if WAL writer takes a couple seconds to get with the program
 after a long period of inactivity; note that an async commit will kick
 it anyway, and a sync commit will probably have to flush WAL whether
 the WAL writer wakes up or not.

That's a good point.  What about only kicking the WAL writer in code
paths where a backend found itself having to write/flush WAL for itself?
The added overhead is very surely negligible in such a situation.

regards, tom lane



Re: [HACKERS] Features of Postgresql and Postgres-xc with MySQL

2012-05-02 Thread Vivek Singh Raghuwanshi
Thanks this is very helpful

Regards
ViVek

On Thu, May 3, 2012 at 4:57 AM, Michael Paquier
michael.paqu...@gmail.comwrote:

 On Wed, May 2, 2012 at 2:37 PM, Vivek Singh Raghuwanshi 
 vivekraghuwan...@gmail.com wrote:

 Please send me the link or white papers from where i can get information
 like.
 3. Feature comparison of PostgreSQL and Postgres-XC
 4. and can we use Postgres-XC in production with mission critical env
 with heavy load

 If you are looking for some white papers about Postgres-XC:
 - some presentation documents done by people:
 http://sourceforge.net/projects/postgres-xc/files/Presentation/
 - some publications:
 http://sourceforge.net/projects/postgres-xc/files/Publication/
 - some presentations I did about the project:
 http://michael.otacoo.com/presentations/conferences/

 Hope this helps.
 Regards,
 --
 Michael Paquier
 http://michael.otacoo.com




-- 
ViVek Raghuwanshi
Mobile -+91-09595950504

Skype - vivek_raghuwanshi


Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()

2012-05-02 Thread Daniel Farina
On Wed, May 2, 2012 at 6:06 PM, Noah Misch n...@leadboat.com wrote:
 Can we indeed assume that all support-worthy filesystems align the start of
 every file to a physical sector?  I know little about modern filesystem
 design, but these references leave me wary of that assumption:

 http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14690.html
 http://en.wikipedia.org/wiki/Block_suballocation

 If it is a safe assumption, we could exploit it elsewhere.

Not to say whether this is safe or not, but it *is* exploited
elsewhere, as I understand it: the pg_control information, whose
justification for its safety is its small size.  That may point to a
very rare problem with pg_control rather than the safety of the assumption
it makes.

-- 
fdr



Re: [HACKERS] Latch for the WAL writer - further reducing idle wake-ups.

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 11:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ... It seems unlikely to cause any real
 problem if WAL writer takes a couple seconds to get with the program
 after a long period of inactivity; note that an async commit will kick
 it anyway, and a sync commit will probably have to flush WAL whether
 the WAL writer wakes up or not.

 That's a good point.  What about only kicking the WAL writer in code
 paths where a backend found itself having to write/flush WAL for itself?
 The added overhead is very surely negligible in such a situation.

Yeah, I think that would make sense, though I'd probably still argue
for a hibernation period not quite so long as ten seconds.  Actually,
what I'd really like is for this to be adaptive: if we find that
there's no WAL to write, increase the time until the next wakeup by 10
ms until we hit the maximum of, say, 3 seconds.  If we find that there
is WAL to write, cut the time until the next wakeup in half until we
hit a minimum of, say, 20ms.  And, if we're forced to write/flush WAL
ourselves, or we async commit, kick the WAL writer in the pants and
wake him up right away.  That way we're willing to get
super-aggressive when needed, but we don't stay there very long once
the pounding ends.  Also, we avoid having a hard cut between regular
sleeps and deep hibernation; instead, we kind of gradually drift off.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()

2012-05-02 Thread Tom Lane
Daniel Farina dan...@heroku.com writes:
 On Wed, May 2, 2012 at 6:06 PM, Noah Misch n...@leadboat.com wrote:
 Can we indeed assume that all support-worthy filesystems align the start of
 every file to a physical sector?  I know little about modern filesystem
 design, but these references leave me wary of that assumption:
 
 http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14690.html
 http://en.wikipedia.org/wiki/Block_suballocation
 
 If it is a safe assumption, we could exploit it elsewhere.

 Not to say whether this is safe or not, but it *is* exploited
 elsewhere, as I understand it: the pg_control information, whose
 justification for its safety is its small size.  That may point to a
 very rare problem with pg_control rather the safety of the assumption
 it makes.

I think it's somewhat common now for filesystems to attempt to optimize
very small files (on the order of a few dozen bytes) in that way.  It's
hard to see where's the upside for changing the conventional storage
allocation when the file is sector-sized or larger; the file system does
have to be prepared to rewrite the file on demand, and moving it from
one place to another isn't cheap.

That wikipedia reference argues for doing this type of optimization on
the last partial block of a file, which is entirely irrelevant for our
purposes since we always ask for page-multiples of space.  (The fact
that much of that might be useless padding is, I think, unknown to the
filesystem.)

Having said all that, I wasn't really arguing that this was a guaranteed
safe thing for us to rely on; just pointing out that it's quite likely
that the issue hasn't been seen in the field because of this type of
consideration.

regards, tom lane



Re: [HACKERS] Re: xReader, double-effort (was: Temporary tables under hot standby)

2012-05-02 Thread Jim Nasby

On 4/29/12 9:27 AM, Kevin Grittner wrote:

Maybe I can help with that by describing what the Wisconsin court
system does for circuit court data.


Thanks for the write-up, it was insightful.

One thing I wanted to mention is that non-binary replication has an added 
advantage over binary from a DR standpoint: with binary replication, if 
corruption occurs on a master it is more likely to make it into your replicas, 
thanks to full page writes. You might want to consider that depending on how 
sensitive your data is.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



Re: [HACKERS] Future In-Core Replication

2012-05-02 Thread Jim Nasby

On 4/29/12 6:03 AM, Simon Riggs wrote:

The DML-WITH-LIMIT-1 is required to do single logical updates on tables
  with non-unique rows.
  And as for any logical updates we will have huge performance problem
  when doing UPDATE or DELETE on large table with no indexes, but
  fortunately this problem is on slave, not master;)

While that is possible, I would favour the do-nothing approach. By
making the default replication mode = none, we then require a PK to be
assigned before allowing replication mode = on for a table. Trying to
replicate tables without PKs is a problem that can wait basically.



Something that an in-core method might be able to do that an external one can't 
would be to support a method of uniquely identifying rows in tables with no 
PK's. A gross example (that undoubtedly wouldn't work in the real world) would 
be using TID's. A real-world implementation might be based on a hidden serial 
column.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net
