Re: [HACKERS] Allowing multiple concurrent base backups

2011-03-21 Thread Heikki Linnakangas

On 18.03.2011 13:56, Heikki Linnakangas wrote:

On 18.03.2011 10:48, Heikki Linnakangas wrote:

On 17.03.2011 21:39, Robert Haas wrote:

On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao masao.fu...@gmail.com
wrote:

On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

Hmm, good point. It's harmless, but creating the history file in the
first
place sure seems like a waste of time.


The attached patch changes pg_stop_backup so that it doesn't create
the backup history file if archiving is not enabled.

When I tested the multiple backups, I found that they can have the same
checkpoint location and the same history file name.


$ for ((i=0; i<4; i++)); do
pg_basebackup -D test$i -c fast -x -l test$i
done

$ cat test0/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test0

$ cat test1/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test1

$ cat test2/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test2

$ cat test3/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test3

$ ls archive/*.backup
archive/00010002.00B0.backup


This would cause a serious problem: the backup-end record
which indicates the same START WAL LOCATION can be written by the
first backup before the others finish. So we might wrongly conclude
that we've already reached a consistent state upon reading the
backup-end record (written by the first backup), before replaying the
last required WAL file.

/*
* Force a CHECKPOINT. Aside from being necessary to prevent torn
* page problems, this guarantees that two successive backup runs will
* have different checkpoint positions and hence different history
* file names, even if nothing happened in between.
*
* We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
* fast = true). Otherwise this can take awhile.
*/
RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
(fast ? CHECKPOINT_IMMEDIATE : 0));

This problem happens because the above code (in do_pg_start_backup)
doesn't actually ensure that concurrent backups have different
checkpoint locations. ISTM that we should change the above, or
something elsewhere, to ensure that.


Yes, good point.


Here's a patch based on that approach, ensuring that each base backup
uses a different checkpoint as the start location. I think I'll commit
this, rather than invent a new unique ID mechanism for backups. The
latter would need changes in recovery and control file too, and I don't
feel like tinkering with that at this stage.


Ok, committed this.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread Greg Stark
On Fri, Mar 18, 2011 at 11:55 PM, Josh Berkus j...@agliodbs.com wrote:
 To take the opposite approach... has anyone looked at having the OS just 
 manage all caching for us? Something like MMAPed shared buffers? Even if we 
 find the issue with large shared buffers, we still can't dedicate serious 
 amounts of memory to them because of work_mem issues. Granted, that's 
 something else on the TODO list, but it really seems like we're re-inventing 
 the wheels that the OS has already created here...

A lot of people have talked about it. You can find references to mmap
going at least as far back as 2001 or so. The problem is that it would
depend on the OS implementing things in a certain way and guaranteeing
things we don't think can be portably assumed. We would need to mlock
large amounts of address space which most OS's don't allow, and we
would need to at least mlock and munlock lots of small bits of memory
all over the place which would create lots and lots of mappings which
the kernel and hardware implementations would generally not
appreciate.

 As far as I know, no OS has a more sophisticated approach to eviction
 than LRU.  And clock-sweep is a significant improvement on performance
 over LRU for frequently accessed database objects ... plus our
 optimizations around not overwriting the whole cache for things like VACUUM.

The clock-sweep algorithm was standard OS design before you or I knew
how to type. I would expect any half-decent OS to have something at
least as good -- perhaps better because it can rely on hardware
features to handle things.

However the second point is the crux of the issue and of all similar
issues on where to draw the line between the OS and Postgres. The OS
knows better about the hardware characteristics and can better
optimize the overall system behaviour, but Postgres understands better
its own access patterns and can better optimize its behaviour whereas
the OS is stuck reverse-engineering what Postgres needs, usually from
simple heuristics.


 2-level caches work well for a variety of applications.

I think a 2-level cache with a simple heuristic like pin all the
indexes is unlikely to be helpful. At least it won't optimize the
average case and I think that's been proven. It might be helpful for
optimizing the worst-case which would reduce the standard deviation.
Perhaps we're at the point now where that matters.

Where it might be helpful is as a more refined version of the
sequential scans use limited set of buffers patch. Instead of having
each sequential scan use a hard coded number of buffers, perhaps all
sequential scans should share a fraction of the global buffer pool
managed separately from the main pool. Though in my thought
experiments I don't see any real win here. In the current scheme if
there's any sign the buffer is useful it gets thrown from the
sequential scan's set of buffers to reuse anyway.

 Now, what would be *really* useful is some way to avoid all the data
 copying we do between shared_buffers and the FS cache.


Well the two options are mmap/mlock or directio. The former might be a
fun experiment but I expect any OS to fall over pretty quickly when
faced with thousands (or millions) of 8kB mappings. The latter would
need Postgres to do async i/o and hopefully a global view of its i/o
access patterns so it could do prefetching in a lot more cases.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Heikki Linnakangas

On 21.03.2011 07:40, Piyush Newe wrote:

On Thu, Mar 17, 2011 at 7:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:


Robert Haas robertmh...@gmail.com writes:

On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
alvhe...@commandprompt.com  wrote:

Keep in mind that the datetime stuff was abandoned by the maintainer
some years ago with quite some rough edges.  Some of it has been fixed,
but a lot of bugs remain.  Looks like this is one of those places and it
seems appropriate to spend some time fixing it.  Since it would involve
a behavior change, it should only go to 9.2, of course.



I wouldn't object to fixing the problem with # of digits > # of Ys in
9.1, if the fix is simple and clear-cut.  I think we are still
accepting patches to make minor tweaks, like the tab-completion patch
I committed yesterday.  It also doesn't bother me tremendously if we
push it off, but I don't think that anyone's going to be too sad if
TO_DATE('01-jan-2010',  'DD-MON-YYY') starts returning something more
sensible than 3010-01-01.


Agreed, it's certainly not too late for bug fixes in 9.1.  I agree
that this isn't something we would want to tweak in released branches,
but 9.1 isn't there yet.



I feel the patch for this would be simple; it was attached in the
initial mail of this thread. For your ready reference, I am attaching
the same patch here again.



Having said that, it's not entirely clear to me what sane behavior is
here.  Personally I would expect that an n-Ys format spec would consume
at most n digits from the input.  Otherwise how are you going to use
to_date to pick apart strings that don't have any separators?


Yeah, seems reasonable.


 So
I think the problem is actually upstream of the behavior complained of
here.  However, what we should first do is see what Oracle does in such
cases, because the main driving factor for these functions is Oracle
compatibility not what might seem sane in a vacuum.


Following is the extended chart comparing the behavior of Oracle,
PG & EDBAS.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-1',  'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, all the cases are in sync except the 1st one. I didn't
understand why Oracle is interpreting year '1' as '2011'.


It makes sense to me. Year 1, when the date format is Y, means the year 
closest to the current date that ends with 1. Or maybe the year that ends 
with 1 in the current decade. This is analogous to how two-digit years 
are interpreted (except that we've hardcoded the current date to 
compare against as year 2000 - an assumption that will start to bite us 
some time before year 2100).



*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-10',  'DD-MON-Y') Error 01-JAN-2010 Error
TO_DATE('01-jan-10',  'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-YYYY') 01-JAN-0010 01-JAN-0010 01-JAN-0010

In this case, it seems that in the last 3 cases PostgreSQL is behaving correctly.
Oracle throws an error in the 1st case since the format ('Y') is shorter than
the actual value ('10'). But PostgreSQL ignores this and returns
whatever was input. The output might not be what the user is
expecting.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-067',  'DD-MON-Y') Error 01-JAN-2067 Error
TO_DATE('01-jan-111',  'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
TO_DATE('01-jan-678',  'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
TO_DATE('01-jan-001',  'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, only the last case was correct in PG. The other cases are not in
sync with Oracle; rather, the output is vague.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-2010',  'DD-MON-Y') Error 01-JAN-4010 Error
TO_DATE('01-jan-2010',  'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error
TO_DATE('01-jan-2010',  'DD-MON-YYY') Error 01-JAN-3010 Error
TO_DATE('01-jan-2010',  'DD-MON-YYYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010

In this case, PG is giving wrong output in the first 3 cases. Those need to be
rectified. Oracle throws an error in the 1st and 3rd cases because
the format is shorter than the actual value. It seems this rule is not
applied to the 2nd case in Oracle.


Yeah, quite inconsistent :-(.

These results are not in favor of the idea that a format with n Ys 
always consumes up to n digits from the input. With that rule, 
to_date('01-jan-2010', 'DD-MON-YY') would return 01-JAN-2020, which 
isn't what Oracle does and seems quite surprising too.


So ignoring the cases where Oracle throws an error but PostgreSQL 
doesn't, there are four cases where the results differ:


 *Data Format Oracle PostgreSQL EDBAS*
 TO_DATE('01-jan-1',  'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
 

[HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Marti Raudsepp
Hi list,

When I have fields with lots of null values, I often create indexes
like this: CREATE INDEX foo_i ON foo(i) WHERE i IS NOT NULL;
This saves me some space, as most indexed queries exclude NULLs anyway.

In PostgreSQL 9.0.3, min(i) can successfully use this index:
---
marti=# create table foo as select null::int as i from
generate_series(1,10);
marti=# create index foo_i_notnull on foo (i) where i is not null;
marti=# analyze foo;
marti=# explain analyze select min(i) from foo;
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.025..0.026
rows=1 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.00..0.00 rows=1 width=4) (actual
time=0.021..0.021 rows=0 loops=1)
   ->  Index Scan using foo_i_notnull on foo  (cost=0.00..8.27
rows=10 width=4) (actual time=0.019..0.019 rows=0 loops=1)
 Total runtime: 0.063 ms
---

It seems that PostgreSQL 9.1alpha3 cannot, however:
---
marti=# explain analyze select min(i) from foo;
 Aggregate  (cost=1594.00..1594.01 rows=1 width=4) (actual
time=29.612..29.612 rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..1344.00 rows=10 width=4)
(actual time=0.023..14.221 rows=10 loops=1)
 Total runtime: 29.661 ms
---

It would be cool to have this feature re-added before a 9.1 release.

I know that the Merge Append patch required some changes in the
min/max optimization, which is probably the cause.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] How to look at the Expression Trees

2011-03-21 Thread Vaibhav Kaushal
Hi all,

I have been trying to work on the expression evaluator (trying to alter
it just for the seqscan case). I have understood a few things. I wish
someone could tell me if I am wrong at some point. As far as I have gone
through the code, I think:

1. Quals are formulated by the planner.
2. Quals are actually a list of Expression Trees.
3. They are created in the ExecInitExpr function.
4. Every row / tuple table slot is run through the same qual list and
thus goes through the same expression tree execution path as the others
(of course being filtered out in between if they do not fit the quals in
the list).

5. The most common nodes in the expression trees are the ExecEvalVar,
ExecEvalConst and ExecEvalParam.

I might be wrong somewhere (especially, most people would be able to say
a lot about the 5th point). But if the above is correct, then how
and why are ExecMakeFunctionResultNoSets, ExecEvalRelabelType,
ExecEvalFuncArgs and the like used?

I wanted to see how the expression tree takes form before it gets
into ExecQual for evaluation by the ExecEvalExpr function. Is there a way to
see the Expression Tree so that I get a better idea about what is
happening?

Regards,
Vaibhav 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to look at the Expression Trees

2011-03-21 Thread Heikki Linnakangas

On 21.03.2011 13:44, Vaibhav Kaushal wrote:

Hi all,

I have been trying to work on the expression evaluator (trying to alter
it just for the seqscan case). I have understood a few things. I wish
someone could tell me if I am wrong at some point. As far as I have gone
through the code, I think:

1. Quals are formulated by the planner.
2. Quals are actually a list of Expression Trees.
3. They are created in the ExecInitExpr function.
4. Every row / tuple table slot is run through the same qual list and
thus goes through the same expression tree execution path as the others
(of course being filtered out in between if they do not fit the quals in
the list).


Yes. There are actually two trees involved. The planner produces a tree 
of Expr nodes, and ExecInitExpr prepares a tree of ExprState nodes that 
mirrors the first tree. The ExprStates contain run-time state needed to 
execute the Expr tree.



5. The most common nodes in the expression trees are the ExecEvalVar,
ExecEvalConst and ExecEvalParam.


Well, that's obviously going to depend on the query.


I might be wrong somewhere (especially, most people would be able to say
a lot about the 5th point). But if the above is correct, then how
and why are ExecMakeFunctionResultNoSets, ExecEvalRelabelType,
ExecEvalFuncArgs and the like used?


ExecMakeFunctionResultNoSets is used to evaluate function calls. The 
first function call invocation always uses ExecMakeFunctionResult, but 
if ExecMakeFunctionResult sees on that first invocation that it was not 
a set-returning function, it changes the evaluator function for 
subsequent invocations to ExecMakeFunctionResultNoSets. 
ExecMakeFunctionResultNoSets does the same thing as 
ExecMakeFunctionResult, but skips the checks for set-returning 
functions, making the evaluation a bit faster.


ExecEvalFuncArgs is used by ExecMakeFunctionResult to evaluate the 
function arguments.


ExecEvalRelabelType is used to evaluate RelabelType nodes. RelabelType 
doesn't really do anything; it's just a placeholder used when a type is 
cast to another and the source and target types are binary compatible.



I wanted to see how the expression tree takes form before it gets
into ExecQual for evaluation by the ExecEvalExpr function. Is there a way to
see the Expression Tree so that I get a better idea about what is
happening?


set debug_print_plan=on
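
For example (a sketch; the tree is emitted at the LOG message level, so 
raising client_min_messages makes it show up right in your psql session):

set client_min_messages = log;
set debug_print_plan = on;
select relname from pg_class where relname = 'pg_type';
-- the logged plan tree follows; its targetlist and qual fields hold
-- the expression trees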

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to look at the Expression Trees

2011-03-21 Thread Vaibhav Kaushal
On Mon, Mar 21, 2011 at 5:47 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:


 Yes. There are actually two trees involved. The planner produces a tree of
 Expr nodes, and ExecInitExpr prepares a tree of ExprState nodes that mirrors
 the first tree. The ExprStates contain run-time state needed to execute the
 Expr tree.


 Yes, I am seeing that in the code.

  5. The most common nodes in the expression trees are the ExecEvalVar,
 ExecEvalConst and ExecEvalParam.


 Well, that's obviously going to depend on the query.

 Indeed it does, but still, in most cases ExecEvalVar or
ExecEvalScalarVar is seen a lot of times (I am trying only simple queries
which take most data from disk), which is why I said that. Anyway, you are
right there.



  I might be wrong somewhere (especially most people would be able to say
 a lot about the 5th point). But if the above were to be correct then how
 and why are the ExecMakeFunctionResultNoSets, ExecEvalRelabelType,
 ExecEvalFuncArgs and the likes are used?


 ExecMakeFunctionResultNoSets is used to evaluate function calls. The first
 function call invocation always uses ExecMakeFunctionResult, but if
 ExecMakeFunctionResult sees on that first invocation that it was not a
 set-returning function, it changes the evaluator function for subsequent
 invocations to ExecMakeFunctionResultNoSets. ExecMakeFunctionResultNoSets
 does the same thing as ExecMakeFunctionResult, but skips the checks for
 set-returning functions, making the evaluation a bit faster.

 ExecEvalFuncArgs is used by ExecMakeFunctionResult to evaluate the function
 arguments.

 ExecEvalRelabelType is used to evaluate RelabelType nodes. RelabelType
 doesn't really do anything, it's just a placeholder when a type is cast to
 another, and the source and target types are binary compatible.

 I did not know the last fact.


  I wanted to see how the expression tree takes form before it gets
 into ExecQual for evaluation by the ExecEvalExpr function. Is there a way to
 see the Expression Tree so that I get a better idea about what is
 happening?


 set debug_print_plan=on

 I am already using the PostgreSQL server with the -d 4 option and it shows a
lot of things. But I am not able to see the Expression State trees. OK, I
know that the output DOES show the 'expr' entries. But if those are what
make up the Expression _tree_, then I am not able to understand them. A little
help on that would be much appreciated.


 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Regards,
Vaibhav


Re: [HACKERS] How to look at the Expression Trees

2011-03-21 Thread Heikki Linnakangas

On 21.03.2011 14:37, Vaibhav Kaushal wrote:

I am already using the PostgreSQL server with the -d 4 option and it shows a
lot of things. But I am not able to see the Expression State trees.


To be precise, debug_print_plan=on prints the expression tree that comes 
from the planner, not the execution tree of ExprStates. But the structure 
of the ExprState tree is the same as the planner tree; you just have 
ExprState nodes in place of Expr nodes.



OK, I
know that the output DOES show the 'expr' entries. But if those are what
make up the Expression _tree_, then I am not able to understand them. A little
help on that would be much appreciated.


It is indeed a tree that gets printed with debug_print_plan. There's 
more than the expression tree there, with information about the planner, 
the relations involved, etc. Look for the targetlist and qual fields in 
the output; that's where the expression trees are.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 7:17 AM, Marti Raudsepp ma...@juffo.org wrote:
 Hi list,

 When I have fields with lots of null values, I often create indexes
 like this: CREATE INDEX foo_i ON foo(i) WHERE i IS NOT NULL;
 This saves me some space, as most indexed queries exclude NULLs anyway.

 In PostgreSQL 9.0.3, min(i) can successfully use this index:
 ---
 marti=# create table foo as select null::int as i from
 generate_series(1,10);
 marti=# create index foo_i_notnull on foo (i) where i is not null;
 marti=# analyze foo;
 marti=# explain analyze select min(i) from foo;
  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.025..0.026
 rows=1 loops=1)
   InitPlan 1 (returns $0)
      ->  Limit  (cost=0.00..0.00 rows=1 width=4) (actual
 time=0.021..0.021 rows=0 loops=1)
            ->  Index Scan using foo_i_notnull on foo  (cost=0.00..8.27
 rows=10 width=4) (actual time=0.019..0.019 rows=0 loops=1)
  Total runtime: 0.063 ms
 ---

 It seems that PostgreSQL 9.1alpha3 cannot, however:
 ---
 marti=# explain analyze select min(i) from foo;
  Aggregate  (cost=1594.00..1594.01 rows=1 width=4) (actual
 time=29.612..29.612 rows=1 loops=1)
    ->  Seq Scan on foo  (cost=0.00..1344.00 rows=10 width=4)
 (actual time=0.023..14.221 rows=10 loops=1)
  Total runtime: 29.661 ms
 ---

 It would be cool to have this feature re-added before a 9.1 release.

 I know that the Merge Append patch required some changes in the
 min/max optimization, which is probably the cause.

Yeah, I think this is a direct result of commit
034967bdcbb0c7be61d0500955226e1234ec5f04.

I was kind of nervous about that one when it went in, and the fact
that we're getting our first complaint about it before we've even hit
beta is not setting my mind at ease...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Having said that, it's not entirely clear to me what sane behavior is
 here.  Personally I would expect that an n-Ys format spec would consume
 at most n digits from the input.  Otherwise how are you going to use
 to_date to pick apart strings that don't have any separators?

 Yeah, seems reasonable.

On the flip side, what if you want to allow either a two digit year or
a four digit year?  It doesn't seem unreasonable to allow YY to
encompass what YYYY would have allowed, unless there's a separate
notion for 'either YY or YYYY'.

 It makes sense to me. Year 1, when the date format is Y, means the year
 closest to the current date that ends with 1. Or maybe the year that ends with 1
 in the current decade. This is analogous to how two-digit years are
 interpreted (except that we've hardcoded the current date to compare
 against as year 2000 - an assumption that will start to bite us some time
 before year 2100).

Agree with all of this.

 So ignoring the cases where Oracle throws an error but PostgreSQL doesn't,
 there are four cases where the results differ:

 *Data Format Oracle PostgreSQL EDBAS*
 TO_DATE('01-jan-1',  'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
 TO_DATE('01-jan-111',  'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
 TO_DATE('01-jan-678',  'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
 TO_DATE('01-jan-2010',  'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error

 IMHO our current behavior in the 2nd and 4th cases is so bizarre that we should
 change them to match Oracle. I think we should fix the 1st too; the notion
 that a single-digit year means something between 2000-2009 seems pretty
 useless (granted, using a single digit for the year is brain-dead to begin
 with).

I agree, but do we understand what Oracle does categorically, rather
than just its output on this specific input?

 The 3rd one is debatable. The range for three-digit years is currently
 1100-2099, which is enough range for many applications. But should we change
 it for the sake of matching Oracle's behavior? Not that anyone uses YYY in
 practice, but still.

I'm OK with that, but again, exactly what rule is Oracle applying here?

 BTW, whatever behavior we choose, this needs to be documented. I don't see
 anything in the docs on how Y, YY or YYY are expanded.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Having said that, it's not entirely clear to me what sane behavior is
 here.  Personally I would expect that an n-Ys format spec would consume
 at most n digits from the input.  Otherwise how are you going to use
 to_date to pick apart strings that don't have any separators?

 Yeah, seems reasonable.

 On the flip side, what if you want to allow either a two digit year or
 a four digit year?  It doesn't seem unreasonable to allow YY to
 encompass what YYYY would have allowed, unless there's a separate
 notion for 'either YY or YYYY'.

What I was thinking was that YYYY would take either 2 or 4 digits.
Whatever you do here, the year will have to be delimited by a non-digit
for such cases to be parseable.

 I'm OK with that, but again, exactly what rule is Oracle applying here?

Yeah.  Hopefully they documented it, and we don't have to try to
reverse-engineer the intention from an undersized set of samples.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Having said that, it's not entirely clear to me what sane behavior is
 here.  Personally I would expect that an n-Ys format spec would consume
 at most n digits from the input.  Otherwise how are you going to use
 to_date to pick apart strings that don't have any separators?

 Yeah, seems reasonable.

 On the flip side, what if you want to allow either a two digit year or
 a four digit year?  It doesn't seem unreasonable to allow YY to
 encompass what YYYY would have allowed, unless there's a separate
 notion for 'either YY or YYYY'.

 What I was thinking was that YYYY would take either 2 or 4 digits.
 Whatever you do here, the year will have to be delimited by a non-digit
 for such cases to be parseable.

I was assuming a slightly more general variant of that - namely, Y,
YY, or YYY would all accept that many digits, or more; and the result
of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY,
respectively, had been used.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 What I was thinking was that YYYY would take either 2 or 4 digits.
 Whatever you do here, the year will have to be delimited by a non-digit
 for such cases to be parseable.

 I was assuming a slightly more general variant of that - namely, Y,
 YY, or YYY would all accept that many digits, or more; and the result
 of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY,
 respectively, had been used.

As far as I can see, that would completely destroy the use-case of
trying to parse a string where there are no non-digit delimiters and
so you have to take exactly the specified number of digits, not more.

Why not head in the other direction of allowing fewer digits than
suggested by the format, instead of more?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 7:17 AM, Marti Raudsepp ma...@juffo.org wrote:
 I know that the Merge Append patch required some changes in the
 min/max optimization, which is probably the cause.

 Yeah, I think this is a direct result of commit
 034967bdcbb0c7be61d0500955226e1234ec5f04.

Yeah, looks that way.  I'm not sure what it would take to re-support
this case without losing the other advantages of the change.  Personally
I'm not terribly excited about it: I don't think that suppressing nulls
from an index this way is really very useful.  Using a partial index
probably eats more planner cycles than you'll save, overall.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] When and where do PG invoke PLs module?

2011-03-21 Thread _石头
Hello,
 Thank you for reading my question!


 Lately, I'm reading PostgreSQL's source code. As I understand it, the PL
modules (such as PL/pgSQL) allow user-defined functions to be written in
other languages.
 As the PostgreSQL 9.0 documentation says: For a function written in a
procedural language, the database server has no built-in knowledge about
how to interpret the function's source text. Instead, the task is passed
to a special handler that knows the details of the language. The handler
could either do all the work of parsing, syntax analysis, execution, etc.
itself, or it could serve as "glue" between PostgreSQL and an existing
implementation of a programming language.
 I've tried to find when and where PG invokes the PL modules, but failed.
There are four stages for a query string -- parse, rewrite, plan and
execute. I want to know which part invokes the PL modules, and which
function is the entry point for doing that.

 Looking forward to your reply!

from
stone.

Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread rsmogura

On Mon, 21 Mar 2011 10:24:22 +0000, Greg Stark wrote:
On Fri, Mar 18, 2011 at 11:55 PM, Josh Berkus j...@agliodbs.com 
wrote:
To take the opposite approach... has anyone looked at having the OS 
just manage all caching for us? Something like MMAPed shared buffers? 
Even if we find the issue with large shared buffers, we still can't 
dedicate serious amounts of memory to them because of work_mem 
issues. Granted, that's something else on the TODO list, but it 
really seems like we're re-inventing the wheels that the OS has 
already created here...


A lot of people have talked about it. You can find references to mmap
going at least as far back as 2001 or so. The problem is that it would
depend on the OS implementing things in a certain way and guaranteeing
things we don't think can be portably assumed. We would need to mlock
large amounts of address space which most OS's don't allow, and we
would need to at least mlock and munlock lots of small bits of memory
all over the place which would create lots and lots of mappings which
the kernel and hardware implementations would generally not
appreciate.
Actually, just out of curiosity, I did a test with mmap, and I got a 2% 
boost on data reading, maybe because of skipping the memcpy in fread. I am 
really curious how fast it will be, if at all, once I add some good and 
needed stuff, and how e.g. vacuum will work.


snip


2-level caches work well for a variety of applications.


I think a 2-level cache with a simple heuristic like pin all the
indexes is unlikely to be helpful. At least it won't optimize the
average case and I think that's been proven. It might be helpful for
optimizing the worst-case which would reduce the standard deviation.
Perhaps we're at the point now where that matters.

Actually, the 2nd-level cache does not pin index buffers. In simple 
words, it's just some set of reserved buffer ids to be used for index 
pages; all the logic with pinning, etc. is the same, the difference being 
that default-level operations will not touch the 2nd level. I posted some 
reports from my simple tests. When I was experimenting with 2nd-level 
caches I saw that some operations may swap out system table buffers, too.


snip

Regards,
Radek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 7:17 AM, Marti Raudsepp ma...@juffo.org wrote:
 I know that the Merge Append patch required some changes in the
 min/max optimization, which is probably the cause.

 Yeah, I think this is a direct result of commit
 034967bdcbb0c7be61d0500955226e1234ec5f04.

 Yeah, looks that way.  I'm not sure what it would take to re-support
 this case without losing the other advantages of the change.  Personally
 I'm not terribly excited about it: I don't think that suppressing nulls
 from an index this way is really very useful.  Using a partial index
 probably eats more planner cycles than you'll save, overall.

If only 1% of the table has non-NULL values in that column, maybe not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 What I was thinking was that YYYY would take either 2 or 4 digits.
 Whatever you do here, the year will have to be delimited by a non-digit
 for such cases to be parseable.

 I was assuming a slightly more general variant of that - namely, Y,
 YY, or YYY would all accept that many digits, or more; and the result
 of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY,
 respectively, had been used.

 As far as I can see, that would completely destroy the use-case of
 trying to parse a string where there are no non-digit delimiters and
 so you have to take exactly the specified number of digits, not more.

Yeah, I thought about that, but it seems that use case is already
hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
worry about it.  If Piyush's table is to be believed, Oracle only
throws an error for the wrong number of digits if the format is Y or
YYY, and the actual number of digits is more.  If the format is YY,
then it accepts 2, 3, or 4 digit years.  And since YY is exponentially
more likely to be used than Y or YYY, that pretty much means you can't
do what you're talking about using this syntax anyway.

 Why not head in the other direction of allowing fewer digits than
 suggested by the format, instead of more?

Well, that seems a bit counterintuitive to me.  I think it's much more
likely that someone wants to insist on a four-digit year (and not
allow just two digits) than that they want to insist on a two-digit
year (and not allow four digits).  I also think that would be pretty
terrible for Oracle compatibility, since they're clearly interpreting
99 with YYYY as meaning 0099, not 1999 or 2099.  I don't think
we want to be randomly incompatible there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread Merlin Moncure
On Mon, Mar 21, 2011 at 5:24 AM, Greg Stark gsst...@mit.edu wrote:
 On Fri, Mar 18, 2011 at 11:55 PM, Josh Berkus j...@agliodbs.com wrote:
 To take the opposite approach... has anyone looked at having the OS just 
 manage all caching for us? Something like MMAPed shared buffers? Even if we 
 find the issue with large shared buffers, we still can't dedicate serious 
 amounts of memory to them because of work_mem issues. Granted, that's 
 something else on the TODO list, but it really seems like we're 
 re-inventing the wheels that the OS has already created here...

 A lot of people have talked about it. You can find references to mmap
 going at least as far back as 2001 or so. The problem is that it would
 depend on the OS implementing things in a certain way and guaranteeing
 things we don't think can be portably assumed. We would need to mlock
 large amounts of address space which most OS's don't allow, and we
 would need to at least mlock and munlock lots of small bits of memory
 all over the place which would create lots and lots of mappings which
 the kernel and hardware implementations would generally not
 appreciate.

 As far as I know, no OS has a more sophisticated approach to eviction
 than LRU.  And clock-sweep is a significant improvement on performance
 over LRU for frequently accessed database objects ... plus our
 optimizations around not overwriting the whole cache for things like VACUUM.

 The clock-sweep algorithm was standard OS design before you or I knew
 how to type. I would expect any half-decent OS to have something at
 least as good -- perhaps better because it can rely on hardware
 features to handle things.

 However the second point is the crux of the issue and of all similar
 issues on where to draw the line between the OS and Postgres. The OS
 knows better about the hardware characteristics and can better
 optimize the overall system behaviour, but Postgres understands better
 its own access patterns and can better optimize its behaviour whereas
 the OS is stuck reverse-engineering what Postgres needs, usually from
 simple heuristics.


 2-level caches work well for a variety of applications.

 I think a 2-level cache with a simple heuristic like pin all the
 indexes is unlikely to be helpful. At least it won't optimize the
 average case and I think that's been proven. It might be helpful for
 optimizing the worst-case which would reduce the standard deviation.
 Perhaps we're at the point now where that matters.

 Where it might be helpful is as a more refined version of the
 sequential scans use limited set of buffers patch. Instead of having
 each sequential scan use a hard coded number of buffers, perhaps all
 sequential scans should share a fraction of the global buffer pool
 managed separately from the main pool. Though in my thought
 experiments I don't see any real win here. In the current scheme if
 there's any sign the buffer is useful it gets thrown from the
 sequential scan's set of buffers to reuse anyway.

 Now, what would be *really* useful is some way to avoid all the data
 copying we do between shared_buffers and the FS cache.


 Well the two options are mmap/mlock or directio. The former might be a
 fun experiment but I expect any OS to fall over pretty quickly when
 faced with thousands (or millions) of 8kB mappings. The latter would
 need Postgres to do async i/o and hopefully a global view of its i/o
 access patterns so it could do prefetching in a lot more cases.

Can't you make just one large mapping and lock it in 8k regions? I
thought the problem with mmap was not being able to detect other
processes
(http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html),
compatibility issues (possibly obsolete), etc.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread Heikki Linnakangas

On 21.03.2011 17:54, Merlin Moncure wrote:

Can't you make just one large mapping and lock it in 8k regions? I
thought the problem with mmap was not being able to detect other
processes 
(http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html)
compatibility issues (possibly obsolete), etc.


That mail is about replacing SysV shared memory with mmap(). Detecting 
other processes is a problem in that use, but that's not an issue with 
using mmap() to replace shared buffers.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Kevin Grittner
Robert Haas  wrote:
 Tom Lane  wrote:
 
 I don't think that suppressing nulls from an index this way is
 really very useful. Using a partial index probably eats more
 planner cycles than you'll save, overall.
 
 If only 1% of the table has non-NULL values in that column, maybe
 not.
 
We definitely have indexes with less than 1% non-NULL, and we've
found partial indexes to be efficient for them.  On the other hand,
I can't think where we do min/max on any of them; so as long as this
regression only affects those aggregates, it won't hurt our shop.
 
The use case doesn't seem all that far-fetched to me, though.
 
-Kevin



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19

2011-03-21 Thread Yeb Havinga

On 2011-03-21 02:05, Robert Haas wrote:

On Sun, Mar 20, 2011 at 11:03 AM, Yeb Havinga yebhavi...@gmail.com  wrote:

On 2011-03-20 05:44, Robert Haas wrote:

Hmm, I'm not going to be able to reproduce this here, and my test
setup didn't show a clear regression.  I can try beating on it some
more, but...  Any chance you could rerun your test with the latest
master-branch code, and perhaps also with the patch I proposed
upthread to remove a branch from the section protection by
SyncRepLock?  I can't really tell from reading the emails you linked
what was responsible for the slowdowns and speedups, and it is unclear
to me how much impact my recent changes actually had.

No problem. Could you tell me the name of the remove a branch from the
section protected by SyncRepLock patch, or perhaps a message link?
Upthread I see sync-standbys-defined-rearrangement.patch but also two
sync-rep-wait-fixes.

Thanks!  The things I'd like to see compared are:

pgbench -i -s 50 test
Two runs of pgbench -c 10 -M prepared -T 600 test with 1 sync standby 
- server configs etc were mailed upthread.



- performance as of commit e148443ddd95cd29edf4cc1de6188eb9cee029c5

1158 and 1306 (avg 1232)

- performance as of current git master

1181 and 1280 (avg 1230,5)

- performance as of current git master with
sync-standbys-defined-rearrangement applied

1152 and 1269 (avg 1210,5)


--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 As far as I can see, that would completely destroy the use-case of
 trying to parse a string where there are no non-digit delimiters and
 so you have to take exactly the specified number of digits, not more.

 Yeah, I thought about that, but it seems that use case is already
 hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
 worry about it.

How so?

regression=# select to_date('20110321', 'YYYYMMDD');
  to_date   
------------
 2011-03-21
(1 row)

regression=# select to_date('110321', 'YYMMDD');
  to_date   
------------
 2011-03-21
(1 row)

If you break the latter case, I am sure the villagers will be on your
doorstep shortly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread Josh Berkus
On 3/21/11 3:24 AM, Greg Stark wrote:
 2-level caches work well for a variety of applications.
 
 I think 2-level caches with simple heuristics like pin all the
 indexes is unlikely to be helpful. At least it won't optimize the
 average case and I think that's been proven. It might be helpful for
 optimizing the worst-case which would reduce the standard deviation.
 Perhaps we're at the point now where that matters.

You're missing my point ... Postgres already *has* a 2-level cache:
shared_buffers and the FS cache.  Anything we add to that will be adding
levels.

We already did that, actually, when we implemented ARC: it effectively gave
PostgreSQL a 3-level cache.  The results were not very good, although
the algorithm could be at fault there.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 As far as I can see, that would completely destroy the use-case of
 trying to parse a string where there are no non-digit delimiters and
 so you have to take exactly the specified number of digits, not more.

 Yeah, I thought about that, but it seems that use case is already
 hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
 worry about it.

 How so?

 regression=# select to_date('20110321', 'YYYYMMDD');
  to_date
 ------------
  2011-03-21
 (1 row)

 regression=# select to_date('110321', 'YYMMDD');
  to_date
 ------------
  2011-03-21
 (1 row)

 If you break the latter case, I am sure the villagers will be on your
 doorstep shortly.

Oh, dear.  No wonder this code is so hard to get right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Missing semicolon in parser's gram.y

2011-03-21 Thread Gurjeet Singh
Attached is a single-line patch to add a missing semicolon to gram.y; caught
using yyextract.

Regards,
-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d32e480..44c4fd6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -9156,6 +9156,7 @@ SimpleTypename:
 $$ = $1;
$$->collnames = $2;
 			}
+	;
 
 SimpleTypenameWithoutCollation:
GenericType    { $$ = $1; }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Robert Haas  wrote:
 Tom Lane  wrote:
 I don't think that suppressing nulls from an index this way is
 really very useful. Using a partial index probably eats more
 planner cycles than you'll save, overall.

 If only 1% of the table has non-NULL values in that column, maybe
 not.
 
 We definitely have indexes with less than 1% non-NULL, and we've
 found partial indexes to be efficient for them.  On the other hand,
 I can't think where we do min/max on any of them; so as long as this
 regression only affects those aggregates, it won't hurt our shop.
 
 The use case doesn't seem all that far-fetched to me, though.

Hmm.  We could possibly fix this by having planagg.c do a completely
separate planner run for each aggregate, wherein it actually does build
the equivalent query
SELECT col FROM tab WHERE existing-quals AND col IS NOT NULL
ORDER BY col ASC/DESC LIMIT 1
and plan that.  That'd be less efficient than the current way,
especially for cases where there are multiple aggregates, because there
would be some duplication of processing between the per-aggregate
planner runs and the main one.  But since we can only do this
optimization for rather simple queries anyway, maybe it wouldn't matter
much.
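
For instance (a sketch using Marti's test table from upthread), the
per-aggregate planner run for min(i) would plan the equivalent of

SELECT i FROM foo WHERE i IS NOT NULL ORDER BY i ASC LIMIT 1;

which matches the foo_i_notnull predicate directly -- and, spelled out
by hand, also serves as a workaround until this is re-supported.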

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 12:29 PM, Yeb Havinga yebhavi...@gmail.com wrote:
 pgbench -i -s 50 test
 Two runs of pgbench -c 10 -M prepared -T 600 test with 1 sync standby -
 server configs etc were mailed upthread.

 - performance as of commit e148443ddd95cd29edf4cc1de6188eb9cee029c5

 1158 and 1306 (avg 1232)

 - performance as of current git master

 1181 and 1280 (avg 1230,5)

 - performance as of current git master with
 sync-standbys-defined-rearrangement applied

 1152 and 1269 (avg 1210,5)

Hmm, that doesn't appear to show the 20% regression Simon claimed
upthread.  That's good...  but I'm confused as to how you are getting
numbers this high at all without a BBU.  If every commit has to wait
for two consecutive fsyncs, cranking out 1200+ commits per second is a
lot.  Maybe it's just barely plausible if these are 15K drives and all
the commits are piggybacking on the fsyncs at top speed, but, man,
that's fast.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Greg Stark
So it's a clever hack that we used to allow the partial indexes to be
used. It relied on the implicit assumption that min(x) and max(x),
when the only values of x are NULL, are both NULL.

It would be nice if we were clever enough to support *any* strict
aggregate using partial indexes on WHERE NOT NULL since they'll all
have that property.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 So it's a clever hack that we used to allow the partial indexes to be
 used. It relied on the implicit assumption that min(x) and max(x)
 where the only values of x where NULL were both NULL.

 It would be nice if we were clever enough to support *any* strict
 aggregate using partial indexes on WHERE NOT NULL since they'll all
 have that property.

Huh?  The point of the min/max optimization is to not scan the whole
index but just fetch the endpoint value.  For general aggregates, you
have to scan the table anyway.  If an index is useful for that, it'll
get picked up in the normal planning process.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql \dt and table size

2011-03-21 Thread Bernd Helmle
It struck me today again that \dt+ isn't displaying the accurate table 
size for tables, since it still uses pg_relation_size(). With 
pg_table_size() available since PostgreSQL 9.0, I believe it would be more 
useful to have the total acquired storage displayed, including implicit 
objects (the case mentioned where it was not very useful at the moment was 
a table with a big TOAST table).
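
For illustration (a sketch; the table name here is arbitrary), the
difference shows up when comparing the two functions directly --
pg_relation_size() counts only the main relation fork, while
pg_table_size() also includes the TOAST table and the other forks:

SELECT pg_size_pretty(pg_relation_size('some_table')) AS main_fork_only,
       pg_size_pretty(pg_table_size('some_table'))    AS incl_toast;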


The attached minor patch extends \dt to use pg_table_size() starting 
with PostgreSQL 9.0; not sure if we backport such changes, though. It 
would be interesting for 9.1, however.


--
Thanks

Bernd

psql_tablesize.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Missing semicolon in parser's gram.y

2011-03-21 Thread Tom Lane
Gurjeet Singh singh.gurj...@gmail.com writes:
 Attached is a single-line patch to add a missing semicolon to gram.y; caught
 using yyextract.

Done some time ago ...
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a051ef699c3ed1f89088dd6bbc2574f13d0b20eb#patch16

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19

2011-03-21 Thread Yeb Havinga

On 2011-03-21 18:04, Robert Haas wrote:

On Mon, Mar 21, 2011 at 12:29 PM, Yeb Havinga yebhavi...@gmail.com  wrote:

pgbench -i -s 50 test
Two runs of pgbench -c 10 -M prepared -T 600 test with 1 sync standby -
server configs etc were mailed upthread.


- performance as of commit e148443ddd95cd29edf4cc1de6188eb9cee029c5

1158 and 1306 (avg 1232)

- performance as of current git master

1181 and 1280 (avg 1230,5)

- performance as of current git master with
sync-standbys-defined-rearrangement applied

1152 and 1269 (avg 1210,5)


I ran another pgbench with this last setup, which gives it a 1240,33 
average:

tps = 1300.786386 (including connections establishing)
tps = 1300.844220 (excluding connections establishing)

IMO what these tests have shown is that there is no 20% performance 
difference between the different versions. To determine whether there are 
differences, n should be a lot higher, or perhaps a single run with a 
very long duration.



Hmm, that doesn't appear to show the 20% regression Simon claimed
upthread.  That's good...  but I'm confused as to how you are getting
numbers this high at all without a BBU.


For the sake of testing syncrep, I put xfs in nobarrier mode on both 
master and standby:


/dev/sdc1 on /xlog type xfs (rw,noatime,nodiratime,nobarrier)
/dev/md11 on /archive type xfs 
(rw,noatime,nodiratime,nobarrier,logdev=/dev/sdc3)
/dev/md10 on /data type xfs 
(rw,noatime,nodiratime,nobarrier,logdev=/dev/sdc2)


--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread Greg Stark
On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com wrote:
 Can't you make just one large mapping and lock it in 8k regions? I
 thought the problem with mmap was not being able to detect other
 processes 
 (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html),
 compatibility issues (possibly obsolete), etc.

I was assuming that locking part of a mapping would force the kernel
to split the mapping. It has to record the locked state somewhere so
it needs a data structure that represents the size of the locked
section and that would, I assume, be the mapping.

It's possible the kernel would not in fact fall over too badly doing
this. At some point I'll go ahead and do experiments on it. It's a bit
fraught though, as the performance may depend on the memory
management features of the chipset.

That said, that's only part of the battle. On 32bit you can't map the
whole database as your database could easily be larger than your
address space. I have some ideas on how to tackle that but the
simplest test would be to just mmap 8kB chunks everywhere.

But it's worse than that. Since you're not responsible for flushing
blocks to disk any longer you need some way to *unlock* a block when
it's possible to be flushed. That means when you flush the xlog you
have to somehow find all the blocks that might no longer need to be
locked and atomically unlock them. That would require new
infrastructure we don't have though it might not be too hard.

What would be nice is a mlock_until() where you eventually issue a
call to tell the kernel what point in time you've reached and it
unlocks everything older than that time.
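
For what it's worth, the experiment in question would look roughly like this
-- map a slice of a file and pin one 8 kB block inside the larger mapping (a
minimal POSIX sketch; "relation_file" is a placeholder, and whether the
kernel has to split the mapping to record the lock, and at what cost, is
exactly the open question):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define BLCKSZ 8192

    int
    main(void)
    {
        int     fd = open("relation_file", O_RDWR);
        size_t  len = 128 * BLCKSZ;     /* map a 1 MB slice of the file */
        char   *base;

        if (fd < 0)
            return 1;

        base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED)
            return 1;

        /*
         * Pin one 8 kB block; the kernel may split the VMA to record the
         * locked sub-range.  (May also need RLIMIT_MEMLOCK headroom.)
         */
        if (mlock(base + 42 * BLCKSZ, BLCKSZ) != 0)
            perror("mlock");

        /* ... use the block ... then release the pin so it can be flushed */
        munlock(base + 42 * BLCKSZ, BLCKSZ);
        munmap(base, len);
        close(fd);
        return 0;
    }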


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread Greg Stark
On Mon, Mar 21, 2011 at 4:47 PM, Josh Berkus j...@agliodbs.com wrote:
 You're missing my point ... Postgres already *has* a 2-level cache:
 shared_buffers and the FS cache.  Anything we add to that will be adding
 levels.

I don't think those two levels are interesting -- they don't interact
cleverly at all.

I was assuming the two levels were segments of the shared buffers that
didn't interoperate at all. If you kick buffers from the higher level
cache into the lower level one then why not just increase the number
of clock sweeps before you flush a buffer and insert non-index pages
into a lower clock level instead of writing code for two levels?

I don't think it will outperform in general because LRU is provably
within some margin from optimal and the clock sweep is an approximate
LRU. The only place you're going to find wins is when you know
something extra about the *future* access pattern that the lru/clock
doesn't know based on the past behaviour. Just saying "indexes are
heavily used" or "system tables are heavily used" isn't really extra
information, since the LRU can figure that out. Something like
"sequential scans of tables larger than shared buffers don't go back
and read old pages before they age out" is.

The other place you might win is if you have some queries that you
want to always be fast at the expense of slower queries. So your short
web queries that only need to touch a few small tables and system
tables can tag buffers that are higher priority and shouldn't be
swapped out to achieve a slightly higher hit rate on the global cache.
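
For reference, the clock sweep under discussion is roughly the following (a
toy illustration of the idea, not PostgreSQL's actual StrategyGetBuffer()):
accesses bump a capped usage count, the sweeping hand decrements it, and a
buffer is evicted only once its count reaches zero, so frequently used
buffers survive several sweeps:

    #include <stdio.h>

    #define NBUFFERS    8
    #define MAX_USAGE   5

    struct buf
    {
        int     usage_count;    /* bumped on access, capped at MAX_USAGE */
        int     pinned;         /* currently in use, never evict */
    };

    static struct buf buffers[NBUFFERS];
    static int  hand;           /* the clock hand */

    /*
     * Find a victim buffer: sweep the clock, decrementing usage counts,
     * and take the first unpinned buffer whose count has reached zero.
     */
    static int
    clock_sweep_victim(void)
    {
        for (;;)
        {
            struct buf *b = &buffers[hand];

            hand = (hand + 1) % NBUFFERS;
            if (b->pinned)
                continue;
            if (b->usage_count == 0)
                return (int) (b - buffers);
            b->usage_count--;
        }
    }

    int
    main(void)
    {
        buffers[3].usage_count = MAX_USAGE;     /* a hot buffer survives */
        printf("victim: %d\n", clock_sweep_victim());   /* buffer 0, not 3 */
        return 0;
    }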


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Greg Stark
On Mon, Mar 21, 2011 at 5:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark gsst...@mit.edu writes:
 So it's a clever hack that we used to allow the partial indexes to be
 used. It relied on the implicit assumption that min(x) and max(x),
 where the only values of x were NULL, were both NULL.

 It would be nice if we were clever enough to support *any* strict
 aggregate using partial indexes on WHERE NOT NULL since they'll all
 have that property.

 Huh?  The point of the min/max optimization is to not scan the whole
 index but just fetch the endpoint value.

But in the case where the index has no records it doesn't know whether
there were no records in the table or they were just all NULL. As it
happens min() and max() return NULL in both cases so it doesn't
matter. My point was that this is a clever hack and a non-obvious
deduction the planner is making.


 For general aggregates, you
 have to scan the table anyway.  If an index is useful for that, it'll
 get picked up in the normal planning process.

if I do SELECT count(col) from tab with no WHERE clause on a table
with 1% non-null values in col, will the planner correctly find the
partial index? If so, why doesn't the min/max planning find it?



-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tolower() identifier downcasing versus multibyte encodings

2011-03-21 Thread Francisco Figueiredo Jr.
I just received feedback on our bug report about this problem, and
it seems the problem also occurred on a Windows machine.

http://pgfoundry.org/tracker/index.php?func=detail&aid=1010988&group_id=1000140&atid=590



On Sat, Mar 19, 2011 at 14:13, Marko Kreen mark...@gmail.com wrote:
 On Sat, Mar 19, 2011 at 5:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 On Sat, Mar 19, 2011 at 6:10 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Or we could bite the bullet and start using str_tolower(), but the
 performance implications of that are unpleasant; not to mention that
 we really don't want to re-introduce the Turkish problem with
 unexpected handling of i/I in identifiers.

 How about first pass with 'a' - 'A' and if highbit is found
 then str_tolower()?

 Hm, maybe.

 There's still the problem of what to do in src/port/pgstrcasecmp.c,
 which won't have the infrastructure needed to do that.

 You mean client-side?  Could we have a str_tolower without xxx_l
 branch that always does wide-char conversion if high-bit is set?

 Custom locale there won't make sense there anyway?

 --
 marko




-- 
Regards,

Francisco Figueiredo Jr.
Npgsql Lead Developer
http://www.npgsql.org
http://fxjr.blogspot.com
http://twitter.com/franciscojunior

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Chinese initdb on Windows

2011-03-21 Thread Heikki Linnakangas
On Windows, if you have the OS locale set to Chinese (Simplified, PRC),
initdb fails:


X:\>C:\pgsql-install\bin\initdb.exe -D data2
The files belonging to this database system will be owned by user Heikki.
This user must also own the server process.

The database cluster will be initialized with locale Chinese (Simplified)_People's Republic of China.936.
initdb: locale Chinese (Simplified)_People's Republic of China.936 requires unsupported encoding GBK
Encoding GBK is not allowed as a server-side encoding.
Rerun initdb with a different locale selection.

The easy workaround for that is to specify --encoding=UTF-8, as UTF-8 
can be used with any locale on Windows. How about doing that 
automatically in initdb? Now that we have the smarts in psql to detect 
current encoding from the environment and set client_encoding 
accordingly, it Just Works. Attached is a patch for that.



Once you get past that, however, there's another issue:

 ...

creating directory data2 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in data2/base/1 ... ok
initializing pg_authid ... FATAL:  database locale is incompatible with operating system
DETAIL:  The database was initialized with LC_COLLATE Chinese (Simplified)_Peoples Republic of China.936, which is not recognized by setlocale().
HINT:  Recreate the database with another locale or install the missing locale.
child process exited with exit code 1


The problem is probably the apostrophe in the locale name, although it 
seems to be missing from the above error message. setlocale() has a 
known problem with locale names that have dots in the country name, and
it looks like it has similar issues with apostrophes.


Fortunately, there are aliases for those problematic locales on Windows
that don't have dots or apostrophes in the names. We did some testing in
EnterpriseDB of various locales on various versions of Windows, and came 
up with the following mappings:


*_Hong Kong S.A.R.* -> *_HKG.*
*_U.A.E.* -> *_ARE.*
*_People's Republic of China.* -> *_China.*
China_Macau S.A.R..950 -> ZHM

The first three mappings map the full country name to an abbreviation 
that is also accepted by Windows' setlocale(). See 
http://msdn.microsoft.com/en-us/library/cdax410z%28v=vs.71%29.aspx. ARE 
is not on that list, but seems to work.


Macau is trickier. ZHM is not an abbreviation of the country, but of the 
whole locale, so we can't replace just the country part. So this will 
not work for Finnish_Macau S.A.R..950, like the other mappings do. 
Nevertheless, it works for the common case.


Any objections to the 2nd attached patch, which adds the mapping of 
those locale names on Windows?


I'm thinking it's not too late to do this in 9.1.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 98e864d..6999a42 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2909,7 +2909,19 @@ main(int argc, char *argv[])
 		}
 		else if (!pg_valid_server_encoding_id(ctype_enc))
 		{
-			/* We recognized it, but it's not a legal server encoding */
+			/*
+			 * We recognized it, but it's not a legal server encoding.
+			 * On Windows, UTF-8 works with any locale, so we can fall back
+			 * to UTF-8.
+			 */
+#ifdef WIN32
+			printf(_("Encoding %s implied by locale is not allowed as a server-side encoding.\n"
+					 "The default database encoding has been set to %s instead.\n"),
+				   pg_encoding_to_char(ctype_enc),
+				   pg_encoding_to_char(PG_UTF8));
+			ctype_enc = PG_UTF8;
+			encodingid = encodingid_to_string(ctype_enc);
+#else
 			fprintf(stderr,
 					_("%s: locale %s requires unsupported encoding %s\n"),
 					progname, lc_ctype, pg_encoding_to_char(ctype_enc));
@@ -2918,6 +2930,7 @@ main(int argc, char *argv[])
 					  "Rerun %s with a different locale selection.\n"),
 					pg_encoding_to_char(ctype_enc), progname);
 			exit(1);
+#endif
 		}
 		else
 		{
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 6999a42..754f65b 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -185,6 +185,8 @@ static int	locale_date_order(const char *locale);
 static bool check_locale_name(const char *locale);
 static bool check_locale_encoding(const char *locale, int encoding);
 static void setlocales(void);
+static void strreplace(char *str, char *needle, char *replacement);
+static char *localemap(char *locale);
 static void usage(const char *progname);
 
 #ifdef WIN32
@@ -2253,6 +2255,79 @@ check_locale_encoding(const char *locale, int user_enc)
 	return true;
 }
 
+/*
+ * Replace 'needle' with 'replacement' in 'str'. Note that the replacement
+ * is done in-place, so 'replacement' must be shorter than 'needle'.
+ */
+static void
+strreplace(char *str, char *needle, char *replacement)
+{
+	char *s;
+
+	s = strstr(str, needle);
+	if (s != NULL)
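
[The attachment is truncated here in the archive. For readers following
along, a hypothetical standalone reconstruction of what the remaining
strreplace()/localemap() logic might look like, based on the visible
declarations and the mapping table above -- not the committed code:]

    #include <stdio.h>
    #include <string.h>

    /*
     * In-place replace of 'needle' with 'replacement'; assumes the
     * replacement is no longer than the needle, so the string can
     * only shrink.
     */
    static void
    strreplace(char *str, char *needle, char *replacement)
    {
        char   *s = strstr(str, needle);

        if (s != NULL)
        {
            size_t  nlen = strlen(needle);
            size_t  rlen = strlen(replacement);

            /* copy in the replacement, then slide the tail left */
            memcpy(s, replacement, rlen);
            memmove(s + rlen, s + nlen, strlen(s + nlen) + 1);
        }
    }

    /* Map a problematic Windows locale name to a setlocale()-safe alias. */
    static char *
    localemap(char *locale)
    {
        strreplace(locale, "Hong Kong S.A.R.", "HKG");
        strreplace(locale, "U.A.E.", "ARE");
        strreplace(locale, "People's Republic of China", "China");
        if (strcmp(locale, "China_Macau S.A.R..950") == 0)
            strcpy(locale, "ZHM");
        return locale;
    }

    int
    main(void)
    {
        char    loc[] = "Chinese (Simplified)_People's Republic of China.936";

        puts(localemap(loc));   /* Chinese (Simplified)_China.936 */
        return 0;
    }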

Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread Alvaro Herrera
Excerpts from Josh Berkus's message of lun mar 21 13:47:21 -0300 2011:

 We already did that, actually, when we implemented ARC: effectively gave
 PostgreSQL a 3-level cache.  The results were not very good, although
 the algorithm could be at fault there.

Was it really all that bad?  IIRC we replaced ARC with the current clock
sweep due to patent concerns.  (Maybe there were performance concerns as
well, I don't remember).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Mon, Mar 21, 2011 at 5:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 For general aggregates, you
 have to scan the table anyway.  If an index is useful for that, it'll
 get picked up in the normal planning process.

 if I do SELECT count(col) from tab with no WHERE clauses on a table
 with 1% non-null values in col will the planner correctly find the
 partial index? If so why doesn't the min/max planning find it?

It will not.  The hard part of doing something with that is that there
could be more than one aggregate.  I did think about whether we could
just push the IS NOT NULL into the main query, but that falls down on
cases like this:

select min(x), max(y) from tab;

If we try to modify that to

select min(x), max(y) from tab where x is not null and y is not null;

then we get the wrong answers, since x and y are probably nonnull in
different subsets of the table.

In the case of min/max, the endpoint hack makes the aggregates so cheap
that we can afford to perform a separate indexscan for each aggregate,
and thus having a NOT NULL qual that is different for each aggregate
isn't a problem (as long as we make sure it only affects that
aggregate's subquery and not the whole query).  This approach doesn't
scale to aggregates that will scan the whole table, though.

I suppose we might be able to do what you're suggesting for the case of
only one aggregate, but that isn't going to meet the desire of not
having a regression from what 9.0 could do with min/max.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread Josh Berkus

 Was it really all that bad?  IIRC we replaced ARC with the current clock
 sweep due to patent concerns.  (Maybe there were performance concerns as
 well, I don't remember).

Yeah, that was why the patent was frustrating.  Performance was poor and
we were planning on replacing ARC in 8.2 anyway.  Instead we had to
backport it.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread Merlin Moncure
On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark gsst...@mit.edu wrote:
 On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com wrote:
 Can't you make just one large mapping and lock it in 8k regions? I
 thought the problem with mmap was not being able to detect other
 processes 
 (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html),
 compatibility issues (possibly obsolete), etc.

 I was assuming that locking part of a mapping would force the kernel
 to split the mapping. It has to record the locked state somewhere so
 it needs a data structure that represents the size of the locked
 section and that would, I assume, be the mapping.

 It's possible the kernel would not in fact fall over too badly doing
 this. At some point I'll go ahead and do experiments on it. It's a bit
 fraught though, as the performance may depend on the memory
 management features of the chipset.

 That said, that's only part of the battle. On 32bit you can't map the
 whole database as your database could easily be larger than your
 address space. I have some ideas on how to tackle that but the
 simplest test would be to just mmap 8kB chunks everywhere.

Even on 64-bit systems you only have a 48-bit address space, which is
not a theoretical limitation.  However, at least on Linux you can map in
and map out pretty quickly (10 microseconds per map/unmap pair on my
Linux VM), so that's not so big of a deal.  Dealing with rapidly growing
files is a problem.  That said, you are probably not going to want to
reserve multiple gigabytes in 8k non-contiguous chunks.
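
For what it's worth, the map-in/map-out cost is easy to measure with
something like the following rough micro-benchmark sketch ("datafile"
stands in for any existing file):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>
    #include <unistd.h>

    int
    main(void)
    {
        int         fd = open("datafile", O_RDONLY);
        const int   iters = 100000;
        struct timespec t0, t1;
        double      ns;
        int         i;

        if (fd < 0)
            return 1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iters; i++)
        {
            /* map and immediately unmap one 8 kB chunk of the file */
            void   *p = mmap(NULL, 8192, PROT_READ, MAP_SHARED, fd, 0);

            if (p == MAP_FAILED)
                return 1;
            munmap(p, 8192);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%.0f ns per mmap/munmap pair\n", ns / iters);
        close(fd);
        return 0;
    }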

 But it's worse than that. Since you're not responsible for flushing
 blocks to disk any longer you need some way to *unlock* a block when
 it's possible to be flushed. That means when you flush the xlog you
 have to somehow find all the blocks that might no longer need to be
 locked and atomically unlock them. That would require new
 infrastructure we don't have though it might not be too hard.

 What would be nice is a mlock_until() where you eventually issue a
 call to tell the kernel what point in time you've reached and it
 unlocks everything older than that time.

I wonder if there is any reason to mlock at all...if you are going to
'do' mmap, can't you just hide under current lock architecture for
actual locking and do direct memory access without mlock?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Chinese initdb on Windows

2011-03-21 Thread Dave Page
On Mon, Mar 21, 2011 at 7:29 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On Windows, if you have the OS locale set to Chinese (Simplified, PRC), initdb
 fails:

 X:\>C:\pgsql-install\bin\initdb.exe -D data2
 The files belonging to this database system will be owned by user Heikki.
 This user must also own the server process.

 The database cluster will be initialized with locale Chinese
 (Simplified)_People's Republic of China.936.
 initdb: locale Chinese (Simplified)_People's Republic of China.936 requires
 unsupported encoding GBK
 Encoding GBK is not allowed as a server-side encoding.
 Rerun initdb with a different locale selection.

 The easy workaround for that is to specify --encoding=UTF-8, as UTF-8 can be
 used with any locale on Windows. How about doing that automatically in
 initdb? Now that we have the smarts in psql to detect current encoding from
 the environment and set client_encoding accordingly, it Just Works. Attached
 is a patch for that.


 Once you get past that, however, there's another issue:

 ...

 creating directory data2 ... ok
 creating subdirectories ... ok
 selecting default max_connections ... 100
 selecting default shared_buffers ... 32MB
 creating configuration files ... ok
 creating template1 database in data2/base/1 ... ok
 initializing pg_authid ... FATAL:  database locale is incompatible with
 operating system
 DETAIL:  The database was initialized with LC_COLLATE Chinese
 (Simplified)_Peoples Republic of China.936, which is not recognized by
 setlocale().
 HINT:  Recreate the database with another locale or install the missing
 locale.
 child process exited with exit code 1

 The problem is probably the apostrophe in the locale name, although it seems
 to be missing from the above error message. setlocale() has a known problem
 with locale names that have dots in the country name, and it looks like it
 has similar issues with apostrophes.

 Fortunately, there are aliases for those problematic locales on Windows
 that don't have dots or apostrophes in the names. We did some testing in
 EnterpriseDB of various locales on various versions of Windows, and came up
 with the following mappings:

 *_Hong Kong S.A.R.* -> *_HKG.*
 *_U.A.E.* -> *_ARE.*
 *_People's Republic of China.* -> *_China.*
 China_Macau S.A.R..950 -> ZHM

 The first three mappings map the full country name to an abbreviation that
 is also accepted by Windows' setlocale(). See
 http://msdn.microsoft.com/en-us/library/cdax410z%28v=vs.71%29.aspx. ARE is
 not on that list, but seems to work.

 Macau is trickier. ZHM is not an abbreviation of the country, but of the
 whole locale, so we can't replace just the country part. So this will not
 work for Finnish_Macau S.A.R..950, like the other mappings do.
 Nevertheless, it works for the common case.

 Any objections to the 2nd attached patch, which adds the mapping of those
 locale names on Windows?

 I'm thinking it's not too late to do this in 9.1.

I've heard complaints a number of times from Chinese users whom I
believe this would help.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] really lazy vacuums?

2011-03-21 Thread Jim Nasby
On Mar 16, 2011, at 7:44 PM, Robert Haas wrote:
 It
 would be really nice (for this and for other things) if we had some
 way of measuring the I/O saturation of the system, so that we could
 automatically adjust the aggressiveness of background processes
 accordingly.

Has anyone looked at the overhead of measuring how long IO requests to the
kernel take? If we did that, not only could we get an idea of what our IO
workload looked like, we could also figure out whether a block came out of
cache or not. That information could potentially be useful to the planner, but
even if the database couldn't use that knowledge itself, it would be a damn
useful statistic to have... IMHO, far more useful than our current hit rate
statistics.
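
A rough sketch of the measurement being suggested -- wrap each read() in a
monotonic clock and bucket the result; the 100 microsecond cutoff and the
file name are arbitrary assumptions for illustration:

    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define CACHE_HIT_THRESHOLD_NS  100000L     /* 100 us: assumed cutoff */

    static long cache_hits;
    static long disk_reads;

    static ssize_t
    timed_read(int fd, void *buf, size_t len)
    {
        struct timespec t0, t1;
        ssize_t     n;
        long        ns;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        n = read(fd, buf, len);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        ns = (t1.tv_sec - t0.tv_sec) * 1000000000L +
             (t1.tv_nsec - t0.tv_nsec);
        if (ns < CACHE_HIT_THRESHOLD_NS)
            cache_hits++;       /* fast: almost certainly the page cache */
        else
            disk_reads++;       /* slow: likely went to the device */
        return n;
    }

    int
    main(void)
    {
        char    buf[8192];
        int     fd = open("datafile", O_RDONLY);

        if (fd < 0)
            return 1;
        while (timed_read(fd, buf, sizeof(buf)) > 0)
            ;
        printf("fast reads: %ld, slow reads: %ld\n", cache_hits, disk_reads);
        close(fd);
        return 0;
    }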
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread Radosław Smogura
Merlin Moncure mmonc...@gmail.com Monday 21 March 2011 20:58:16
 On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark gsst...@mit.edu wrote:
  On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com 
wrote:
  Can't you make just one large mapping and lock it in 8k regions? I
  thought the problem with mmap was not being able to detect other
  processes
  (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html),
  compatibility issues (possibly obsolete), etc.
  
  I was assuming that locking part of a mapping would force the kernel
  to split the mapping. It has to record the locked state somewhere so
  it needs a data structure that represents the size of the locked
  section and that would, I assume, be the mapping.
  
  It's possible the kernel would not in fact fall over too badly doing
  this. At some point I'll go ahead and do experiments on it. It's a bit
  fraught though, as the performance may depend on the memory
  management features of the chipset.
  
  That said, that's only part of the battle. On 32bit you can't map the
  whole database as your database could easily be larger than your
  address space. I have some ideas on how to tackle that but the
  simplest test would be to just mmap 8kB chunks everywhere.
 
 Even on 64-bit systems you only have a 48-bit address space, which is
 not a theoretical limitation.  However, at least on Linux you can map in
 and map out pretty quickly (10 microseconds per map/unmap pair on my
 Linux VM), so that's not so big of a deal.  Dealing with rapidly growing
 files is a problem.  That said, you are probably not going to want to
 reserve multiple gigabytes in 8k non-contiguous chunks.
 
  But it's worse than that. Since you're not responsible for flushing
  blocks to disk any longer you need some way to *unlock* a block when
  it's possible to be flushed. That means when you flush the xlog you
  have to somehow find all the blocks that might no longer need to be
  locked and atomically unlock them. That would require new
  infrastructure we don't have though it might not be too hard.
  
  What would be nice is a mlock_until() where you eventually issue a
  call to tell the kernel what point in time you've reached and it
  unlocks everything older than that time.
Sorry for being curious, but I think mlock is for preventing swapping,
not for preventing flushing.

 I wonder if there is any reason to mlock at all...if you are going to
 'do' mmap, can't you just hide under current lock architecture for
 actual locking and do direct memory access without mlock?
 
 merlin

The mmap man page does not say anything about when flushes occur when the
mapping is file-backed and shared, so a flush may happen whether intended or
not. What's more, from what I have read, SysV shared memory is emulated by
mmap (and I think this mmap is on /dev/shm).

Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19

2011-03-21 Thread Yeb Havinga
On Mon, Mar 21, 2011 at 7:51 PM, Yeb Havinga yebhavi...@gmail.com wrote:

 On 2011-03-21 18:04, Robert Haas wrote:

 On Mon, Mar 21, 2011 at 12:29 PM, Yeb Havinga yebhavi...@gmail.com wrote:

 pgbench -i -s 50 test
 Two runs of pgbench -c 10 -M prepared -T 600 test with 1 sync standby -
 server configs etc were mailed upthread.

  - performance as of commit e148443ddd95cd29edf4cc1de6188eb9cee029c5

 1158 and 1306 (avg 1232)

 - performance as of current git master

 1181 and 1280 (avg 1230.5)

 - performance as of current git master with
 sync-standbys-defined-rearrangement applied

 1152 and 1269 (avg 1210.5)



 IMO what these tests have shown is that there is no 20% performance
 difference between the different versions. To determine if there are
 differences, n should be a lot higher, or perhaps a single run with a
 much longer duration.


pgbench -T 3600:

sync-standbys-defined-rearrangement 1270 tps
current git master 1306 tps

-- 
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
I wrote:
 Hmm.  We could possibly fix this by having planagg.c do a completely
 separate planner run for each aggregate, wherein it actually does build
 the equivalent query
   SELECT col FROM tab WHERE existing-quals AND col IS NOT NULL
   ORDER BY col ASC/DESC LIMIT 1
 and plan that.  That'd be less efficient than the current way,
 especially for cases where there are multiple aggregates, because there
 would be some duplication of processing between the per-aggregate
 planner runs and the main one.  But since we can only do this
 optimization for rather simple queries anyway, maybe it wouldn't matter
 much.

I studied the code some more, and I think this probably can be made to
work.  The basic idea is to have preprocess_minmax_aggregates build
simplified queries like the above (working by modifying the query tree
that exists at the point where it's called) and call query_planner on
them.  Save aside the resulting path data, then let the regular planning
process continue.  When optimize_minmax_aggregates is called, see
whether the regular plan is cheaper than the sum of the path costs.
If not, use the paths to construct a replacement plan, same as now.

The reason this should work is that query_planner() embodies pretty
much all the useful processing that happens between
preprocess_minmax_aggregates and optimize_minmax_aggregates --- the
other code in that stretch is mostly about grouping, which would disable
the minmax optimization anyway.  So no important steps will get left
out.  Of course, this introduces still more coupling between planagg.c
and planner.c, but I think that's probably tolerable.  The main
objection to this approach is having to do all the index analysis
N+1 times for an N-aggregate query.  I don't see any practical
alternative though if we want to make use of indexes that wouldn't be
used without the IS NOT NULL clause.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Chinese initdb on Windows

2011-03-21 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Any objections to the 2nd attached patch, which adds the mapping of 
 those locale names on Windows?

I think the added initdb message isn't following our style guidelines
--- it certainly doesn't match the adjacent existing message.  Other
than that quibble, ok here.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql \dt and table size

2011-03-21 Thread David Fetter
+1 for fixing this behavior in 9.1.  -1 for changing in 9.0, as the
change in behavior mid-release will cause more confusion than the
incomplete accounting does.

Cheers,
David.
On Mon, Mar 21, 2011 at 06:44:51PM +0100, Bernd Helmle wrote:
 It struck me again today that \dt+ isn't displaying the accurate
 table size for tables, since it still uses pg_relation_size().
 With pg_table_size() available since PostgreSQL 9.0, I believe
 it would be more useful to display the total acquired storage,
 including implicit objects (the case where it was not very
 useful for me was a table with a big TOAST table).
 
 The attached minor patch extends \dt to use pg_table_size() starting
 with PostgreSQL 9.0; I'm not sure whether we backport such changes,
 though. It would be interesting for 9.1, however.
 
 -- 
 Thanks
 
   Bernd


 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] really lazy vacuums?

2011-03-21 Thread Greg Stark
On Mon, Mar 21, 2011 at 6:08 PM, Jim Nasby j...@nasby.net wrote:
 Has anyone looked at the overhead of measuring how long IO requests to the
 kernel take? If we did that, not only could we get an idea of what our IO
 workload looked like, we could also figure out whether a block came out of
 cache or not. That information could potentially be useful to the planner,
 but even if the database couldn't use that knowledge itself, it would be a
 damn useful statistic to have... IMHO, far more useful than our current hit
 rate statistics.


I've done this -- actually better, I used mincore to check whether the
block was in cache before issuing the read -- but it turns out you
can't get what you're looking for this way.
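
For reference, that probe looks roughly like this (a Linux-specific sketch
that assumes 4 kB OS pages and maps the block transiently just to ask
mincore() about it; "datafile" is a placeholder):

    #include <fcntl.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define BLCKSZ  8192
    #define OS_PAGE 4096            /* assumed OS page size */

    static bool
    block_in_cache(int fd, off_t blkno)
    {
        /* mincore() wants a mapping, so map just this block transiently */
        void       *p = mmap(NULL, BLCKSZ, PROT_READ, MAP_SHARED,
                             fd, blkno * BLCKSZ);
        unsigned char vec[BLCKSZ / OS_PAGE];
        bool        resident = false;

        if (p == MAP_FAILED)
            return false;
        if (mincore(p, BLCKSZ, vec) == 0)
        {
            size_t      i;

            resident = true;
            for (i = 0; i < sizeof(vec); i++)
                if (!(vec[i] & 1))  /* low bit set means page is resident */
                    resident = false;
        }
        munmap(p, BLCKSZ);
        return resident;
    }

    int
    main(void)
    {
        int     fd = open("datafile", O_RDONLY);

        if (fd < 0)
            return 1;
        printf("block 0 is %scached\n", block_in_cache(fd, 0) ? "" : "not ");
        close(fd);
        return 0;
    }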

It turns out when you do this you see one block being read from disk
followed by n blocks that all appear to be cache hits, because they've
been prefetched by the kernel.

What you end up with is actually something like the number of iops,
which is also an interesting measure but not really what you were
looking for.

My getrusage patch, which I should still dig out (though it's rather
too late to be committing now, unless someone tells me otherwise), would
tell you how much i/o a plan node actually did. But you won't know
which blocks did the i/o, since I was only tracking totals for the plan
node. That's probably what you're looking for here.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2011 - Mentors? Projects?

2011-03-21 Thread Devrim GÜNDÜZ
On Mon, 2011-03-14 at 16:08 +, Dave Page wrote:
 
  Please count me in as a mentor for a GUI project. It is an open source
  dashboard for PostgreSQL DBAs, called ADJ Dashboard.
 
 I can't find any info on that via Google. URL please? 

Erdinc said that he will setup a web page for this, and let the list
know.

Regards,
-- 
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] How to look at the Expression Trees

2011-03-21 Thread Vaibhav Kaushal
Hi,

You said:

ExecMakeFunctionResultNoSets is used to evaluate function calls.

What are the 'functions' there? Are they the user-supplied PL/pgSQL-style
functions, the functions handled by fmgr, or just other C functions that
make up the expression evaluator?

Regards,
Vaibhav


[HACKERS] Re: Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Greg Stark
On Tue, Mar 22, 2011 at 4:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Reimplement planner's handling of MIN/MAX aggregate optimization (again).


I'm just curious: why is this no longer an interesting special case?


--- this is an interesting special case as of 9.1
-explain (costs off)
-  select min(unique2) from tenk1 where unique2 = 42;
-  QUERY PLAN

- Aggregate
-   -  Index Scan using tenk1_unique2 on tenk1
- Index Cond: (unique2 = 42)



-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Tue, Mar 22, 2011 at 4:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Reimplement planner's handling of MIN/MAX aggregate optimization (again).

 I'm just curious, Why is this no longer an interesting special case?

 --- this is an interesting special case as of 9.1
 -explain (costs off)
 -  select min(unique2) from tenk1 where unique2 = 42;
 -  QUERY PLAN
 
 - Aggregate
 -   -  Index Scan using tenk1_unique2 on tenk1
 - Index Cond: (unique2 = 42)

In the pathkey-based implementation, that resulted in an empty pathkey
list, which that implementation couldn't deal with.  I figured that was
okay because the default plan isn't bad in such a case, but I put in a
test case (probably because the code failed before I put in a defense
against it, but I don't recall for sure).  It's not particularly a
corner case for the new code, though, and the resulting plan changed
(because the new code will in fact turn this into a LIMIT subselect
anyway).  So I debated whether to change the expected output or just
take it out, and I chose the latter.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers