Re: [HACKERS] navigation menu for documents

2009-07-16 Thread David E. Wheeler

On Jul 16, 2009, at 12:49 PM, Andrew Dunstan wrote:

I'm using jQuery to pull the proper doc into a div. I'm still  
noodling with it, trying to fix encoding issues on Windows, but  
it's pretty close to done.


Yes, that's nice, it's just the sort of thing I had in mind - if you  
can do it with a div instead of frames I'm fine with that.


Yep. If I can solve the bloody encoding issue with Windows, it'll be  
good to go with Pod docs, and easily portable to any HTML-based docs.


Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Review: revised hstore patch

2009-07-16 Thread David E. Wheeler

On Jul 16, 2009, at 7:17 AM, Andrew Gierth wrote:


Revision to previous hstore patch to fix (and add tests for) some edge
case bugs with nulls or empty arrays.


This looks great, Andrew, thanks. Here's what I did to try it out:

* I built a simple database with a table with an (old) hstore column.

* I put in some data and wrote a bit of simple SQL to test the  
existing implementation, functions, etc., as documented.


* I dumped the data.

* I applied your patch, rebuilt hstore, installed the DSO, and  
restarted PostgreSQL.


* I ran the hstore `make installcheck` and all tests passed.

* I dropped the test database, created a new one, and installed hstore  
into it.


* I restored the dump and ran my little regression test. All the behavior  
was the same. The only difference was that `hstore = hstore` started to  
work instead of dying -- yay!


* I then did some experimentation to make sure that all of the new  
functions and operators worked as documented. They did. I attach my  
fiddling for your amusement.


Notes and minor issues:

* This line in the docs:

  'a=>1,b=>2'::hstore ?& ARRAY['a','b']</literal>

  Needs "?&" changed to "?&amp;".

* `hstore - hstore` resolves before `hstore - text`, meaning that this  
failed:


  SELECT 'a=>1, b =>2'::hstore - 'a' = 'b=>2';
  ERROR:  Unexpected end of string
  LINE 1: SELECT 'a=>1, b =>2'::hstore - 'a' = 'b=>2';

  But it works if I cast it:

  SELECT 'a=>1, b =>2'::hstore - 'a'::text = 'b=>2';

  Not sure if there's anything to be done about that.

* There are a few operators that take text or a text array as the right  
operand, such as `-` and `->`, but there is no such variant of `?`. This  
is because the `?` operator, which returns true if an hstore has a  
particular key, could have two meanings when the operand is an array:  
either the hstore has all the keys in the array, or it has some of them.  
This patch avoids the ambiguity by making the former `?&` and the latter  
`?|`. I appreciate the distinction, but wanted to point out that it comes  
at the price of inconsistency vis-à-vis some other operators (which, it  
must be said, don't have the three-branch logic to deal with). I think  
it's a good call, though.
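To illustrate the distinction, here is a hedged sketch of the all-keys vs. any-key semantics described above (exact behavior per the patched hstore build):

```sql
-- ?& is true only if the hstore contains ALL keys in the array;
-- ?| is true if it contains ANY of them.
SELECT 'a=>1, b=>2'::hstore ?& ARRAY['a','b'];  -- all keys present
SELECT 'a=>1, b=>2'::hstore ?& ARRAY['a','c'];  -- 'c' missing
SELECT 'a=>1, b=>2'::hstore ?| ARRAY['a','c'];  -- 'a' suffices
```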


* Maybe it's time to kill off `@` and `~`, eh? Or could they generate  
warnings for a release or something? How are operators properly  
deprecated?


* The conversion between records and hstores, and using hstores to  
modify particular values in records, is hot.
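A hedged sketch of what the record/hstore conversion mentioned above enables (table and column names here are illustrative; `populate_record()` takes a record and an hstore, per the patch's docs):

```sql
-- Assumes a table: CREATE TABLE t (a int, b text);

-- Convert a row to an hstore:
SELECT hstore(t) FROM t;

-- Rebuild the row with one field overridden from an hstore:
SELECT (populate_record(t, 'b=>changed'::hstore)).* FROM t;
```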


* The documentation for `populate_record()` is wrong. It's just a  
cut-and-paste of `delete()`.


* The NOTE about `populate_record()` says that it takes anyelement  
instead of record and just throws an error if it's not a record. I'm  
sure there's a good reason for that, but maybe there's a better way?


* Your original submission says that the new version offers btree and  
hash support, but the docs still mention only GIN and GiST.


* I'd like to see more examples of the new functionality in the  
documentation.


I'd like to do some word-smithing to the docs, but I'm happy to wait  
until it's committed and send a new patch. Otherwise, a few minor  
documentation issues notwithstanding, I think that this patch is ready  
for committer review and, perhaps, committing. The code looks clean  
(though mainly over my head) and the functionality is top-notch.


Best,

David



hstore.sql
Description: Binary data







Re: [HACKERS] [pgsql-www] Launching commitfest.postgresql.org

2009-07-16 Thread Josh Berkus

Robert,

BTW, the new commitfest software is great.  Easily a 75% reduction in 
time required to track reviewing activity.


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [HACKERS] [PATCH] SE-PgSQL/tiny rev.2193

2009-07-16 Thread KaiGai Kohei
Robert Haas wrote:
> 2009/7/16 KaiGai Kohei :
>> Yes, the tiny version will not give any advantages in security without
>> future enhancements.
>> It is not difficult to add object classes and permissions.
>> If necessary, I'll add checks them with corresponding permissions.
>>
>> One anxiety is PostgreSQL specific object class, such as LANGUAGE.
>> It's not clear for me whether the maintainer of the SELinux security
>> policy accept these kind of object classes, or not.
>> I would like to implement them except for PostgreSQL specific object
>> class in this phase.
> 
> I'm starting to think that there's just no hope of this matching up
> well enough with the way PostgreSQL already works to have a chance of
> being accepted.

I believe different security mechanisms can have different viewpoints,
security models, criteria for making access control decisions, and so on.
Some SELinux permissions map one-to-one onto PostgreSQL permissions,
such as db_database:{connect} and ACL_CONNECT.
But we should not be bothered that different security mechanisms cover
different things.

For example, Linux applies its DAC permission checks in inode_permission(),
which also calls selinux_inode_permission(). That is the one-to-one mapping
part. But SELinux (the LSM, actually) also puts security hooks at other
points, such as selinux_inode_rename() to check file-renaming permissions
and selinux_socket_accept() to check permission to accept connections,
although the vanilla OS has no corresponding permission checks.

What I would like to say is that it is quite natural for different security
mechanisms to have non-identical security models and so on.

>> Here is a few differences in access control model between PostgreSQL and
>> SELinux, so I could not map all the SELinux permissions on the 
>> pg_xxx_aclcheck()
>> mechanism.
>>
>> For example, ExecCheckRTEPerms() checks permissions on the tables and
>> columns appeared in the user given query. When the user have SELECT
>> permission on the required table, it bypasses to check permissions on
>> the columns.
>> SELinux's security model needs to check permissions on all the required
>> objects. For example, "SELECT A,B FROM T" requires the client to have
>> db_table:{select} on T and db_column:{select} on A and B.
> 
> Isn't this a purely arbitrary decision on your part to implement
> incompatible semantics?  I don't see why it can't check for
> db_table:{select} and if that fails then check for db_column:[select}
> on each column?  Maybe that's not legit, I don't understand SE-Linux
> well enough to know.  But I think we need to get someone from the
> SE-Linux community involved to help review and consider these kinds of
> issues, because it is obvious that we don't have the expertise in the
> PostgreSQL community.

Indeed, an operating system has no good analogy for some of this, such as
the relationship between tables and columns. However, SELinux's rule is simple.

Please imagine the situation when we write a security policy module.
When we want to control access to a certain object (such as a column),
all we need to focus on is the user's privileges on that column alone.
If checks on columns are bypassed whenever he is allowed to access the
table, we need to focus on both the table and the column.
It seems to me that goes against the principle in SELinux.
(Note that I am not saying the database ACL model is bad. Both models
have their own philosophies, and I believe both have merit.)

If you need more comments from other people in the SELinux community,
I will try to contact Joshua Brindle, who was involved in pgsql-hackers
several months ago.

>> For other example, some of pg_xxx_aclcheck() is bypassed when the client
>> has superuser privilege. In this case, SELinux requires the client to
>> have both of db_database:{superuser} and a certain permission.
> 
> Surely you can't just transform (A OR B) into (A AND B) and pretend
> that's the same thing...

That would require changing the code path, but it is not reasonable when
the user disables SE-PgSQL. So I thought separate security hooks were
reasonable, because they can be replaced by empty macros when disabled.

>> Sorry, I could not read it from the previous suggestions.
>> If you have been suggesting it repeatedly, I'm sorry so much.
> 
> I think the language barrier is part of what is making this a very
> difficult process.  Your English is surely better than my Japanese,
> but we are definitely going around in circles.
> 
> ...Robert
> 

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei 



Re: [HACKERS] slow count in window query

2009-07-16 Thread Pavel Stehule
Hello

look at:
postgres=# explain select count(*) over () from x;
 QUERY PLAN
-----------------------------------------------------------
 WindowAgg  (cost=0.00..265.00 rows=1 width=0)
   ->  Seq Scan on x  (cost=0.00..140.00 rows=1 width=0)
(2 rows)

Time: 1,473 ms
postgres=# explain select count(*) over (order by a) from x;
   QUERY PLAN

 WindowAgg  (cost=0.00..556.25 rows=1 width=4)
   ->  Index Scan using gg on x  (cost=0.00..406.25 rows=1 width=4)
(2 rows)

but the actual run times are:
query1: 160 ms
query2: 72 ms

regards
Pavel Stehule



Re: [HACKERS] COPY WITH CSV FORCE QUOTE * -- REVIEW

2009-07-16 Thread Itagaki Takahiro
Josh Berkus  wrote:
> On 7/16/09 12:53 PM, Robert Haas wrote:
> > I think perhaps we should ask the patch author to remove the NOT NULL
> > stuff first?
> 
> Yes, current status is "Waiting on Author".

OK, I removed the "FORCE NOT NULL" stuff from the patch.
The attached patch adds only the "FORCE QUOTE *" feature.
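For reference, a minimal hedged example of the syntax the patch adds (the table name is illustrative):

```sql
-- Force-quote every column on CSV output, rather than
-- naming each column in the FORCE QUOTE list:
COPY t TO STDOUT WITH CSV FORCE QUOTE *;
```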

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



force_quote_all-20090717.patch
Description: Binary data



Re: [HACKERS] [PATCH] SE-PgSQL/tiny rev.2193

2009-07-16 Thread Robert Haas
2009/7/16 KaiGai Kohei :
> Yes, the tiny version will not give any advantages in security without
> future enhancements.
> It is not difficult to add object classes and permissions.
> If necessary, I'll add checks them with corresponding permissions.
>
> One anxiety is PostgreSQL specific object class, such as LANGUAGE.
> It's not clear for me whether the maintainer of the SELinux security
> policy accept these kind of object classes, or not.
> I would like to implement them except for PostgreSQL specific object
> class in this phase.

I'm starting to think that there's just no hope of this matching up
well enough with the way PostgreSQL already works to have a chance of
being accepted.

> Here is a few differences in access control model between PostgreSQL and
> SELinux, so I could not map all the SELinux permissions on the 
> pg_xxx_aclcheck()
> mechanism.
>
> For example, ExecCheckRTEPerms() checks permissions on the tables and
> columns appeared in the user given query. When the user have SELECT
> permission on the required table, it bypasses to check permissions on
> the columns.
> SELinux's security model needs to check permissions on all the required
> objects. For example, "SELECT A,B FROM T" requires the client to have
> db_table:{select} on T and db_column:{select} on A and B.

Isn't this a purely arbitrary decision on your part to implement
incompatible semantics?  I don't see why it can't check for
db_table:{select} and if that fails then check for db_column:[select}
on each column?  Maybe that's not legit, I don't understand SE-Linux
well enough to know.  But I think we need to get someone from the
SE-Linux community involved to help review and consider these kinds of
issues, because it is obvious that we don't have the expertise in the
PostgreSQL community.

> For other example, some of pg_xxx_aclcheck() is bypassed when the client
> has superuser privilege. In this case, SELinux requires the client to
> have both of db_database:{superuser} and a certain permission.

Surely you can't just transform (A OR B) into (A AND B) and pretend
that's the same thing...

> Sorry, I could not read it from the previous suggestions.
> If you have been suggesting it repeatedly, I'm sorry so much.

I think the language barrier is part of what is making this a very
difficult process.  Your English is surely better than my Japanese,
but we are definitely going around in circles.

...Robert



Re: [HACKERS] [PATCH] SE-PgSQL/tiny rev.2193

2009-07-16 Thread KaiGai Kohei
Robert Haas wrote:
> 2009/7/16 KaiGai Kohei :
>> Updated SE-PgSQL patch is here:
>>
>>  http://sepgsql.googlecode.com/files/sepgsql-01-tiny-8.5devel-r2196.patch.gz
>>
>> Unused definitions of SELinux's permissions are ripped out from
>> the permission table.
> 
> OK, I'm looking at this version of the patch, and my first reaction is
> that it appears to be completely useless.  Standard PostgreSQL has
> grantable privileges on 10 classes of objects: TABLE, COLUMN,
> SEQUENCE, DATABASE, FOREIGN DATA WRAPPER, FOREIGN SERVER, FUNCTION,
> LANGUAGE, SCHEMA, and TABLESPACE.  This patch, on the other hand,
> implements SE-pgsql privileges for 3 classes of objects: DATABASE,
> NAMESPACE, and FUNCTION.  That doesn't seem to jive in any conceivable
> way with the advice that has been given repeatedly and in every single
> review of this patch, which is to make the scope of SE-pgsql line up
> exactly with the standard permissions that PostgreSQL already
> implements.  In earlier versions, there were checks on all sorts of
> extra things (like rows, and various DDL operations) that invented
> whole new classes of permission checks.  Now you've gone to the
> opposite extreme of checking almost nothing.

Yes, the tiny version will not give any security advantage without
future enhancements.
It is not difficult to add object classes and permissions.
If necessary, I'll add checks for them with the corresponding permissions.

One concern is PostgreSQL-specific object classes, such as LANGUAGE.
It's not clear to me whether the maintainers of the SELinux security
policy will accept those kinds of object classes or not.
I would like to implement everything except the PostgreSQL-specific
object classes in this phase.

> The goal here is not to pare this patch down to nothing: it's to
> implement a coherent feature set that matches what PostgreSQL already
> does.  Here is what I wrote 4 days ago:
> 
> "Another problem that I have with this patch set is that it STILL
> doesn't have parity with the DAC permissions scheme (despite previous
> requests to make it have parity)"
> 
> Here is what I wrote 3 days ago:
> 
> "Yes: to repeat what has been said multiple times previously, you
> should postpone everything that isn't a mirror of the current security
> model: there should only be permission checks in places where there
> are permissions checks now, and they should be mirror images of the
> current DAC checks."
> 
> Here is what I wrote yesterday:
> 
> "So the point we keep repeating here is that SEPostgreSQL should be
> doing the same kinds of permissions checks as regular PostgreSQL using
> the same names, code paths, etc. I don't know how to say it any more
> clearly than that."
> 
> And just for reference, here is what Peter wrote 5 months ago, which
> is basically saying the same thing:
> 
> "If I had to do this, I would first write a patch for #1: A patch that
> additionally executes existing privilege checks against an SELinux
> policy. Existing privilege checks are a well-defined set: they mostly
> happen through pg_xxx_aclcheck() functions. Hook your checks in
> there."
> 
> I really don't understand why this is so difficult.  But I don't think
> you should bother resubmitting this patch until you've taken this
> advice.  I am kind of running out patience here.  I've reviewed this
> patch 3 times and found the exact same issue each time.

Here are a few differences in the access control model between PostgreSQL and
SELinux, which is why I could not map all of the SELinux permissions onto the
pg_xxx_aclcheck() mechanism.

For example, ExecCheckRTEPerms() checks permissions on the tables and
columns that appear in the user's query. When the user has SELECT
permission on the required table, it skips the permission checks on
the columns.
SELinux's security model needs to check permissions on all of the required
objects. For example, "SELECT A,B FROM T" requires the client to have
db_table:{select} on T and db_column:{select} on A and B.

As another example, some pg_xxx_aclcheck() calls are bypassed when the
client has superuser privilege. In this case, SELinux requires the client
to have both db_database:{superuser} and a certain permission.

If we deploy security hooks at pg_xxx_aclcheck() or related places in
the first version, it will be necessary to move them to more appropriate
points later.
I can agree to deploying security hooks in pg_xxx_aclcheck(), if we can
move them to other points, which can implement the security model
correctly, in a later version.

Sorry, I could not gather that from the previous suggestions.
If you have been suggesting it repeatedly, I apologize.

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei 



Re: [HACKERS] [pgsql-www] Launching commitfest.postgresql.org

2009-07-16 Thread Robert Haas
On Tue, Jul 14, 2009 at 5:33 PM, Stefan
Kaltenbrunner wrote:
>> If you have time, that would be great; if not I will do it.
>
> well you just volunteered...

I was trying hard not to.

But, done.

...Robert



[HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-16 Thread Kevin Grittner
Rebased to correct for pg_indent changes.
 
Applies cleanly.
Compiles cleanly.
Passes regression tests.
Comments and format look good.
No documentation changes needed.
No regression test changes needed.
 
Performance tests to follow in a day or two.
 
-Kevin
Index: src/bin/pg_dump/pg_backup_archiver.c
===
RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.172
diff -c -d -r1.172 pg_backup_archiver.c
*** src/bin/pg_dump/pg_backup_archiver.c11 Jun 2009 14:49:07 -  
1.172
--- src/bin/pg_dump/pg_backup_archiver.c17 Jul 2009 02:20:28 -
***
*** 59,70 
--- 59,80 
  #define thandle HANDLE
  #endif
  
+ /* List header for pending-activity lists */
+ typedef struct
+ {
+   TocEntry   *head;
+   TocEntry   *tail;
+   /* The list link fields in each TocEntry are par_prev and par_next */
+ } TocEntryList;
+ 
+ /* Arguments needed for a worker child */
  typedef struct _restore_args
  {
ArchiveHandle *AH;
TocEntry   *te;
  } RestoreArgs;
  
+ /* State for each parallel activity slot */
  typedef struct _parallel_slot
  {
thandle child_id;
***
*** 117,124 
  static thandle reap_child(ParallelSlot *slots, int n_slots, int *work_status);
  static bool work_in_progress(ParallelSlot *slots, int n_slots);
  static intget_next_slot(ParallelSlot *slots, int n_slots);
  static TocEntry *get_next_work_item(ArchiveHandle *AH,
!  TocEntry **first_unprocessed,
   ParallelSlot *slots, int n_slots);
  static parallel_restore_result parallel_restore(RestoreArgs *args);
  static void mark_work_done(ArchiveHandle *AH, thandle worker, int status,
--- 127,138 
  static thandle reap_child(ParallelSlot *slots, int n_slots, int *work_status);
  static bool work_in_progress(ParallelSlot *slots, int n_slots);
  static intget_next_slot(ParallelSlot *slots, int n_slots);
+ static void toc_list_append(TocEntryList *l, TocEntry *te);
+ static void toc_list_remove(TocEntryList *l, TocEntry *te);
+ static void find_ready_items(TocEntryList *pending_list,
+TocEntryList 
*ready_list);
  static TocEntry *get_next_work_item(ArchiveHandle *AH,
!  TocEntryList *ready_list,
   ParallelSlot *slots, int n_slots);
  static parallel_restore_result parallel_restore(RestoreArgs *args);
  static void mark_work_done(ArchiveHandle *AH, thandle worker, int status,
***
*** 3065,3071 
ParallelSlot *slots;
int work_status;
int next_slot;
!   TocEntry   *first_unprocessed = AH->toc->next;
TocEntry   *next_work_item;
thandle ret_child;
TocEntry   *te;
--- 3079,3086 
ParallelSlot *slots;
int work_status;
int next_slot;
!   TocEntryList pending_list;
!   TocEntryList ready_list;
TocEntry   *next_work_item;
thandle ret_child;
TocEntry   *te;
***
*** 3087,3094 
 * faster in a single connection because we avoid all the connection and
 * setup overhead.
 */
!   while ((next_work_item = get_next_work_item(AH, &first_unprocessed,
!   
NULL, 0)) != NULL)
{
if (next_work_item->section == SECTION_DATA ||
next_work_item->section == SECTION_POST_DATA)
--- 3102,3108 
 * faster in a single connection because we avoid all the connection and
 * setup overhead.
 */
!   for (next_work_item = AH->toc->next; next_work_item != AH->toc; 
next_work_item = next_work_item->next)
{
if (next_work_item->section == SECTION_DATA ||
next_work_item->section == SECTION_POST_DATA)
***
*** 3100,3106 
  
(void) restore_toc_entry(AH, next_work_item, ropt, false);
  
-   next_work_item->restored = true;
reduce_dependencies(AH, next_work_item);
}
  
--- 3114,3119 
***
*** 3125,3130 
--- 3138,3162 
AH->currWithOids = -1;
  
/*
+* Initialize the lists of pending and ready items.  After this setup,
+* the pending list is everything that needs to be done but is blocked
+* by one or more dependencies, while the ready list contains items that
+* have no remaining dependencies.  Note: we don't yet filter out 
entries
+* that aren't going to be restored.  They might participate in
+* dependency chains connecting entries that should be restored, so we
+* treat them as live unti

Re: [HACKERS] join removal

2009-07-16 Thread Robert Haas
On Thu, Jul 16, 2009 at 9:02 PM, Greg Stark wrote:
> I started looking at this patch and it looks pretty good as far as it
> goes. But I think we can do a lot more. It seems to me the cases where
> foreign key relationships exist are likely to be really big use cases.

I agree.  But that seems a lot harder, and this is useful all by
itself because it can eliminate LEFT joins.  Foreign key deductions
will be necessary to eliminate inner joins and self-joins.  I've been
advised that when writing patches for PostgreSQL it's best to start
with something small.  :-)

> I have one big worry though. Currently you're detecting the unique
> property using the planner's path mechanism. I suppose that works, but
> it's only an accident of the planner design that the path for the
> unique index will always be there if it's the join condition. My
> instinct is that this code should be going back to the raw index info
> to prove this property. The only practical effect I can think of is
> that the plan will have to be marked as being dependent on that index
> and that will be hard to do if you haven't identified a specific index
> you're basing it on.

I had trouble figuring out where to hook in the logic.  In an ideal
world, it would be nice to detect that the join is removable earlier,
but it's hard to do that, because it's not until we know the join
order that we can test whether any attributes from the inner rel are
used above the level of the join.  But as it is the fact that the join
can be removed will have to be rediscovered over and over again as
planning progresses.

As for going back to "the raw index info", that was kind of my
instinct as well but I couldn't make it work.  It seems that the
IndexOptInfo structure only knows the column numbers of the index's
keys, whereas the code that considers possible join strategies has
only equivalence classes to work with, and I don't see how to match
the two up.  If we can figure out a way to do that it would probably
be cleaner.

> I would like to see a list of cases we plan to tackle preferably with
> example queries, as a kind of checklist so we can knock them down one
> by one.  Right now it's unclear just how much of the problem space is
> being solved.

I don't know how many cases I personally plan to handle because I
don't know how much time I'm going to have to work on this or whether
I have the needed brainpower.  But I can enumerate the cases that I
know about where this is theoretically possible.

- LEFT joins can be eliminated if the nullable side of the join can be
proved unique over the join columns.  The simplest and most common
case is the one where there is a unique index on any (not necessarily
proper) subset of the join columns, but it can also happen in any
other case where we can prove that the inner rel is unique over (a
subset of) the relevant columns, such as when the inner rel groups by
those columns.  There is an existing function query_is_distinct_for()
that does something along these lines, but it operates on yet another
different type of data structure (a Query, rather than a list of
equivalence classes or alternatively a list of varattnos) and doesn't
handle the unique-index case, which is probably the most important one
for this optimization.

- INNER joins are more complex because what happens on the inner side
of the join can potentially wipe out rows from the result.  With a
LEFT join, it's sufficient to prove that the inner rel is at least
unique enough, but for an INNER join, we have to prove that it's
exactly UNIQUE enough.  I think we can only provide this when the
inner rel is a base relation with a unique index over EXACTLY (not a
subset of) the relevant columns AND there is a foreign key
relationship from the outer rel to the inner rel over the join
columns.

- Self-joins (whether they are inner, left, semi, or full) can be
collapsed into a scan of the underlying base relation if the join
columns on both sides include all the columns of the same unique
index.  All the quals from both sides have to be applied.
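A hedged sketch of the simplest removable case described above — the LEFT join with a unique index on the join column (table and column names are illustrative):

```sql
CREATE TABLE b (id int PRIMARY KEY, extra text);
CREATE TABLE a (id int, b_id int);

-- The join cannot duplicate or filter rows of "a": b.id is unique
-- over the join column, and no column of "b" is referenced above
-- the join, so the planner could drop "b" from the plan entirely.
SELECT a.id FROM a LEFT JOIN b ON a.b_id = b.id;
```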

> Incidentally, guess what other database just got this feature committed...
>
> http://askmonty.org/worklog/Client-BackLog/?tid=17

Hmm, well, it would be nice to have parity.  This is a hugely
important feature for the kinds of queries I do all day.

...Robert



Re: [HACKERS] [PATCH] SE-PgSQL/tiny rev.2193

2009-07-16 Thread Robert Haas
2009/7/16 KaiGai Kohei :
> Updated SE-PgSQL patch is here:
>
>  http://sepgsql.googlecode.com/files/sepgsql-01-tiny-8.5devel-r2196.patch.gz
>
> Unused definitions of SELinux's permissions are ripped out from
> the permission table.

OK, I'm looking at this version of the patch, and my first reaction is
that it appears to be completely useless.  Standard PostgreSQL has
grantable privileges on 10 classes of objects: TABLE, COLUMN,
SEQUENCE, DATABASE, FOREIGN DATA WRAPPER, FOREIGN SERVER, FUNCTION,
LANGUAGE, SCHEMA, and TABLESPACE.  This patch, on the other hand,
implements SE-pgsql privileges for 3 classes of objects: DATABASE,
NAMESPACE, and FUNCTION.  That doesn't seem to jive in any conceivable
way with the advice that has been given repeatedly and in every single
review of this patch, which is to make the scope of SE-pgsql line up
exactly with the standard permissions that PostgreSQL already
implements.  In earlier versions, there were checks on all sorts of
extra things (like rows, and various DDL operations) that invented
whole new classes of permission checks.  Now you've gone to the
opposite extreme of checking almost nothing.

The goal here is not to pare this patch down to nothing: it's to
implement a coherent feature set that matches what PostgreSQL already
does.  Here is what I wrote 4 days ago:

"Another problem that I have with this patch set is that it STILL
doesn't have parity with the DAC permissions scheme (despite previous
requests to make it have parity)"

Here is what I wrote 3 days ago:

"Yes: to repeat what has been said multiple times previously, you
should postpone everything that isn't a mirror of the current security
model: there should only be permission checks in places where there
are permissions checks now, and they should be mirror images of the
current DAC checks."

Here is what I wrote yesterday:

"So the point we keep repeating here is that SEPostgreSQL should be
doing the same kinds of permissions checks as regular PostgreSQL using
the same names, code paths, etc. I don't know how to say it any more
clearly than that."

And just for reference, here is what Peter wrote 5 months ago, which
is basically saying the same thing:

"If I had to do this, I would first write a patch for #1: A patch that
additionally executes existing privilege checks against an SELinux
policy. Existing privilege checks are a well-defined set: they mostly
happen through pg_xxx_aclcheck() functions. Hook your checks in
there."

I really don't understand why this is so difficult.  But I don't think
you should bother resubmitting this patch until you've taken this
advice.  I am kind of running out of patience here.  I've reviewed this
patch 3 times and found the exact same issue each time.

...Robert



Re: [HACKERS] [PATCH] [v8.5] Security checks on largeobjects

2009-07-16 Thread KaiGai Kohei
I have summarized the design proposal and the issues we currently have.

I would like to see any comments on this proposal.
In particular, the selection of the snapshot is a headache issue for me.


This project tries to solve two items listed at:
  http://wiki.postgresql.org/wiki/Todo#Binary_Data

 * Add security checks for large objects
 * Allow read/write into TOAST values like large objects

= Introduction =

We need to associate metadata with a large object in order to
implement security checks for large objects. However, the data
structure of large objects is not suitable for managing metadata
(such as an owner identifier, database ACLs, ...) for a given
large object, because a large object is stored as separate page
frames in the pg_largeobject system catalog.
Thus, we need to revise the data structure used to manage a large
object.

An interesting fact is the similarity in data structure between a
TOAST table and pg_largeobject.
A TOAST relation is declared as follows:
  pg_toast_%u (
  chunk_idoid,
  chunk_seq   int4,
  chunk_data  bytea,
  unique(chunk_id, chunk_seq)
  )

Definition of the pg_largeobject is as follows:
  pg_largeobject(
  loidoid,
  pageno  int4,
  databytea,
  unique(loid, pageno)
  )

They have an identical data structure, so it is quite natural
to utilize the TOAST mechanism to store the page frames of large objects.

= Design =

In my plan, the role of pg_largeobject will be changed to manage
the metadata of largeobjects, and it will be redefined
as follows:

  CATALOG(pg_largeobject,2613)
  {
  Oidloowner; /* OID of the owner */
  Oidlonsp;   /* OID of the namespace */
  aclitemloacl[1];/* ACL of the largeobject */
  Blob   lodata;  /* Contents of the largeobject */
  } FormData_pg_largeobject;

For access control purposes, its ownership, namespace and ACLs
are necessary. In addition, Blob is a new type which supports
partial reads and writes of its TOAST value.

The current lo_xxx() interfaces will act as wrapper functions
to access the pg_largeobject.lodata identified by a largeobject
handle. The loread(), lowrite() or similar interfaces will support
partial access to the Blob type. This enables user-defined relations
to contain large data using the TOAST mechanism, with reasonable resource
consumption. (Note that TOAST replaces all of the chunks with the same
identifier, even if just a single byte changes.)
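To illustrate the chunk-granularity point above, here is a back-of-the-envelope sketch. The chunk size used is hypothetical and merely stands in for the build-dependent TOAST_MAX_CHUNK_SIZE; the function names are illustration only. A partial read only needs the chunks covering the requested byte range, whereas the current TOAST update path rewrites every chunk of the value:

```python
# Sketch: map a byte range onto TOAST chunk_seq numbers, assuming a
# hypothetical fixed chunk size (the real value, TOAST_MAX_CHUNK_SIZE,
# depends on the build).
CHUNK_SIZE = 2000  # hypothetical stand-in for TOAST_MAX_CHUNK_SIZE

def chunks_for_range(offset, length):
    """Return the chunk_seq numbers a partial access of
    [offset, offset+length) would need to touch."""
    if length <= 0:
        return []
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))

# Reading 100 bytes at offset 5000 of a 1 MB value touches only chunk 2 ...
print(chunks_for_range(5000, 100))          # -> [2]
# ... whereas rewriting the whole value touches all 500 chunks.
print(len(chunks_for_range(0, 1_000_000)))  # -> 500
```

This is why replacing whole chunk sets on every update is so expensive for large values, and why partial lowrite() support matters.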

= Interfaces =

== New type ==

We need a new variable-length type with the following features,
to allow users partial access.

* It always uses an external TOAST table, independent of its size.
 If toasted data is stored inline, we cannot update it independently
of the main table.
 Inline storage does not prevent partial reads, but they would be
meaningless because inlined data is small enough anyway.

* It always stores the data without any compression.
 We cannot easily compute the required data offset within compressed
 data. The whole toasted value would need to be uncompressed, for both
 read and write access.

== lo_xxx() interfaces ==

New versions of loread() and lowrite() are necessary to access
a part of the toasted data within user-defined tables. They can be
declared as follows:

  loread(Blob data, int32 offset, int32 length)
  lowrite(Blob data, int32 offset, Bytea data)

== GRANT/REVOKE ==

When we access traditional largeobjects, read permission (SELECT)
or write permission (UPDATE) should be checked on access.

The GRANT/REVOKE statements are enhanced as follows:

  GRANT SELECT ON LARGE OBJECT 1234 TO kaigai;

It allows "kaigai" to read largeobject 1234.
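A minimal sketch of how the proposed checks would be exercised. Note that this GRANT ... ON LARGE OBJECT syntax is part of the proposal, not existing PostgreSQL, and the role and OID are hypothetical:

```sql
-- Sketch using the proposed syntax (not yet in PostgreSQL).
GRANT SELECT ON LARGE OBJECT 1234 TO kaigai;    -- loread() now permitted
GRANT UPDATE ON LARGE OBJECT 1234 TO kaigai;    -- lowrite() now permitted
REVOKE UPDATE ON LARGE OBJECT 1234 FROM kaigai; -- back to read-only
```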

= Issues =

== New pg_type.typstorage ==

The variable-length data always needs to be stored in external
storage and uncompressed. No existing typstorage strategy satisfies
this requirement, so we need to add a new pg_type.typstorage strategy.

The new typstorage strategy forces:
 - It always stores the given varlena data in an external toast relation.
 - It always stores the given varlena data without any compression.

It will cost some performance, so the existing Text or Bytea types will
remain more suitable for variable-length data that is not very large.

== Snapshot ==

The largeobject interface uses SnapshotNow for writable accesses, and
GetActiveSnapshot() for read-only accesses, but toast_fetch_datum()
uses SnapshotToast to scan the toast relation.

It seems to me SnapshotToast depends on an assumption that tuples
within a TOAST relation never have multiple versions.
When we update a toast value, the TOAST mechanism inserts the whole
variable-length datum under a new chunk_id, and the older chunks are
removed by toast_delete_datum().
The TOAST pointer is updated to the new chunk_id, and its visibility
is under MVCC control.

The source code comments at HeapTupleSatisfiesToast() says as follows:

/*
 * HeapTupleSatisfiesToast
 *  True iff heap tuple is valid as a TOAST row.
 *
 * This is a si

Re: [HACKERS] join removal

2009-07-16 Thread Greg Stark
I started looking at this patch and it looks pretty good as far as it
goes. But I think we can do a lot more. It seems to me the cases where
foreign key relationships exist are likely to be really big use cases.

I have one big worry though. Currently you're detecting the unique
property using the planner's path mechanism. I suppose that works, but
it's only an accident of the planner design that the path for the
unique index will always be there if it's the join condition. My
instinct is that this code should be going back to the raw index info
to prove this property. The only practical effect I can think of is
that the plan will have to be marked as being dependent on that index
and that will be hard to do if you haven't identified a specific index
you're basing it on.

I would like to see a list of cases we plan to tackle preferably with
example queries, as a kind of checklist so we can knock them down one
by one.  Right now it's unclear just how much of the problem space is
being solved.
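As a concrete starting point for such a checklist, the simplest case looks something like this. The tables are hypothetical, sketching the kind of query join removal targets:

```sql
-- Hypothetical example of a removable join: b.id is unique, the join
-- condition matches that unique column, the join is a LEFT JOIN (so it
-- cannot eliminate rows of "a"), and no column of "b" is referenced
-- above the join -- so dropping the join cannot change the result.
CREATE TABLE a (id int, b_id int);
CREATE TABLE b (id int PRIMARY KEY, payload text);

SELECT a.id
  FROM a LEFT JOIN b ON a.b_id = b.id;  -- planner may scan only "a"
```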

Incidentally, guess what other database just got this feature committed...

http://askmonty.org/worklog/Client-BackLog/?tid=17

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Andrew Dunstan



Brendan Jurd wrote:


app-text/openjade 1.3.2-r1
app-text/docbook-sgml 1.0
app-text/docbook-sgml-dtd 4.2-r2
app-text/docbook-sgml-utils 0.6.14
app-text/docbook-dsssl-stylesheets 1.79

... plus some other packages which were pulled in to satisfy
dependencies on the above.

The version is only a big deal for docbook-sgml-dtd -- you *must*
specify the 4.2 slot when emerging the package or you might end up
with 4.4 or some other slot, and that won't work with the Postgres
docs as explained at
http://www.postgresql.org/docs/current/static/docguide-toolsets.html.
For example you should be able to get the whole toolset with

emerge -av app-text/openjade \
 app-text/docbook-sgml \
 app-text/docbook-sgml-utils \
 app-text/docbook-dsssl-stylesheets \
 app-text/docbook-sgml-dtd:4.2


I bet this is Josh's problem. He probably has the wrong DTD set. It 
would account for the error log he showed me.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Kevin Grittner
Brendan Jurd  wrote:
 
> The only trick was working out exactly which
> packages I needed to install.
 
Which were?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Alvaro Herrera
Robert Haas escribió:

> But I can't say I've ever had much trouble building the docs.  I find
> it a bit odd that "make" in the doc directory does nothing; and "make"
> in doc/src does nothing, but "make" in doc/src/sgml does what you
> expect.  I also find the slowness of openjade to be pretty annoying.
> But those are minor warts, not serious inconveniences that hinder
> reviewing.

Back when my machine took 45 mins to build the docs, what I did to
review doc changes was a quick "make check" to verify that the SGML was
not b0rked, then send the patch and look at the generated HTML in the
developer docs.

Nowadays the doc building process has been sped up immensely by Peter's
recent changes.  And my machine has gotten faster as well.

The only thing I lament is that I can't do openjade -j2 to use two cores
to build (or should that be --workers=2 ?)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Brendan Jurd
2009/7/17 Kevin Grittner :
> Brendan Jurd  wrote:
>
>> The only trick was working out exactly which
>> packages I needed to install.
>
> Which were?
>

Currently I have:

app-text/openjade 1.3.2-r1
app-text/docbook-sgml 1.0
app-text/docbook-sgml-dtd 4.2-r2
app-text/docbook-sgml-utils 0.6.14
app-text/docbook-dsssl-stylesheets 1.79

... plus some other packages which were pulled in to satisfy
dependencies on the above.

The version is only a big deal for docbook-sgml-dtd -- you *must*
specify the 4.2 slot when emerging the package or you might end up
with 4.4 or some other slot, and that won't work with the Postgres
docs as explained at
http://www.postgresql.org/docs/current/static/docguide-toolsets.html.
For example you should be able to get the whole toolset with

emerge -av app-text/openjade \
 app-text/docbook-sgml \
 app-text/docbook-sgml-utils \
 app-text/docbook-dsssl-stylesheets \
 app-text/docbook-sgml-dtd:4.2

I set this up a long time ago, so I'm unsure whether the docbook-sgml
and docbook-sgml-utils packages are genuinely required, but I *am*
able to build the docs with this configuration.

I usually build the HTML target and then view the docs in my browser.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Robert Haas
On Thu, Jul 16, 2009 at 8:07 PM, Brendan Jurd wrote:
> 2009/7/17 Josh Berkus :
>> This seems like a serious issue for development.  Reviewers, how many of you
>> are able to build docs with each patch?
>
> Being able to build docs did require some fidgeting with the docbook
> packages (on Gentoo).  The only trick was working out exactly which
> packages I needed to install.  Since getting past that, I've not had
> any problems building the docs.  Although it is pretty slow.
>
> As Merlin and Andrew have noted, being able to build the docs is a
> nice-to-have for documentation review, not a genuine requirement.  You
> *can* review changes to SGML right there in the diff.  Especially if
> the changes are not extensive and/or don't alter the structure of the
> document.

Yeah.  I usually build the docs and read them if I'm making er
proposing... an extensive change, but for simple stuff I just edit the
SGML and figure that if it looks sane it probably is.

I certainly don't test the doc portions of patches I review unless I
see something sketchy in the markup.

But I can't say I've ever had much trouble building the docs.  I find
it a bit odd that "make" in the doc directory does nothing; and "make"
in doc/src does nothing, but "make" in doc/src/sgml does what you
expect.  I also find the slowness of openjade to be pretty annoying.
But those are minor warts, not serious inconveniences that hinder
reviewing.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-16 Thread Jeff Davis
On Fri, 2009-07-17 at 09:51 +1000, Brendan Jurd wrote:
> I like that idea ... although how would this interact (if at all) with
> the existing pg_index.isunique flag?  Would it become deprecated in
> favour of using indconstrats, or would you actually look at switching
> isunique to TRUE if somebody applies a constraint which is made up
> entirely of equality ops?

If this ALTER TABLE ADD UNIQUE ... USING syntax is really shorthand for
my special index constraints, it would probably have to use the general
mechanism. Otherwise there would be no way to use the general mechanism
over a btree, which I think should be possible (if nothing else it would
be good to allow apples-to-apples performance testing of my patch).

But I guess it doesn't have to be directly shorthand, ALTER TABLE ADD
UNIQUE ... USING could choose to turn an existing index unique when
possible (e.g. btree), otherwise use the general mechanism.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] [v8.5] Security checks on largeobjects

2009-07-16 Thread KaiGai Kohei
Robert Haas wrote:
> 2009/7/16 KaiGai Kohei :
>> However, I don't think the initial proposal of the largeobject
>> security is yet in a state to be reviewed seriously.
> 
> OK, I am moving this patch to returned with feedback.

If possible, I would like to have a discussion to reach consensus
on its design and interfaces before submitting my patch to
the next commit fest.

I'll submit its design proposal (including a few issues) again.
Is anyone interested?

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Brendan Jurd
2009/7/17 Josh Berkus :
> This seems like a serious issue for development.  Reviewers, how many of you
> are able to build docs with each patch?

Being able to build docs did require some fidgeting with the docbook
packages (on Gentoo).  The only trick was working out exactly which
packages I needed to install.  Since getting past that, I've not had
any problems building the docs.  Although it is pretty slow.

As Merlin and Andrew have noted, being able to build the docs is a
nice-to-have for documentation review, not a genuine requirement.  You
*can* review changes to SGML right there in the diff.  Especially if
the changes are not extensive and/or don't alter the structure of the
document.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-16 Thread Brendan Jurd
2009/7/17 Jeff Davis :
> Another idea that I thought about is that:
>
>   ALTER TABLE foo ADD UNIQUE (a, b) USING foo_idx;
>
> could be a shorthand for:
>
>   ALTER TABLE foo ADD INDEX CONSTRAINT (a =, b =) USING foo_idx;
>
> The benefit is that it could go over GiST indexes or hash indexes, not
> just btrees. The syntax could also be useful to turn an existing btree
> into a unique btree.

I like that idea ... although how would this interact (if at all) with
the existing pg_index.isunique flag?  Would it become deprecated in
favour of using indconstrats, or would you actually look at switching
isunique to TRUE if somebody applies a constraint which is made up
entirely of equality ops?

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Andrew Dunstan



Tom Lane wrote:

Greg Smith  writes:
  

On Thu, 16 Jul 2009, Josh Berkus wrote:

Well, after an hour of tinkering with docbook DTDs and openjade I've given up 
on building docs for the patch I was reviewing on my Mac.
  


  
It's easier to get the whole chain working under Linux, but even that 
isn't trivial.



Really?  It's "just worked" for me on the last several Fedora releases.
You do need to install the docbook packages of course ... 



  


Yes, that's my experience also.

In any case, you really don't need to build the docs to read them. You 
might not like SGML, but it's not *that* hard to understand. Surely our 
patch reviewers can read the SGML text.


Of course, we should check that the docs build cleanly after the patch 
is applied, but that's a different issue. As far as building goes, the 
CVS HEAD docs at 
 are rebuilt 
frequently, so we actually check as soon as the patch is applied.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Tom Lane
Greg Smith  writes:
> On Thu, 16 Jul 2009, Josh Berkus wrote:
>> Well, after an hour of tinkering with docbook DTDs and openjade I've given 
>> up 
>> on building docs for the patch I was reviewing on my Mac.

> It's easier to get the whole chain working under Linux, but even that 
> isn't trivial.

Really?  It's "just worked" for me on the last several Fedora releases.
You do need to install the docbook packages of course ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Status report: getting plpgsql to use the core lexer

2009-07-16 Thread Tom Lane
"Kevin Grittner"  writes:
> ...
> We were able to get to much cleaner code by rewriting the parser to
> have a "dumb" phase to get the overall structure into an AST, and then
> use a tree-walker phase to do all the lookups and type resolution
> after we had the rough structure, writing another AST to walk for code
> generation.  Besides making the code cleaner and easier to maintain,
> it helped us give better error messages pointing more accurately to
> the source of the problem.  I don't know if a similar approach is
> feasible in flex/bison, but if it is, refactoring for an extra pass
> might be worth the trouble.

That's actually what we have in the core parser.  plpgsql is trying to
take shortcuts, and this whole project is exactly about weaning it away
from that.  The bottom line is I tried to tackle the sub-projects in the
wrong order...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Status report: getting plpgsql to use the core lexer

2009-07-16 Thread Kevin Grittner
Tom Lane  wrote: 
 
> One problem that wasn't obvious when I started is that if you are
> trying to use a reentrant lexer, Bison insists on including its
> YYSTYPE union in the call signature of the lexer.  Of course,
> YYSTYPE means different things to the core grammar and plpgsql's
> grammar.  I tried to work around that by having an interface layer
> that would (among other duties) translate as needed.  It turned out
> to be a real PITA, not least because you can't include both
> definitions into the same C file.  The scheme I have has more or
> less failed --- I think I'd need *two* interface layers to make it
> work without unmaintainable kluges.  It would probably be better to
> try to adjust the core lexer's API some more so that it does not
> depend on the core YYSTYPE, but I'm not sure yet how to get Bison to
> play along without injecting an interface layer (and hence wasted
> cycles) into the core grammar/lexer interface.
> 
> Another pretty serious issue is that the current plpgsql lexer
> treats various sorts of qualified names as single tokens.  I had
> thought this could be worked around in the interface layer by doing
> more lookahead.  You can do that, and it mostly works, but it's
> mighty tedious.  The big problem is that "yytext" gets out of step
> --- it will point at the last token the core lexer has processed,
> and there's no good way to back it up after lookahead.  I spent a
> fair amount of time trying to work around that by eliminating uses
> of "yytext" in plpgsql, and mostly succeeded, but there are still
> some left.  (Some of the remaining regression failures are error
> messages that point at the wrong token because they rely on yytext.)
> 
> Now, having name lookup happen at the lexical level is pretty bogus
> anyhow.  The long-term solution here is probably to avoid doing
> lookup in the plpgsql lexer and move it into some sort of callback
> hook in the main parser, as we've discussed before.  I didn't want
> to get into that right away, but I'm now thinking it has to happen
> before not after refactoring the lexer code.  One issue that has to
> be surmounted before that can happen is that plpgsql currently
> throws away all knowledge of syntactic scope after initial
> processing of a function --- the "name stack" is no longer available
> when we want to parse individual SQL commands.  We can probably
> rearrange that design but it's another bit of work I don't have time
> for right now.
 
All of this sounds pretty familiar to me.  As you may recall, our
framework includes a SQL parser which parses the subset of standard
SQL we feel is portable enough, and generates Java classes to
implement the code in "lowest common denominator" SQL with all
procedural code for triggers and stored procedures handled in Java
(which runs in our middle tier database service).  We use ANTLR, and
initially had a three-phase process: lexer, parser, and tree-walkers
to generate code.  We were doing way too much in the parser phase --
checking for table names, column names, data types, etc.  The syntax
of SQL forced us to do a lot of scanning forward and remembering where
we were (especially to get the FROM clause information so we could
process columns in the result list).
 
We were able to get to much cleaner code by rewriting the parser to
have a "dumb" phase to get the overall structure into an AST, and then
use a tree-walker phase to do all the lookups and type resolution
after we had the rough structure, writing another AST to walk for code
generation.  Besides making the code cleaner and easier to maintain,
it helped us give better error messages pointing more accurately to
the source of the problem.  I don't know if a similar approach is
feasible in flex/bison, but if it is, refactoring for an extra pass
might be worth the trouble.
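The two-phase approach described above can be sketched in miniature. This is a toy illustration with hypothetical names, nothing to do with the actual ANTLR grammar or the plpgsql code: phase 1 only captures structure without any catalog lookups, and phase 2 walks the rough AST and resolves names once the FROM list is known:

```python
# Toy two-phase processing: a "dumb" structural pass, then a resolution
# walk.  All names here are hypothetical illustration.

def parse(tokens):
    """Phase 1: split 'SELECT cols FROM tables' into a rough AST.
    No name lookups happen here."""
    i = tokens.index("FROM")
    return {"cols": tokens[1:i], "tables": tokens[i + 1:]}

def resolve(ast, catalog):
    """Phase 2: with the FROM list known, resolve each column to the
    single table that owns it."""
    resolved = []
    for col in ast["cols"]:
        owners = [t for t in ast["tables"] if col in catalog[t]]
        if len(owners) != 1:
            raise ValueError(f"column {col!r} is ambiguous or unknown")
        resolved.append((owners[0], col))
    return resolved

catalog = {"emp": {"name", "dept_id"}, "dept": {"dept_id", "title"}}
ast = parse(["SELECT", "name", "title", "FROM", "emp", "dept"])
print(resolve(ast, catalog))  # -> [('emp', 'name'), ('dept', 'title')]
```

The point of the split is exactly the one made above: the structural pass never has to scan forward to find the FROM clause, and the resolution pass can report "ambiguous column" errors against a finished structure.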
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] boolean in C

2009-07-16 Thread Dann Corbit
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
> ow...@postgresql.org] On Behalf Of Bernd Helmle
> Sent: Thursday, July 16, 2009 8:47 AM
> To: Grzegorz Jaskiewicz
> Cc: pgsql-hackers Hackers
> Subject: Re: [HACKERS] boolean in C
> 
> --On 16. Juli 2009 13:32:03 +0100 Grzegorz Jaskiewicz
> 
> wrote:
> 
> > oh, another thing.
> > stdbool is C99 standard feature. Not gcc extension.
> 
> There might be compiler versions out there which claims to be C99 but
> do
> not provide full compliant include headers. SUN Studio 12 at least has
> the
> following in its documentation, as a quick research brings up:
> 
> "Though the compiler defaults to supporting the features of C99 listed
> below, standard headers provided by the Solaris software in
> /usr/include do
> not yet conform with the 1999 ISO/IEC C standard"

It's more or less a generic problem.  There is only a handful of fully
functional C99 compilers[0], and all the others have "Some c99 features"
to one degree or another.

Microsoft's compiler is particularly abysmal, but then again, they have
no claims of C99 compliance so there is nothing to complain about there.
Those few features that they do implement are implemented in a
non-standard way.

GCC is also only partially compliant[1].  

I believe that the Dinkum library is the only certified C99 standard
library[2] as well.

[0] see: http://www.peren.com/pages/branding_set.htm 
[1] see: http://gcc.gnu.org/c99status.html
[2] see: http://www.dinkumware.com/manuals/

I don't think that a product (that is expected to run on as many
platforms as PostgreSQL is expected to run on) is even possible to write
in C99 code because there are not enough compliant compilers available.

IMO-YMMV


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] boolean in C

2009-07-16 Thread Bernd Helmle



--On 16. Juli 2009 13:32:03 +0100 Grzegorz Jaskiewicz  
wrote:



oh, another thing.
stdbool is C99 standard feature. Not gcc extension.


There might be compiler versions out there which claim to be C99 but
do not provide fully compliant include headers. SUN Studio 12 at least
has the following in its documentation, as a quick search brings up:


"Though the compiler defaults to supporting the features of C99 listed 
below, standard headers provided by the Solaris software in /usr/include do 
not yet conform with the 1999 ISO/IEC C standard"



--
Thanks

Bernd


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Make planning via GEQO repeatable

2009-07-16 Thread Andres Freund
On Thursday 16 July 2009 23:04:58 Tom Lane wrote:
> Andres Freund  writes:
> > Query planning via GEQO currently can yield a different plan on every
> > invocation of the planner due to its non-exhaustive nature.
> > This often can be inconvenient because at times there may be a very
> > bad plan. It also makes it very hard to reproduce a problem with GEQO.
>
> Applied with some editorialization.  Mainly, I didn't see the point of
> preserving the ability to have nondeterministic planning, and I
> especially didn't care for having that still be the default behavior.
> So I just made it unconditionally initialize the seed.  It would of
> course take only minor tweaking to do things differently.
Nice. 
Mainly I did not have the guts to change the behaviour completely...

archive.org has a copy of the dead link to the comp.ai.genetic FAQ linked at
http://web.archive.org/web/20051226001402/http://www.cs.bham.ac.uk/Mirrors/ftp.de.uu.net/EC/clife/www/location.htm
which is the same as the one referenced by Alvaro in
http://archives.postgresql.org/pgsql-docs/2009-07/msg4.php

If considered relevant enough, you can update the link...


Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: support for multiplexing SIGUSR1

2009-07-16 Thread Jaime Casanova
On Thu, Jul 16, 2009 at 2:57 AM, Jaime
Casanova wrote:
> Hi,
>
> I'm reviewing this patch:
> http://archives.postgresql.org/message-id/3f0b79eb0907022341m1d36a841x19c3e2a5a6906...@mail.gmail.com

Another thing that caught my attention: I don't think this is safe (it
assumes only one auxiliary process of each type; I don't know whether we
can have several of the same kind, but...):

+   /*
+* Assign backend ID to auxiliary processes like backends, in order to
+* allow multiplexing signal to auxiliary processes. Since backends use
+* ID in the range from 1 to MaxBackends (inclusive), we assign
+* auxiliary processes with MaxBackends + AuxProcType + 1 as
an unique ID.
+*/
+   MyBackendId = MaxBackends + auxType + 1;
+   MyProc->backendId = MyBackendId;


-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Euler Taveira de Oliveira
Merlin Moncure escreveu:
> Isn't it possible though to write and/or review the documentation
> patch without building it?
> 
cd pgsql/doc/src/sgml && gmake check


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Make planning via GEQO repeatable

2009-07-16 Thread Tom Lane
Andres Freund  writes:
> Query planning via GEQO currently can yield a different plan on every
> invocation of the planner due to its non-exhaustive nature.
> This often can be inconvenient because at times there may be a very
> bad plan. It also makes it very hard to reproduce a problem with GEQO.

Applied with some editorialization.  Mainly, I didn't see the point of
preserving the ability to have nondeterministic planning, and I
especially didn't care for having that still be the default behavior.
So I just made it unconditionally initialize the seed.  It would of
course take only minor tweaking to do things differently.
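With the patch as committed, plan repeatability can be sketched like this. This assumes the geqo_seed GUC that the patch introduces (exact name and range per the committed version), and the query is a placeholder:

```sql
-- GEQO planning is now seeded deterministically; varying geqo_seed
-- (a GUC added by this work, ranging over [0, 1]) explores different
-- plans while keeping each individual run reproducible.
SET geqo_seed = 0.0;
EXPLAIN SELECT ...;   -- same plan every time for a given seed
SET geqo_seed = 0.5;  -- a different, but still repeatable, plan
```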

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY WITH CSV FORCE QUOTE *

2009-07-16 Thread Andrew Dunstan



Chris Spotts wrote:


As for importing data from programs that produce all values in quotes
including null/missing values (your pro case above), arguably what we
need is another flag that would turn an empty string into a null.


Hmmm, TODO, please?  There's a lot of this out there, and I've had
to build sed into a lot of import routines.



+1 For that on the TODO, happens all the time...



Well, somebody had better suggest a syntax for it, preferably without 
adding yet another keyword.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Greg Smith

On Thu, 16 Jul 2009, Josh Berkus wrote:

Well, after an hour of tinkering with docbook DTDs and openjade I've given up 
on building docs for the patch I was reviewing on my Mac.


It's easier to get the whole chain working under Linux, but even that
isn't trivial.  I think one useful step here would be to write up some
practical docs on the package setup for various popular platforms on
the wiki.  I can probably find where I have the RedHat and Ubuntu
recipes I use around here somewhere, to kick that off as part of
the review I'm doing for the multi-threaded pgbench.  It's been my
experience that everybody runs into pretty much the same problems here
getting started, but said problems are unique to the OS.


If someone writes up something similar for OS X, there will be a recipe
for getting the standard docs built on all the major development
platforms where this could be straightforward (I shudder to think what
a Cygwin guide would look like), and that would make it much easier to
push toward having more people do doc review.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for TODO Item: Add prompt escape to display the client and server versions

2009-07-16 Thread Peter Eisentraut
On Thursday 07 May 2009 05:23:41 Dickson S. Guedes wrote:
> This is a WIP patch (for the TODO item in the subject) that I'm putting
> in the Commit Fest queue for 8.5.

The problem I'm seeing with this is that currently it resolves

%v (client) = 8.5devel
%V (server) = 8.5.0

Besides being inconsistent, it's also pretty misleading.  I'd imagine a 
significant use of this feature would be to know that one is connected to, 
say, a beta version.

More generally, does anyone actually need this feature?  psql complains loudly 
enough if the version numbers are not the right ones.  I don't know why this 
would need to be repeated in the prompt.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY WITH CSV FORCE QUOTE *

2009-07-16 Thread Chris Spotts

Josh Berkus wrote:

Andrew,


AFAICT on a brief look at the patch, it doesn't affect the quoting of
nulls on export, it just allows * as an alias for all columns for FORCE
QUOTE (as well as FORCE NOT NULL). But FORCE QUOTE has never forced
quoting of null values, only non-null values. We have never quoted null
values, and I'm fairly resistant to any suggestion that we should.


See?  That's what happens when I can't build the docs.  ;-)  (and 
there's no previous discussion of the feature).




As for importing data from programs that produce all values in quotes
including null/missing values (your pro case above), arguably what we
need is another flag that would turn an empty string into a null.


Hmmm, TODO, please?  There's a lot of this out there, and I've had to 
build sed into a lot of import routines.



+1 For that on the TODO, happens all the time...



Re: [HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Merlin Moncure
On Thu, Jul 16, 2009 at 2:34 PM, Josh Berkus wrote:
> All,
>
> Well, after an hour of tinkering with docbook DTDs and openjade I've given
> up on building docs for the patch I was reviewing on my Mac.
>
> If I'm encountering this difficulty building docs, so are many of the other
> new patch reviewers.  Which means we're *not* reviewing docs for
> completeness, correctness, or correspondence to the actual feature syntax
> until beta time.
>
> This seems like a serious issue for development.  Reviewers, how many of you
> are able to build docs with each patch?

Isn't it possible though to write and/or review the documentation
patch without building it?

merlin



Re: [HACKERS] COPY WITH CSV FORCE QUOTE *

2009-07-16 Thread Josh Berkus

Andrew,


AFAICT on a brief look at the patch, it doesn't affect the quoting of
nulls on export, it just allows * as an alias for all columns for FORCE
QUOTE (as well as FORCE NOT NULL). But FORCE QUOTE has never forced
quoting of null values, only non-null values. We have never quoted null
values, and I'm fairly resistant to any suggestion that we should.


See?  That's what happens when I can't build the docs.  ;-)  (and 
there's no previous discussion of the feature).




As for importing data from programs that produce all values in quotes
including null/missing values (your pro case above), arguably what we
need is another flag that would turn an empty string into a null.


Hmmm, TODO, please?  There's a lot of this out there, and I've had to 
build sed into a lot of import routines.


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Kenneth Marshall
On Thu, Jul 16, 2009 at 06:49:08PM +0200, Andres Freund wrote:
> On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
> > Andres Freund  writes:
> > > The default settings currently make it relatively hard to trigger geqo at
> > > all.
> >
> > Yes, and that was intentional.  One of the implications of what we're
> > discussing here is that geqo would get used a lot more for "typical
> > complex queries" (if there is any such thing as a typical one).  So
> > it's fully to be expected that the fallout would be pressure to improve
> > geqo in various ways.
> >
> > Given that we are at the start of the development cycle, that prospect
> > doesn't scare me --- there's plenty of time to fix whatever needs
> > fixing.  However, I am leaning to the feeling that I don't want to be
> > putting people in a position where they have no alternative but to use
> > geqo.  So adjusting rather than removing the collapse limits is seeming
> > like a good idea.
> Hm. I see a somewhat more fundamental problem with geqo:
> I tried several queries, and I found not a single one where the whole 
> genetic process made any significant improvements to the 'worth'.
> It seems that the best variant out of the pool is always either the path 
> chosen in the end, or the cost difference is _really_ low.
> 
> 
> Andres
> 

Hi Andres,

From some of my reading of the literature on join order
optimization via random sampling, such as what would establish
the initial GEQO pool, there is a very good possibility of having
a "pretty good" plan in the first pool, especially for our larger
initial pool sizes of 100-1000. And in fact, the final plan has
a good chance of being of approximately the same cost as a member
of the initial pool. Uniform sampling alone can give you a close
to optimum plan 80% of the time with an initial sample size of
100. And using biased sampling raises that to 99% or better.

Regards,
Ken
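Ken's point about random sampling can be illustrated with a toy model (pure Python, not GEQO's actual code; the cost distribution and sizes are invented for illustration):

```python
import random

def sample_best_cost(costs, k, rng):
    """Cost of the best plan among k plans drawn uniformly at random."""
    return min(rng.choice(costs) for _ in range(k))

rng = random.Random(42)
# Invented search space: 100,000 "plans" with heavy-tailed costs.
costs = [rng.lognormvariate(10, 2) for _ in range(100_000)]
optimum = min(costs)

# The best of a modest uniform sample tends to sit close to the optimum,
# which is consistent with the final GEQO plan often matching a member
# of the initial pool.
for k in (10, 100, 1000):
    best = sample_best_cost(costs, k, rng)
    print(f"best of {k:>4} samples: {best / optimum:.1f}x optimum")
```

Biased sampling (weighting toward orders that look cheap locally) only sharpens this effect, per the numbers Ken cites.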



Re: [HACKERS] pg_stat_activity.application_name

2009-07-16 Thread Heikki Linnakangas
Kevin Grittner wrote:
> We would probably want to modify psql, pg_dump, etc. to put the
> application name into this connection property, at least by default. 
> We may want to add a command-line switch to allow user override -- to
> provide something more detailed.  For example,
> --application-name='Weekly Purge' could be specified on the psql
> command line.

I've seen it set to argv[0] on other DBMSs, if not set explicitly. That
would be a reasonable default, and would handle psql, pg_dump etc.
without changes.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
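The argv[0] convention Heikki mentions is easy to sketch (an illustration of the default only, not PostgreSQL code; the "unknown" fallback string is invented):

```python
import os
import sys

def default_application_name(argv=None):
    """Derive an application name from argv[0], as some other DBMSs do
    when the client does not set one explicitly."""
    argv = sys.argv if argv is None else argv
    name = os.path.basename(argv[0]) if argv else ""
    return name or "unknown"

# e.g. pg_dump invoked as /usr/bin/pg_dump -Fc mydb
print(default_application_name(["/usr/bin/pg_dump", "-Fc", "mydb"]))
```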



Re: [HACKERS] COPY WITH CSV FORCE QUOTE * -- REVIEW

2009-07-16 Thread Josh Berkus

On 7/16/09 12:53 PM, Robert Haas wrote:

On Thu, Jul 16, 2009 at 2:47 PM, Josh Berkus  wrote:

Unless there are other things we want to test (CLOBs?) I think the patch is
probably ready for code review of the FORCE QUOTE * portion.


I think perhaps we should ask the patch author to remove the NOT NULL
stuff first?


Yes, current status is "Waiting on Author".

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [HACKERS] pg_stat_activity.application_name

2009-07-16 Thread Kevin Grittner
Greg Stark  wrote:
> Kevin Grittner wrote:
>> On the admin list there was a request for an application name
>> column in pg_stat_activity.
>>
>> http://archives.postgresql.org/pgsql-admin/2009-07/msg00095.php
>>
>> This is available in a lot of other DBMS products, can be useful to
>> DBAs, and seems pretty cheap and easy.  Could we get that onto the
>> TODO list?
> 
> I think you should just add it.
 
Done.
 
-Kevin



Re: [HACKERS] COPY WITH CSV FORCE QUOTE * -- REVIEW

2009-07-16 Thread Robert Haas
On Thu, Jul 16, 2009 at 2:47 PM, Josh Berkus wrote:
> Unless there are other things we want to test (CLOBs?) I think the patch is
> probably ready for code review of the FORCE QUOTE * portion.

I think perhaps we should ask the patch author to remove the NOT NULL
stuff first?

...Robert



Re: [HACKERS] navigation menu for documents

2009-07-16 Thread Andrew Dunstan



David E. Wheeler wrote:

On Jul 14, 2009, at 3:21 PM, Andrew Dunstan wrote:

Yes, really. What you suggest here is just not adequate, IMNSHO. I 
don't want to have to scroll to the top or bottom of the page to get 
navigation, and I want to be able to see the navigation and go where 
I want directly.


Hey Andrew,

Check out what I've done for the Bricolage documentation:

  http://www.bricolagecms.org/docs/devel/api/

I'm using jQuery to pull the proper doc into a div. I'm still noodling 
with it, trying to fix encoding issues on Windows, but it's pretty 
close to done.





Yes, that's nice, it's just the sort of thing I had in mind - if you can 
do it with a div instead of frames I'm fine with that.


cheers

andrew



Re: [HACKERS] Synch Rep for CommitFest 2009-07

2009-07-16 Thread Robert Haas
On Thu, Jul 16, 2009 at 1:09 PM, Greg Stark wrote:
> On Thu, Jul 16, 2009 at 4:41 PM, Heikki
> Linnakangas wrote:
>> Rick Gigger wrote:
>>> If you use an rsync like algorithm for doing the base backups wouldn't
>>> that increase the size of the database for which it would still be
>>> practical to just re-sync?  Couldn't you in fact sync a very large
>>> database if the amount of actual change in the files was a small
>>> percentage of the total size?
>>
>> It would certainly help to reduce the network traffic, though you'd
>> still have to scan all the data to see what has changed.
>
> The fundamental problem with pushing users to start over with a new
> base backup is that there's no relationship between the size of the
> WAL and the size of the database.
>
> You can plausibly have a system with extremely high transaction rate
> generating WAL very quickly, but where the whole database fits in a
> few hundred megabytes. In that case you could be behind by only a few
> minutes and have it be faster to take a new base backup.
>
> Or you could have a petabyte database which is rarely updated. In
> which case it might be faster to apply weeks' worth of logs than to
> try to take a base backup.
>
> Only the sysadmin is actually going to know which makes more sense.
> Unless we start tying WAL parameters to the database size or
> something like that.

I think we need a way for the master to know who its slaves are and
keep any given bit of WAL available until all slaves have successfully
read it, just as we keep each WAL file until we successfully copy it
to the archive.  Otherwise, there's no way to be sure that a
connection break won't result in the need for a new base backup.  (In
a way, a slave is very similar to an additional archive.)

...Robert
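Robert's retention rule can be sketched as follows (a toy model with WAL segments abstracted as integers; the function names are invented, not PostgreSQL internals):

```python
def oldest_segment_needed(archive_pos, slave_positions):
    """A segment may be recycled only after the archiver and every
    registered slave have confirmed receiving past it."""
    return min([archive_pos, *slave_positions])

def recyclable(segments, archive_pos, slave_positions):
    """Segments the master may safely remove or recycle."""
    keep_from = oldest_segment_needed(archive_pos, slave_positions)
    return [s for s in segments if s < keep_from]

# Master holds segments 1..10; the archiver is at 9, two slaves at 7 and 10.
# Only segments below the slowest consumer (7) can go.
print(recyclable(list(range(1, 11)), 9, [7, 10]))
```

The obvious operational risk, as with a stuck archiver today, is that one dead slave pins WAL forever, so some timeout or unregistration mechanism would be needed.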



Re: [HACKERS] pg_stat_activity.application_name

2009-07-16 Thread Greg Stark
On Thu, Jul 16, 2009 at 8:08 PM, Kevin
Grittner wrote:
> On the admin list there was a request for an application name
> column in pg_stat_activity.
>
> http://archives.postgresql.org/pgsql-admin/2009-07/msg00095.php
>
> This is available in a lot of other DBMS products, can be useful to
> DBAs, and seems pretty cheap and easy.  Could we get that onto the
> TODO list?

I think you should just add it.

Ok, we probably need some kind of policy for what to do before "just"
adding things to the TODO but I think it should be relatively liberal.
Something like, you should post that you're going to add it to the
-hackers list, get at least one person agreeing with the item and no
fatal flaws. Oh, and you should check for duplicates or for the same
item on the "things we don't want" list. But having done that, you
should assume it's up to you to just go ahead and add it.

In this case I don't see any harm in having an opaque application
identifier. Dangers (but surmountable ones I assume) would be:

1) The authenticity of the application identifier needs to be
downplayed -- don't even think of using it for security for example.

2) encoding issues if different connections are in different encodings...

3) backwards compatibility both in the library api and protocol

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] pg_stat_activity.application_name

2009-07-16 Thread Kevin Grittner
Jaime Casanova  wrote: 
> Kevin Grittner wrote:
>> On the admin list there was a request for an application name
>> column in pg_stat_activity.
 
> ah? how do you implement that? and what's the use case for?
 
It would be passed as a connection property.  (If that's not feasible,
perhaps a session GUC, which could map to a client-side connection
property in JDBC, etc.)  For many environments, it is at least as
important as an IP address or PID to help someone managing a database
with a lot of connections.  It would not do anything on the server
except show up in pg_stat_activity as another piece of information
about each connection.
 
We would probably want to modify psql, pg_dump, etc. to put the
application name into this connection property, at least by default. 
We may want to add a command-line switch to allow user override -- to
provide something more detailed.  For example,
--application-name='Weekly Purge' could be specified on the psql
command line.
 
Yes, this is only as useful as the compliance of the client
applications makes it.
 
-Kevin



Re: [HACKERS] COPY WITH CSV FORCE QUOTE *

2009-07-16 Thread Andrew Dunstan



Josh Berkus wrote:

Andrew,


FORCE NOT NULL is in any case a fairly blunt instrument - it doesn't
work for a column of any type that doesn't accept an empty string as
valid input, such as numeric types.


Con: this allows COPY to produce output which cannot be reloaded into 
PostgreSQL.


Pro: there is a lot of extremely broken external software which 
expects "nulls" to be expressed as "".  This improves compatibility 
with them.




FORCE NOT NULL is only valid when we import data, not when we export 
data, so what other programs expect to receive is irrelevant to any 
argument about FORCE NOT NULL.


AFAICT on a brief look at the patch, it doesn't affect the quoting of 
nulls on export, it just allows * as an alias for all columns for FORCE 
QUOTE (as well as FORCE NOT NULL). But FORCE QUOTE has never forced 
quoting of null values, only non-null values. We have never quoted null 
values, and I'm fairly resistant to any suggestion that we should.


As for importing data from programs that produce all values in quotes 
including null/missing values (your pro case above), arguably what we 
need is another flag that would turn an empty string into a null.


cheers

andrew
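Until such an import flag exists, the sed-style workaround Josh mentions can be done with Python's csv module instead (a sketch; it relies on the documented fact that COPY ... WITH CSV reads an unquoted empty field as NULL, while a quoted "" is an empty string):

```python
import csv
import io

def nullify_empty_fields(src, dst):
    """Rewrite a CSV stream so every empty field comes out unquoted.
    COPY ... WITH CSV then reads those fields as NULL, approximating
    the proposed empty-string-as-null import flag."""
    writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL)
    for row in csv.reader(src):
        # csv.reader collapses quoted "" and bare empty to the same
        # value, so QUOTE_MINIMAL re-emits both unquoted.
        writer.writerow(row)

src = io.StringIO('a,"",c\r\n"",2,""\r\n')
dst = io.StringIO()
nullify_empty_fields(src, dst)
print(dst.getvalue())
```

Note the limitation Andrew implies: this makes *every* empty string a NULL, which is exactly why a per-import flag (or per-column option) would be the cleaner design.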



Re: [HACKERS] Synch Rep for CommitFest 2009-07

2009-07-16 Thread Rick Gigger

On Jul 16, 2009, at 11:09 AM, Greg Stark wrote:


On Thu, Jul 16, 2009 at 4:41 PM, Heikki
Linnakangas wrote:

Rick Gigger wrote:
If you use an rsync like algorithm for doing the base backups wouldn't
that increase the size of the database for which it would still be
practical to just re-sync?  Couldn't you in fact sync a very large
database if the amount of actual change in the files was a small
percentage of the total size?


It would certainly help to reduce the network traffic, though you'd
still have to scan all the data to see what has changed.


The fundamental problem with pushing users to start over with a new
base backup is that there's no relationship between the size of the
WAL and the size of the database.

You can plausibly have a system with extremely high transaction rate
generating WAL very quickly, but where the whole database fits in a
few hundred megabytes. In that case you could be behind by only a few
minutes and have it be faster to take a new base backup.

Or you could have a petabyte database which is rarely updated. In
which case it might be faster to apply weeks' worth of logs than to
try to take a base backup.

Only the sysadmin is actually going to know which makes more sense.
Unless we start tying WAL parameters to the database size or
something like that.


Once again, wouldn't an rsync-like algorithm help here?  Couldn't you
have the default be to just create a new base backup for them, but
then allow you to specify an existing base backup if you've already
got one?
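Rick's idea can be sketched with a fixed-block hash comparison (a simplification: real rsync uses rolling checksums to handle shifted data; the 8 kB block size here just matches PostgreSQL's default page size, where updates happen in place):

```python
import hashlib

BLOCK = 8192  # bytes; PostgreSQL's default page size

def block_hashes(data):
    """SHA-256 digest of each fixed-size block."""
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def blocks_to_resend(old, new):
    """Indexes of blocks the standby must fetch again; everything
    else in its stale base backup can be reused as-is."""
    old_h = block_hashes(old)
    return [i for i, h in enumerate(block_hashes(new))
            if i >= len(old_h) or old_h[i] != h]

old = bytes(BLOCK * 4)                               # four untouched pages
new = old[:BLOCK] + b'x' * BLOCK + old[2 * BLOCK:]   # one page modified
print(blocks_to_resend(old, new))
```

As Heikki notes upthread, this saves network traffic but not I/O: both sides still read and hash every block.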




Re: [HACKERS] navigation menu for documents

2009-07-16 Thread David E. Wheeler

On Jul 14, 2009, at 3:21 PM, Andrew Dunstan wrote:

Yes, really. What you suggest here is just not adequate, IMNSHO. I  
don't want to have to scroll to the top or bottom of the page to get  
navigation, and I want to be able to see the navigation and go where  
I want directly.


Hey Andrew,

Check out what I've done for the Bricolage documentation:

  http://www.bricolagecms.org/docs/devel/api/

I'm using jQuery to pull the proper doc into a div. I'm still noodling  
with it, trying to fix encoding issues on Windows, but it's pretty  
close to done.


Best,

David



Re: [HACKERS] pg_stat_activity.application_name

2009-07-16 Thread Jaime Casanova
On Thu, Jul 16, 2009 at 2:08 PM, Kevin
Grittner wrote:
> On the admin list there was a request for an application name
> column in pg_stat_activity.
>
> http://archives.postgresql.org/pgsql-admin/2009-07/msg00095.php
>
> This is available in a lot of other DBMS products, can be useful to
> DBAs, and seems pretty cheap and easy.

ah? how do you implement that? and what's the use case for?

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157



[HACKERS] pg_stat_activity.application_name

2009-07-16 Thread Kevin Grittner
On the admin list there was a request for an application name
column in pg_stat_activity.
 
http://archives.postgresql.org/pgsql-admin/2009-07/msg00095.php
 
This is available in a lot of other DBMS products, can be useful to
DBAs, and seems pretty cheap and easy.  Could we get that onto the
TODO list?
 
-Kevin



Re: [HACKERS] COPY WITH CSV FORCE QUOTE *

2009-07-16 Thread Josh Berkus

Andrew,


FORCE NOT NULL is in any case a fairly blunt instrument - it doesn't
work for a column of any type that doesn't accept an empty string as
valid input, such as numeric types.


Con: this allows COPY to produce output which cannot be reloaded into 
PostgreSQL.


Pro: there is a lot of extremely broken external software which expects 
"nulls" to be expressed as "".  This improves compatibility with them.



--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [HACKERS] [GENERAL] pg_migrator not setting values of sequences?

2009-07-16 Thread Bruce Momjian
Peter Eisentraut wrote:
> On Thursday 16 July 2009 07:09:22 Bruce Momjian wrote:
> > Uh, how is this going to behave in 8.5?  Do we still dump sequences, and
> > if so, aren't we heading down the road of dumping stuff only because a
> > previous release needed it?
> 
> Which leads me to a related question:  Do you plan to maintain one version of 
> pg_migrator that can upgrade any version to any other version (within 
> reason), 
> or will there be separate binaries, say pg_migrator-8.4 and pg_migrator-8.5, 
> that each can only upgrade from $selfversion-1 to $selfversion?

One binary/source tree.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] COPY WITH CSV FORCE QUOTE * -- REVIEW

2009-07-16 Thread Josh Berkus

All,

1) Patch applies cleanly against CVS head.

2) Patch compiles and builds cleanly.

3) Unable to check docs because of general doc build problems.

4) Tested the following commands, using a 10MB table of PostgreSQL log data:

postgres=# COPY marchlog TO '/tmp/marchlog1.csv' with csv header;
COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog2.csv' with csv header force 
quote *;

COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog3.csv' with csv header force 
quote process_id;

COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog4.csv' with csv force quote *;
COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog5.csv' with force quote *;
ERROR:  COPY force quote available only in CSV mode
STATEMENT:  COPY marchlog TO '/tmp/marchlog5.csv' with force quote *;
ERROR:  COPY force quote available only in CSV mode

postgres=# COPY reloadlog FROM '/tmp/marchlog2.csv' with csv header;
COPY 81097

postgres-# \copy marchlog TO '/tmp/marchlog5.csv' with csv force quote *;
postgres-#

Per discussion, I did not test FORCE QUOTE NOT NULL *.

All output looked as expected.  This patch did not seem to change 
earlier functionality, and seems to quote as specified.


Unless there are other things we want to test (CLOBs?) I think the patch 
is probably ready for code review of the FORCE QUOTE * portion.


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
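For comparison, Python's csv module exposes the same distinction the thread is circling: QUOTE_ALL quotes every field the writer emits, including the empty string it substitutes for None, which is exactly the NULL-quoting behavior COPY's FORCE QUOTE * deliberately avoids (illustrative only; this is not COPY's implementation):

```python
import csv
import io

def render(row, quoting):
    """One CSV line for the row under the given quoting policy."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting).writerow(row)
    return buf.getvalue()

row = ["alpha", 42, None]   # None standing in for SQL NULL

# QUOTE_MINIMAL leaves the NULL as a bare empty field;
# QUOTE_ALL turns it into "", indistinguishable from an empty string.
print(render(row, csv.QUOTE_MINIMAL))
print(render(row, csv.QUOTE_ALL))
```

That round-trip ambiguity is the reason FORCE QUOTE * still leaves NULLs unquoted, per Andrew's point in the companion thread.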



[HACKERS] Docbook toolchain interfering with patch review?

2009-07-16 Thread Josh Berkus

All,

Well, after an hour of tinkering with docbook DTDs and openjade I've 
given up on building docs for the patch I was reviewing on my Mac.


If I'm encountering this difficulty building docs, so are many of the 
other new patch reviewers.  Which means we're *not* reviewing docs for 
completeness, correctness, or correspondence to the actual feature 
syntax until beta time.


This seems like a serious issue for development.  Reviewers, how many of 
you are able to build docs with each patch?


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [HACKERS] [PATCH] plpythonu datatype conversion improvements

2009-07-16 Thread Caleb Welton
Sorry about that.  Here it is again as an attachment.

-Caleb


On 7/16/09 7:16 AM, "Peter Eisentraut"  wrote:

On Wednesday 27 May 2009 02:07:33 Caleb Welton wrote:
> Patch for plpythonu

This patch doesn't apply; I think it got mangled during email transport.
(Tabs changed to spaces, it looks like.)  Could you resend the patch as a
separate attachment in a way that it doesn't get mangled?




plpython_bytea.patch
Description: plpython_bytea.patch



Re: [HACKERS] [PATCH] [v8.5] Security checks on largeobjects

2009-07-16 Thread Robert Haas
2009/7/16 KaiGai Kohei :
> However, I don't think the initial proposal of the largeobject
> security is now on the state to be reviewed seriously.

OK, I am moving this patch to returned with feedback.

...Robert



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Andres Freund
On Thursday 16 July 2009 19:22:30 Robert Haas wrote:
> On Thu, Jul 16, 2009 at 11:32 AM, Tom Lane wrote:
> > I wrote:
> >> If I set both collapse_limit variables to very high values (I used 999),
> >> it takes ... um ... not sure; I gave up waiting after half an hour.
> >> I also tried with geqo_effort reduced to the minimum of 1, but that
> >> didn't produce a plan in reasonable time either (I gave up after ten
> >> minutes).
> >
> > After I gave up letting the machine be idle to get a fair timing,
> > I turned on oprofile monitoring.  It looks a bit interesting:
> That is interesting, but there's not really enough detail here to see
> what is going on.  I'm more interested in what the high-level
> functions are doing that's causing these guys to be called so many
> times.  As Greg says, if the planning time curve for GEQO isn't better
> than the one for the standard planner, it's the epitome of pointless.
It is not the actual genetic searching, I now found out (or more precisely, 
now that I have read the trace correctly).

At the start of the query GEQO fills a pool with random paths through the 
search space. Unfortunately a random path is not very likely to succeed, so it 
checks and checks and...

That's why the problem is not visible with a simple join of 100 or so 
tables - all paths are valid there...

Andres
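Andres' observation is easy to reproduce with a toy model: call a left-deep join order "valid" when every table joins to something already in the plan, and compare a chain-shaped join graph with a fully connected one (pure Python; an invented model, not GEQO's actual path representation):

```python
import random

def is_connected_order(order, edges):
    """True if each table after the first joins to an earlier one,
    i.e. the order needs no cross join."""
    joined = {order[0]}
    for t in order[1:]:
        if not any(frozenset((t, j)) in edges for j in joined):
            return False
        joined.add(t)
    return True

def valid_fraction(n, edges, trials, rng):
    """Fraction of uniformly random join orders that are valid."""
    order = list(range(n))
    ok = 0
    for _ in range(trials):
        rng.shuffle(order)
        ok += is_connected_order(order, edges)
    return ok / trials

rng = random.Random(1)
n = 8
chain = {frozenset((i, i + 1)) for i in range(n - 1)}
clique = {frozenset((i, j)) for i in range(n) for j in range(i + 1, n)}
print("chain :", valid_fraction(n, chain, 2000, rng))   # most orders fail
print("clique:", valid_fraction(n, clique, 2000, rng))  # every order valid
```

On a sparse (chain-like) join graph almost every random order is rejected, so GEQO burns its time generating and discarding candidates, matching what Andres saw in the trace.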




Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jul 16, 2009 at 11:32 AM, Tom Lane wrote:
>> So maybe a redesign of the equivalence-class joinclause mechanism is in
>> order.  Still, this is unlikely to fix the fundamental issue that the
>> time for large join problems grows nonlinearly.

> Nonlinear is one thing, but this looks more like exponential.  I
> understand that the standard planner is exponential; GEQO should not
> be.

Well, the equivclass code is new as of 8.3.  It's possible that this
got broken relatively recently ...

regards, tom lane



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Andres Freund
On Thursday 16 July 2009 19:13:55 Robert Haas wrote:
> On Thu, Jul 16, 2009 at 12:49 PM, Andres Freund wrote:
> > On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
> >> Andres Freund  writes:
> >> > The default settings currently make it relatively hard to trigger geqo
> >> > at all.
> >>
> >> Yes, and that was intentional.  One of the implications of what we're
> >> discussing here is that geqo would get used a lot more for "typical
> >> complex queries" (if there is any such thing as a typical one).  So
> >> it's fully to be expected that the fallout would be pressure to improve
> >> geqo in various ways.
> >>
> >> Given that we are at the start of the development cycle, that prospect
> >> doesn't scare me --- there's plenty of time to fix whatever needs
> >> fixing.  However, I am leaning to the feeling that I don't want to be
> >> putting people in a position where they have no alternative but to use
> >> geqo.  So adjusting rather than removing the collapse limits is seeming
> >> like a good idea.
> >
> > Hm. I see a somewhat more fundamental problem with geqo:
> > I tried several queries, and I found not a single one where the whole
> > genetic process made any significant improvements to the 'worth'.
> > It seems that the best variant out of the pool is always either the path
> > chosen in the end, or the cost difference is _really_ low.
> Ouch.  Did you insert some debugging code to get that information, or
> how did you come to that conclusion?
Yes, I enabled GEQO_DEBUG and added some more debugging output.  

Btw, a higher generation count does not change that.

Andres



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Robert Haas
On Thu, Jul 16, 2009 at 11:32 AM, Tom Lane wrote:
> I wrote:
>> If I set both collapse_limit variables to very high values (I used 999),
>> it takes ... um ... not sure; I gave up waiting after half an hour.
>> I also tried with geqo_effort reduced to the minimum of 1, but that
>> didn't produce a plan in reasonable time either (I gave up after ten
>> minutes).
>
> After I gave up letting the machine be idle to get a fair timing,
> I turned on oprofile monitoring.  It looks a bit interesting:

That is interesting, but there's not really enough detail here to see
what is going on.  I'm more interested in what the high-level
functions are doing that's causing these guys to be called so many
times.  As Greg says, if the planning time curve for GEQO isn't better
than the one for the standard planner, it's the epitome of pointless.

> So maybe a redesign of the equivalence-class joinclause mechanism is in
> order.  Still, this is unlikely to fix the fundamental issue that the
> time for large join problems grows nonlinearly.

Nonlinear is one thing, but this looks more like exponential.  I
understand that the standard planner is exponential; GEQO should not
be.

...Robert



Re: [HACKERS] WIP: generalized index constraints

2009-07-16 Thread Jeff Davis
On Thu, 2009-07-16 at 15:22 +1000, Brendan Jurd wrote:
> I had a play around with the feature in psql.  I think the syntax is
> okay, but using "ALTER TABLE ... ADD" as you mentioned upthread could
> be a better option.

Ok, I think we're pretty much settled on that option then.

Another idea that I thought about is that:

   ALTER TABLE foo ADD UNIQUE (a, b) USING foo_idx;

could be a shorthand for:

   ALTER TABLE foo ADD INDEX CONSTRAINT (a =, b =) USING foo_idx;

The benefit is that it could go over GiST indexes or hash indexes, not
just btrees. The syntax could also be useful to turn an existing btree
into a unique btree.
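The intended semantics of such a constraint -- no two rows may satisfy the given operator on every listed column -- can be sketched in Python (a toy model of the semantics only; the patch enforces this through the index at insert time, and the names here are invented):

```python
import operator

def find_conflict(rows, column_ops):
    """Return the first pair of row indexes satisfying every
    (column, operator) test, or None.  With all '=' operators this
    is plain UNIQUE; with e.g. an 'overlaps' operator over a GiST
    index it generalizes to an exclusion-style constraint."""
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if all(op(rows[i][c], rows[j][c]) for c, op in column_ops):
                return i, j
    return None

rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 1, "b": "x"}]
# Rows 0 and 2 agree on both columns, so "INDEX CONSTRAINT (a =, b =)"
# would reject the third insert.
print(find_conflict(rows, [("a", operator.eq), ("b", operator.eq)]))
```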

> I noticed that there's no change to the output of \d in psql to show
> the constraint, so when I do a \d on my test table, I can see that
> there's a gist index there, but I can't tell that there is also a
> constraint on it.  This seems like a pretty significant shortcoming.
> Essentially once you've created one of these index constraints, it
> vanishes into the catalogs and becomes invisible to the user.  This
> might call for a modification of pg_get_indexdef()?

I agree, that's important. Psql support, regression tests, and docs are
all intertwined somewhat with the syntax, so I held off on that work
until I got a little feedback. I will get to work and see if I can put
together a more complete version in the next few days.

If you happen to have time, you can see if you can break my current
patch. I expect the basic algorithm to remain about the same for my next
version, so if you see any problems with that, please let me know. Also,
if you see any possible improvements that could make it useful for more
situations, that would be helpful, too.

But I think I have enough information to move forward, so if you want to
move on to a more complete patch, feel free.

Thanks for the review!

Regards,
Jeff Davis




Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Robert Haas
On Thu, Jul 16, 2009 at 12:49 PM, Andres Freund wrote:
> On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
>> Andres Freund  writes:
>> > The default settings currently make it relatively hard to trigger geqo at
>> > all.
>>
>> Yes, and that was intentional.  One of the implications of what we're
>> discussing here is that geqo would get used a lot more for "typical
>> complex queries" (if there is any such thing as a typical one).  So
>> it's fully to be expected that the fallout would be pressure to improve
>> geqo in various ways.
>>
>> Given that we are at the start of the development cycle, that prospect
>> doesn't scare me --- there's plenty of time to fix whatever needs
>> fixing.  However, I am leaning to the feeling that I don't want to be
>> putting people in a position where they have no alternative but to use
>> geqo.  So adjusting rather than removing the collapse limits is seeming
>> like a good idea.
> Hm. I see a bit more fundamental problem with geqo:
> I tried several queries, and I found not a single one where the whole
> genetic process did any significant improvements to the 'worth'.
> It seems that the best variant out of the pool is always either the path
> chosen in the end, or at least the cost difference is _really_ low.

Ouch.  Did you insert some debugging code to get that information, or
how did you come to that conclusion?

...Robert



Re: [HACKERS] Synch Rep for CommitFest 2009-07

2009-07-16 Thread Greg Stark
On Thu, Jul 16, 2009 at 4:41 PM, Heikki
Linnakangas wrote:
> Rick Gigger wrote:
>> If you use an rsync like algorithm for doing the base backups wouldn't
>> that increase the size of the database for which it would still be
>> practical to just re-sync?  Couldn't you in fact sync a very large
>> database if the amount of actual change in the files was a small
>> percentage of the total size?
>
> It would certainly help to reduce the network traffic, though you'd
> still have to scan all the data to see what has changed.

The fundamental problem with pushing users to start over with a new
base backup is that there's no relationship between the size of the
WAL and the size of the database.

You can plausibly have a system with extremely high transaction rate
generating WAL very quickly, but where the whole database fits in a
few hundred megabytes. In that case you could be behind by only a few
minutes and have it be faster to take a new base backup.

Or you could have a petabyte database which is rarely updated. In
which case it might be faster to apply weeks' worth of logs than to
try to take a base backup.

Only the sysadmin is actually going to know which makes more sense.
Unless we start tying WAL parameters to the database size or
something like that.
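The tradeoff described here reduces to a back-of-the-envelope comparison. A hypothetical sketch (the function name and the throughput defaults are invented for illustration, not measurements):

```python
def resync_strategy(db_size_bytes, wal_backlog_bytes,
                    copy_rate=100e6, replay_rate=20e6):
    """Rough estimate of whether a standby should take a fresh base
    backup or replay its WAL backlog.  copy_rate is an assumed network
    copy throughput and replay_rate an assumed WAL replay throughput,
    both in bytes/second (hypothetical defaults)."""
    base_backup_secs = db_size_bytes / copy_rate
    replay_secs = wal_backlog_bytes / replay_rate
    return "base backup" if base_backup_secs < replay_secs else "replay WAL"

# Small, hot database a few minutes behind: copying wins.
print(resync_strategy(db_size_bytes=200e6, wal_backlog_bytes=10e9))
# Huge, rarely updated database: replaying wins.
print(resync_strategy(db_size_bytes=1e15, wal_backlog_bytes=50e9))
```

The point stands that only the sysadmin knows realistic values for the two rates.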

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Andres Freund
On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
> Andres Freund  writes:
> > The default settings currently make it relatively hard to trigger geqo at
> > all.
>
> Yes, and that was intentional.  One of the implications of what we're
> discussing here is that geqo would get used a lot more for "typical
> complex queries" (if there is any such thing as a typical one).  So
> it's fully to be expected that the fallout would be pressure to improve
> geqo in various ways.
>
> Given that we are at the start of the development cycle, that prospect
> doesn't scare me --- there's plenty of time to fix whatever needs
> fixing.  However, I am leaning to the feeling that I don't want to be
> putting people in a position where they have no alternative but to use
> geqo.  So adjusting rather than removing the collapse limits is seeming
> like a good idea.
Hm. I see a bit more fundamental problem with geqo:
I tried several queries, and I found not a single one where the whole 
genetic process did any significant improvements to the 'worth'.
It seems that the best variant out of the pool is always either the path 
chosen in the end, or at least the cost difference is _really_ low.
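This observation can be made concrete with a toy harness: compare the cost of the best member of the initial random pool against the best individual after evolution (hypothetical code, not GEQO itself; the cost function and mutation-only step are invented stand-ins):

```python
import random

def evolve(costs_fn, pool_size=50, generations=100, seed=0):
    """Toy genetic search over permutations (a stand-in for join
    orders).  Returns (best cost in the initial random pool, best cost
    after evolution) so the two can be compared."""
    rng = random.Random(seed)
    n = 8
    pool = [rng.sample(range(n), n) for _ in range(pool_size)]
    initial_best = min(costs_fn(p) for p in pool)
    for _ in range(generations):
        # crude mutation-only step: perturb a random member and
        # replace the current worst member if the mutant beats it
        cand = pool[rng.randrange(pool_size)][:]
        i, j = rng.randrange(n), rng.randrange(n)
        cand[i], cand[j] = cand[j], cand[i]
        worst = max(range(pool_size), key=lambda k: costs_fn(pool[k]))
        if costs_fn(cand) < costs_fn(pool[worst]):
            pool[worst] = cand
    final_best = min(costs_fn(p) for p in pool)
    return initial_best, final_best

# cost: how far each element sits from its sorted position
cost = lambda p: sum(abs(v - i) for i, v in enumerate(p))
init, final = evolve(cost)
assert final <= init  # evolution never makes the best-of-pool worse
```

If `final` stays close to `init` across many seeds, the genetic phase is adding little over random sampling, which is the pattern described above.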


Andres



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Andres Freund
On Thursday 16 July 2009 18:23:06 Tom Lane wrote:
> Andres Freund  writes:
> > On Thursday 16 July 2009 17:16:31 Tom Lane wrote:
> >> I tried the example query and couldn't get "Failed to make a valid plan"
> >> out of it ... what settings do you need for that?
> >
> > It unfortunately depends on settings and luck. This dependence on luck
> > was the reason why I liked geqo to behave "somewhat" deterministically...
> >
> > With  {join,from}_collapse_limit = 100 it seems to be triggered reliably.
> With lower values it seems harder to trigger, with bigger it simply takes
> > too long to even get there.
>
> OK, I see it at 100.  Would you confirm that what you get is the failure
> in random_init_pool (geqo_pool.c) not the identically-phrased message
> elsewhere?  (If you have VERBOSITY = verbose you should see the error
> location info.)
Yes. I should have seen that. It's not exactly surprising...

Andres



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Tom Lane
Andres Freund  writes:
> On Thursday 16 July 2009 17:16:31 Tom Lane wrote:
>> I tried the example query and couldn't get "Failed to make a valid plan"
>> out of it ... what settings do you need for that?

> It unfortunately depends on settings and luck. This dependence on luck was 
> the 
> reason why I liked geqo to behave "somewhat" deterministically...

> With  {join,from}_collapse_limit = 100 it seems to be triggered reliably. 
> With 
> lower values it seems harder to trigger, with bigger it simply takes too long to 
> even get there.

OK, I see it at 100.  Would you confirm that what you get is the failure
in random_init_pool (geqo_pool.c) not the identically-phrased message
elsewhere?  (If you have VERBOSITY = verbose you should see the error
location info.)

regards, tom lane



Re: [HACKERS] slow count in window query

2009-07-16 Thread Pavel Stehule
2009/7/16 Hitoshi Harada :
> 2009/7/16 Greg Stark :
>> On Wed, Jul 15, 2009 at 11:18 AM, Pavel Stehule 
>> wrote:
>>> postgres=# select avg(a) from (select a, row_number() over (order by
>>> a) as r, count(*) over () as rc from x ) p where r in
>>> ((rc+1)/2,(rc+2)/2) ;
>>
>> How does this compare to the plain non-windowing SQL implementation:
>>
>> select a from x order by a offset (select trunc(count(*)/2) from x) limit 1
>>
>> (except that that only works if count(*) is odd).
>>
>> Interestingly finding the median is actually O(n) using Quickselect.
>> Maybe we should provide a C implementation of quickselect as a window
>> function. I'm not sure how to wedge in the concept that the sort is
>> unnecessary even though the ORDER BY is specified though.
>
> median() should be aggregate, not window function, shouldn't it?
>
Yes - the core of my topic is the significant slowness of queries that
use window functions where an aggregate function was used. This case
could be simply optimized.

This case isn't important for me. I simply played with window
functions and found Celko's query - and I was surprised, because this
query was faster than the others; I expected something else.

>>
>> I'm also not sure how to handle this if the set has to be spooled to
>> disk. Quicksort and Quickselect do a lot of scans through the data and
>> wouldn't perform well on disk.
>
> The WindowAgg spools rows into the tuplestore, which holds the data in
> memory as long as it fits. Do you have any idea how it would store
> millions upon millions of rows without a tuplestore?
>
> Regards,
>
>
> --
> Hitoshi Harada
>



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Tom Lane
Andres Freund  writes:
> The default settings currently make it relatively hard to trigger geqo at all.

Yes, and that was intentional.  One of the implications of what we're
discussing here is that geqo would get used a lot more for "typical
complex queries" (if there is any such thing as a typical one).  So
it's fully to be expected that the fallout would be pressure to improve
geqo in various ways.

Given that we are at the start of the development cycle, that prospect
doesn't scare me --- there's plenty of time to fix whatever needs
fixing.  However, I am leaning to the feeling that I don't want to be
putting people in a position where they have no alternative but to use
geqo.  So adjusting rather than removing the collapse limits is seeming
like a good idea.

regards, tom lane



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Andres Freund
On Thursday 16 July 2009 17:27:39 Greg Stark wrote:
> On Thu, Jul 16, 2009 at 4:16 PM, Tom Lane wrote:
> > However, I do observe that this seems a sufficient counterexample
> > against the theory that we can just remove the collapse limits and let
> > GEQO save us on very complex queries.  On my machine, the example query
> > takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> > If I set both collapse_limit variables to very high values (I used 999),
> > it takes ... um ... not sure; I gave up waiting after half an hour.
> What's the point of GEQO if it doesn't guarantee to produce the
> optimal plan and *also* doesn't guarantee to produce some plan, any
> plan, within some reasonable amount of time? Either we need to fix
> that or else I don't see what it's buying us over our regular planner
> which also might not produce a plan within a reasonable amount of time
> but at least if it does it'll be the right plan.
Well, I could not find a plan where it errored out with the old limits, so one 
could argue it's just not adapted.
Although I also could not find a single case where geqo was noticeably faster 
with the default settings, even when it was used.
The default settings currently make it relatively hard to trigger geqo at all.


Andres



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Tom Lane
Greg Stark  writes:
> On Thu, Jul 16, 2009 at 4:32 PM, Tom Lane wrote:
>> So maybe a redesign of the equivalence-class joinclause mechanism is in
>> order.  Still, this is unlikely to fix the fundamental issue that the
>> time for large join problems grows nonlinearly.

> Perhaps it's GEQO's fault that it's using these functions
> inappropriately, calling them often to calculate these answers
> whenever it needs them instead of looking once for join clauses and
> then optimizing based on the results. But I've never actually looked
> at geqo, maybe that's inherent in the design?

geqo isn't doing anything the regular planner wouldn't do under similar
conditions.  It might well be that better caching is the answer to this
particular problem, but I don't have time to look closer today.
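If caching does turn out to be the answer, the shape of the fix can be sketched outside the planner (hypothetical Python; `has_relevant_joinclause` and `ECLASSES` here are invented stand-ins for the C structures, not the real API):

```python
from functools import lru_cache

# Invented stand-in for the equivalence classes the planner scans:
# each frozenset holds the relids an equivalence class mentions.
ECLASSES = [frozenset({1, 2}), frozenset({2, 5}), frozenset({3, 4})]

@lru_cache(maxsize=None)
def has_relevant_joinclause(rels_a, rels_b):
    """Does any equivalence class mention members of both relid sets?
    Arguments must be hashable (frozensets) for the cache to work."""
    return any(ec & rels_a and ec & rels_b for ec in ECLASSES)

# GEQO asks this repeatedly for the same relid pairs while building
# join trees; memoization turns repeated scans into dict lookups.
assert has_relevant_joinclause(frozenset({1}), frozenset({2}))
assert not has_relevant_joinclause(frozenset({1}), frozenset({4}))
```

The cache helps exactly when the same pair is probed many times, which is what the profile's dominance of `have_relevant_eclass_joinclause` suggests.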

regards, tom lane



Re: [HACKERS] Synch Rep for CommitFest 2009-07

2009-07-16 Thread Heikki Linnakangas
Rick Gigger wrote:
> If you use an rsync like algorithm for doing the base backups wouldn't
> that increase the size of the database for which it would still be
> practical to just re-sync?  Couldn't you in fact sync a very large
> database if the amount of actual change in the files was a small
> percentage of the total size?

It would certainly help to reduce the network traffic, though you'd
still have to scan all the data to see what has changed.
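The rsync-like idea - scan everything, ship only changed blocks - can be sketched as follows (assumptions: fixed-size blocks and a strong per-block hash; the real rsync algorithm adds rolling checksums on top of this):

```python
import hashlib

BLOCK = 8192  # pretend 8 kB blocks, like Postgres pages

def block_digests(data):
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def changed_blocks(old, new):
    """Indices of blocks that must be re-sent.  Every block is still
    hashed (the full scan mentioned above), but only differing blocks
    cross the network."""
    old_d, new_d = block_digests(old), block_digests(new)
    return [i for i, d in enumerate(new_d)
            if i >= len(old_d) or d != old_d[i]]

old = bytes(4 * BLOCK)
new = bytearray(old)
new[BLOCK + 10] = 0xFF          # dirty one byte in block 1
assert changed_blocks(old, bytes(new)) == [1]
```

I/O cost stays proportional to database size, but network cost drops to the changed fraction, which is the tradeoff being weighed here.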

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Greg Stark
On Thu, Jul 16, 2009 at 4:32 PM, Tom Lane wrote:
> samples  %        image name               symbol name
> 886498   53.8090  postgres                 have_relevant_eclass_joinclause
> 460596   27.9574  postgres                 bms_overlap
>
> So maybe a redesign of the equivalence-class joinclause mechanism is in
> order.  Still, this is unlikely to fix the fundamental issue that the
> time for large join problems grows nonlinearly.

Perhaps it's GEQO's fault that it's using these functions
inappropriately, calling them often to calculate these answers
whenever it needs them instead of looking once for join clauses and
then optimizing based on the results. But I've never actually looked
at geqo, maybe that's inherent in the design?


-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Andres Freund
On Thursday 16 July 2009 17:16:31 Tom Lane wrote:
> Andres Freund  writes:
> > On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
> >> Andres Freund  writes:
> >>> "Error: Failed to make a valid plan"
> >>
> >> We're not going to be able to fix this unless you show us examples.
> >
> > In the other thread I attached a schema similar to the real one + an
> > example query. Not enough? And why?
>
> I tried the example query and couldn't get "Failed to make a valid plan"
> out of it ... what settings do you need for that?
It unfortunately depends on settings and luck. This dependence on luck was the 
reason why I liked geqo to behave "somewhat" deterministically...

With {join,from}_collapse_limit = 100 it seems to be triggered reliably. With 
lower values it seems harder to trigger; with bigger ones it simply takes too 
long to even get there.

Efficiencywise using geqo with higher limits nearly all time is spent in:

geqo
gimme_tree 
have_join_order_restriction
has_legal_joinclause
have_relevant_joinclause
have_relevant_eclass (30% self)
bms_overlap (50%self)

I am not yet fully understanding geqo, but it looks like there are some 
possibilities to improve this. 
Although such efficiency improvements would not explain the completely 
failing plans...

Do you have an idea which kind of plans benefit most from using geqo? I had a 
somewhat hard time finding any query where geqo was substantially faster than 
the standard join search.

That also somewhat explains why I saw improvements with 64bit bitmapsets...


> However, I do observe that this seems a sufficient counterexample
> against the theory that we can just remove the collapse limits and let
> GEQO save us on very complex queries.  On my machine, the example query
> takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> If I set both collapse_limit variables to very high values (I used 999),
> it takes ... um ... not sure; I gave up waiting after half an hour.
> I also tried with geqo_effort reduced to the minimum of 1, but that
> didn't produce a plan in reasonable time either (I gave up after ten
> minutes).  So if we remove the collapse limits, Postgres will completely
> fail on this query --- the only way out would be enable_join_ordering =
> off, which is hardly likely to produce a decent plan.
> Maybe we should leave the collapse_limit logic alone and address
> Robert's gripes by just raising the default values a lot (I'm thinking
> 100 or so).  That way there's an escape hatch for anyone who has
> pathological queries to deal with --- just dial the settings down.
Yes, I think that's sensible. I don't know if there are any queries out there 
that benefit from higher limits.

Andres




Re: [HACKERS] boolean in C

2009-07-16 Thread Tom Lane
Grzegorz Jaskiewicz  writes:
> On 16 Jul 2009, at 15:17, Tom Lane wrote:
>> That's hardly going to improve readability for anyone.  Also, it will
>> flat out not work for the catalog struct declarations.  When we say
>> "bool relhasindex;" the compiler had better think that that's a
>> one-byte field.

> Sure, but I would certainly hope there are not too many places where
> you actually convert it from disk representation to internal and vice
> versa.

We don't "convert" --- it's expected to be the same representation.
As for not too many of them, I think grepping for references to bool
catalog fields will show you differently ...

regards, tom lane



Re: [HACKERS] Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Kenneth Marshall
On Thu, Jul 16, 2009 at 04:27:39PM +0100, Greg Stark wrote:
> On Thu, Jul 16, 2009 at 4:16 PM, Tom Lane wrote:
> > However, I do observe that this seems a sufficient counterexample
> > against the theory that we can just remove the collapse limits and let
> > GEQO save us on very complex queries.  On my machine, the example query
> > takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> > If I set both collapse_limit variables to very high values (I used 999),
> > it takes ... um ... not sure; I gave up waiting after half an hour.
> 
> What's the point of GEQO if it doesn't guarantee to produce the
> optimal plan and *also* doesn't guarantee to produce some plan, any
> plan, within some reasonable amount of time? Either we need to fix
> that or else I don't see what it's buying us over our regular planner
> which also might not produce a plan within a reasonable amount of time
> but at least if it does it'll be the right plan.
> 

I do agree that we should have an actual time limit cap for
GEQO that would have it return the best plan so far at that time.
Then you can at least bound your planning time.
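A best-so-far cap like this is the standard anytime-search pattern (a hedged sketch, not GEQO code; the generator and cost function are invented stand-ins):

```python
import random
import time

def anytime_search(candidates_fn, cost_fn, budget_secs=0.05):
    """Keep generating candidate plans until the time budget runs out,
    returning the cheapest one seen so far.  At least one candidate is
    always evaluated, so a plan is always returned."""
    deadline = time.monotonic() + budget_secs
    best, best_cost = None, float("inf")
    while best is None or time.monotonic() < deadline:
        cand = candidates_fn()
        c = cost_fn(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost

rng = random.Random(0)
gen = lambda: rng.sample(range(10), 10)          # random "join order"
cost = lambda p: sum(abs(v - i) for i, v in enumerate(p))
plan, c = anytime_search(gen, cost)
assert plan is not None and c >= 0
```

Planning time becomes bounded by `budget_secs`; plan quality degrades gracefully instead of the search failing outright.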

Regards,
Ken



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Tom Lane
I wrote:
> If I set both collapse_limit variables to very high values (I used 999),
> it takes ... um ... not sure; I gave up waiting after half an hour.
> I also tried with geqo_effort reduced to the minimum of 1, but that
> didn't produce a plan in reasonable time either (I gave up after ten
> minutes).

After I gave up letting the machine be idle to get a fair timing,
I turned on oprofile monitoring.  It looks a bit interesting:

samples  %        image name     symbol name
886498   53.8090  postgres       have_relevant_eclass_joinclause
460596   27.9574  postgres       bms_overlap
142764    8.6655  postgres       bms_is_subset
126274    7.6646  postgres       have_join_order_restriction
14205     0.8622  postgres       list_nth_cell
2721      0.1652  postgres       generate_join_implied_equalities
2445      0.1484  libc-2.9.so    memset
2202      0.1337  postgres       have_relevant_joinclause
1678      0.1019  postgres       make_canonical_pathkey
1648      0.1000  postgres       pfree
884       0.0537  postgres       bms_union
762       0.0463  postgres       gimme_tree
660       0.0401  libc-2.9.so    memcpy
571       0.0347  postgres       AllocSetFree
475       0.0288  postgres       AllocSetAlloc
431       0.0262  postgres       has_relevant_eclass_joinclause
389       0.0236  postgres       check_list_invariants
260       0.0158  postgres       join_is_legal
238       0.0144  postgres       bms_copy

So maybe a redesign of the equivalence-class joinclause mechanism is in
order.  Still, this is unlikely to fix the fundamental issue that the
time for large join problems grows nonlinearly.

regards, tom lane



Re: [HACKERS] boolean in C

2009-07-16 Thread Zdenek Kotala

Grzegorz Jaskiewicz wrote on Thu, 16 Jul 2009 at 14:59 +0100:

> >
> >> Why C89, and not C99 ? Virtually all compilers for last 4 years have/
> >> had C99 support.
> >
> > Well, I think we want to run on systems that are older than 4 years,  
> > too.
> 
> 
> Sure, but that's probably less than 1% of all systems.
> The 4 years was a guess, I think its much more than that.

For example, Solaris 8 is 9 years old and is still used in production. I
guess HP-UX is in the same situation. And so on. I am not able to say how
many PostgreSQL installations run on them, but as Tom mentioned, there is
no significant reason to break old platforms.

Zdenek




[HACKERS] Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Greg Stark
On Thu, Jul 16, 2009 at 4:16 PM, Tom Lane wrote:
> However, I do observe that this seems a sufficient counterexample
> against the theory that we can just remove the collapse limits and let
> GEQO save us on very complex queries.  On my machine, the example query
> takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> If I set both collapse_limit variables to very high values (I used 999),
> it takes ... um ... not sure; I gave up waiting after half an hour.

What's the point of GEQO if it doesn't guarantee to produce the
optimal plan and *also* doesn't guarantee to produce some plan, any
plan, within some reasonable amount of time? Either we need to fix
that or else I don't see what it's buying us over our regular planner
which also might not produce a plan within a reasonable amount of time
but at least if it does it'll be the right plan.

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Tom Lane
Andres Freund  writes:
> On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
>> Andres Freund  writes:
>>> "Error: Failed to make a valid plan"

>> We're not going to be able to fix this unless you show us examples.

> In the other thread I attached a schema similar to the real one + an example
> query. Not enough? And why?

I tried the example query and couldn't get "Failed to make a valid plan"
out of it ... what settings do you need for that?

However, I do observe that this seems a sufficient counterexample
against the theory that we can just remove the collapse limits and let
GEQO save us on very complex queries.  On my machine, the example query
takes about 22 seconds to plan using CVS HEAD w/ all default settings.
If I set both collapse_limit variables to very high values (I used 999),
it takes ... um ... not sure; I gave up waiting after half an hour.
I also tried with geqo_effort reduced to the minimum of 1, but that
didn't produce a plan in reasonable time either (I gave up after ten
minutes).  So if we remove the collapse limits, Postgres will completely
fail on this query --- the only way out would be enable_join_ordering =
off, which is hardly likely to produce a decent plan.

Maybe we should leave the collapse_limit logic alone and address
Robert's gripes by just raising the default values a lot (I'm thinking
100 or so).  That way there's an escape hatch for anyone who has
pathological queries to deal with --- just dial the settings down.

>>> Noticeable even some plans which were plannable in reasonable time before
>>> now are problematic with enable_join_ordering=false!

>> And this even more so --- it doesn't make any sense at all.

> Why? With a high from_collapse_limit more subqueries get inlined - before 
> inlining they get planned separately.

Okay, I misunderstood which two cases you were comparing there.

regards, tom lane



Re: [HACKERS] boolean in C

2009-07-16 Thread Grzegorz Jaskiewicz


On 16 Jul 2009, at 15:17, Tom Lane wrote:

> Grzegorz Jaskiewicz  writes:
> That's hardly going to improve readability for anyone.  Also, it will
> flat out not work for the catalog struct declarations.  When we say
> "bool relhasindex;" the compiler had better think that that's a
> one-byte field.

Sure, but I would certainly hope there are not too many places where
you actually convert it from disk representation to internal and vice
versa.

>> And it is pretty annoying, when your product also has its own BOOLean
>> defined...
>
> IOW you're not using stdbool either?

Well, saying that I don't is quite an overstatement. It was decided
long before I started working for that customer, and it is full of
problems like that. But still, it would be nice for postgresql to at
least not cause problems like that. Having said that, I will probably
fix it on the customer's side, but I wanted to see if you guys will be
happy with a patch that changes that in postgresql.

Thanks.




Re: [HACKERS] Mostly Harmless: c++bookends - patch 2 of 4

2009-07-16 Thread Peter Eisentraut
On Thursday 16 July 2009 17:00:03 Robert Haas wrote:
> On Mon, Jul 13, 2009 at 5:51 PM, Peter Eisentraut wrote:
> > So I think either decoration is added to all of these files or none of
> > them. And I think the former is not going to go over well.
>
> We do have some things that are conditioned on __cplusplus already,
> such as "c.h", "pg_config.h.in", and "postgres_ext.h".  So at some
> point we at least thought about supporting inclusion of our header
> files from C++.  But I agree that if we're going to do it at all, we
> ought to do it everywhere.

We do support using the frontend headers (libpq, ecpg) from C++.  That's what 
postgres_ext.h is about.  The code in pg_config.h.in is autogenerated by 
Autoconf.  The stuff in c.h is probably still there from before we rearranged 
the header files so that the frontend includes don't use c.h.



Re: [HACKERS] boolean in C

2009-07-16 Thread Tom Lane
Grzegorz Jaskiewicz  writes:
> On 16 Jul 2009, at 14:53, Peter Eisentraut wrote:
 the standard does not promise that type _Bool has size = 1 byte.
 We have to have that because of on-disk compatibility requirements.

>>> I think the latter is easily fixable, or forceable to be one byte.

>> How do you plan to do that?

> by casting it to 1 byte type such as char ?

That's hardly going to improve readability for anyone.  Also, it will
flat out not work for the catalog struct declarations.  When we say
"bool relhasindex;" the compiler had better think that that's a
one-byte field.
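Why the one-byte requirement matters for on-disk compatibility can be illustrated from the outside (a hypothetical three-field row layout, not the actual catalog storage format):

```python
import struct

# Hypothetical on-disk row: oid (uint32), relhasindex (1-byte bool),
# relnatts (int16).  In struct's standard-size mode, "?" packs a bool
# into exactly one byte.
ROW = struct.Struct("<I?h")
raw = ROW.pack(16384, True, 3)
assert len(raw) == 7                      # 4 + 1 + 2 bytes
assert ROW.unpack(raw) == (16384, True, 3)
# If the compiler made bool four bytes instead, every field after it
# would shift, and existing data files would be read back as garbage.
```

The same reasoning applies to the C struct overlay on catalog tuples: the field offsets are baked into data already on disk.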

> And it is pretty annoying, when your product also has its own BOOLean  
> defined...

IOW you're not using stdbool either?

regards, tom lane



[HACKERS] revised hstore patch

2009-07-16 Thread Andrew Gierth
Revision to previous hstore patch to fix (and add tests for) some edge
case bugs with nulls or empty arrays.

-- 
Andrew (irc:RhodiumToad)



hstore_20090716.patch.gz
Description: hstore patch



Re: [HACKERS] [PATCH] plpythonu datatype conversion improvements

2009-07-16 Thread Peter Eisentraut
On Wednesday 27 May 2009 02:07:33 Caleb Welton wrote:
> Patch for plpythonu

This patch doesn't apply; I think it got mangled during email transport.  
(Tabs changed to spaces, it looks like.)  Could you resend the patch as a 
separate attachment in a way that it doesn't get mangled?




Re: [HACKERS] Mostly Harmless: c++bookends - patch 2 of 4

2009-07-16 Thread Robert Haas
On Mon, Jul 13, 2009 at 5:51 PM, Peter Eisentraut wrote:
> So I think either decoration is added to all of these files or none of them.
> And I think the former is not going to go over well.

We do have some things that are conditioned on __cplusplus already,
such as "c.h", "pg_config.h.in", and "postgres_ext.h".  So at some
point we at least thought about supporting inclusion of our header
files from C++.  But I agree that if we're going to do it at all, we
ought to do it everywhere.

...Robert



Re: [HACKERS] boolean in C

2009-07-16 Thread Grzegorz Jaskiewicz


On 16 Jul 2009, at 14:53, Peter Eisentraut wrote:

> On Thursday 16 July 2009 16:23:31 Grzegorz Jaskiewicz wrote:
>> On 16 Jul 2009, at 14:20, Tom Lane wrote:
>>> Grzegorz Jaskiewicz  writes:
>>>> oh, another thing.
>>>> stdbool is C99 standard feature.
>>>
>>> We are still targeting C89, not C99.
>>>
>>> Another reason not to depend on stdbool is that, so far as I can see,
>>> the standard does not promise that type _Bool has size = 1 byte.
>>> We have to have that because of on-disk compatibility requirements.
>>
>> I think the latter is easily fixable, or forceable to be one byte.
>
> How do you plan to do that?

By casting it to a 1-byte type such as char? I don't think anyone will
add a 3rd state to boolean in stdbool, at least not any time soon :)

And it is pretty annoying when your product also has its own BOOLean
defined...

>> Why C89, and not C99? Virtually all compilers for the last 4 years
>> have had C99 support.
>
> Well, I think we want to run on systems that are older than 4 years, too.

Sure, but that's probably less than 1% of all systems.
The 4 years was a guess; I think it's much more than that.




Re: [HACKERS] boolean in C

2009-07-16 Thread Peter Eisentraut
On Thursday 16 July 2009 16:23:31 Grzegorz Jaskiewicz wrote:
> On 16 Jul 2009, at 14:20, Tom Lane wrote:
> > Grzegorz Jaskiewicz  writes:
> >> oh, another thing.
> >> stdbool is C99 standard feature.
> >
> > We are still targeting C89, not C99.
> >
> > Another reason not to depend on stdbool is that, so far as I can see,
> > the standard does not promise that type _Bool has size = 1 byte.
> > We have to have that because of on-disk compatibility requirements.
>
> I think the latter is easily fixable, or forceable to be one byte.

How do you plan to do that?

> Why C89, and not C99? Virtually all compilers for the last 4 years
> have/had C99 support.

Well, I think we want to run on systems that are older than 4 years, too.



Re: [HACKERS] [PATCH] Psql List Languages

2009-07-16 Thread Peter Eisentraut
On Thursday 16 July 2009 00:38:31 Fernando Ike de Oliveira wrote:
>   I applied Tom Lane's and Peter's suggestions, but I had to remove one
> column (Owner) from the output of the \dL command for compatibility with
> version 7.4.

The mandate is to work as best as they can with older versions, not to provide 
only the feature set that works the same across old versions.  The correct 
behavior should be to show the owner column if the server version supports it.



Re: [HACKERS] slow count in window query

2009-07-16 Thread Hitoshi Harada
2009/7/16 Pavel Stehule :
>> I'm also not sure how to handle this if the set has to be spooled to
>> disk. Quicksort and Quickselect do a lot of scans through the data and
>> wouldn't perform well on disk.
>
> I think the problem is in the aggregate function used as a window
> function - or some missing optimization.
>
> When I replaced count(*) over () with a subselect (SELECT count(*) FROM
> ...), I got the expected speed.
>

WindowAgg always spools its input into a buffer, though (in your case)
it throws rows away one by one, so compared with a pure aggregate it
has some overhead. I think this is a reasonable approach for the
large-data situation and for different types of window. But yes, we
must improve the current model.

1) There should be some kind of lightweight approach for such
small-data/simple-window situations.

2) tuplestore_puttupleslot() seems heavy to me (copy, check, etc.),
even when the data fits in memory after trimming rows. We want more
flexible temporary storage on the fly.


Regards,

-- 
Hitoshi Harada



Re: [HACKERS] boolean in C

2009-07-16 Thread Tom Lane
Grzegorz Jaskiewicz  writes:
> Why C89, and not C99 ? Virtually all compilers for last 4 years have/ 
> had C99 support.

Not everybody is running a compiler released within the last 4 years.

The short answer is that C99 doesn't appear to offer enough advantage
over C89, *for our purposes*, to justify freezing out older systems.
stdbool is a perfect example of an addition that offers precisely
zero actual functional improvement.  All it would be for us is an
additional autoconf headache and portability hazard.

regards, tom lane



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Andres Freund
On Thursday 16 July 2009 15:18:01 Andres Freund wrote:
> On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
> > Andres Freund  writes:
> > > The queries on the second reporting schema unfortunately are different.
> > > It's the one where I copied the crazy example I attached in the original
> > > thread. With geqo=off a good part of the queries used daily need too
> > > much memory to plan sensibly, and geqo=on outright fails with:
> > > "Error: Failed to make a valid plan"
> > > on some.
> >
> > We're not going to be able to fix this unless you show us examples.
>
> In the other thread I attached a schema similar to the real one, plus an
> example query. Is that not enough? And why?
For reference:
http://archives.postgresql.org/message-
id/200907091700.43411.and...@anarazel.de

Andres



Re: [HACKERS] boolean in C

2009-07-16 Thread Grzegorz Jaskiewicz


On 16 Jul 2009, at 14:20, Tom Lane wrote:
> Grzegorz Jaskiewicz writes:
>> oh, another thing.
>> stdbool is C99 standard feature.
>
> We are still targeting C89, not C99.
>
> Another reason not to depend on stdbool is that, so far as I can see,
> the standard does not promise that type _Bool has size = 1 byte.
> We have to have that because of on-disk compatibility requirements.

I think the latter is easily fixable, or forceable to be one byte.
Why C89, and not C99? Virtually all compilers for the last 4 years
have/had C99 support.






Re: [HACKERS] boolean in C

2009-07-16 Thread Tom Lane
Grzegorz Jaskiewicz  writes:
> oh, another thing.
> stdbool is C99 standard feature.

We are still targeting C89, not C99.

Another reason not to depend on stdbool is that, so far as I can see,
the standard does not promise that type _Bool has size = 1 byte.
We have to have that because of on-disk compatibility requirements.

regards, tom lane



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Andres Freund
On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
> Andres Freund  writes:
> > The queries on the second reporting schema unfortunately are different.
> > It's the one where I copied the crazy example I attached in the original
> > thread. With geqo=off a good part of the queries used daily need too much
> > memory to plan sensibly, and geqo=on outright fails with:
> > "Error: Failed to make a valid plan"
> > on some.
> We're not going to be able to fix this unless you show us examples.
In the other thread I attached a schema similar to the real one, plus an
example query. Is that not enough? And why?

> > Notably, even some plans which were plannable in reasonable time before
> > are now problematic with enable_join_ordering=false!
> And this even more so --- it doesn't make any sense at all.
Why? With a high from_collapse_limit more subqueries get inlined - before 
inlining they get planned separately.

> > So, while I think the changes are in principle a good idea, as
> > {from,join}_collapse_limit are somewhat confusing options, I personally
> > do not think geqo is ready for it today, especially as the benefit is
> > relatively small.
> In general I think that any such bugs are there anyway and need to be
> fixed anyway.
Understandable.

Andres



Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering

2009-07-16 Thread Tom Lane
Andres Freund  writes:
> The queries on the second reporting schema unfortunately are different.
> It's the one where I copied the crazy example I attached in the original
> thread. With geqo=off a good part of the queries used daily need too much
> memory to plan sensibly, and geqo=on outright fails with:
> "Error: Failed to make a valid plan"
> on some.

We're not going to be able to fix this unless you show us examples.

> Notably, even some plans which were plannable in reasonable time before
> are now problematic with enable_join_ordering=false!

And this even more so --- it doesn't make any sense at all.

> So, while I think the changes are in principle a good idea, as
> {from,join}_collapse_limit are somewhat confusing options, I personally
> do not think geqo is ready for it today, especially as the benefit is
> relatively small.

In general I think that any such bugs are there anyway and need to be
fixed anyway.

regards, tom lane



Re: [HACKERS] slow count in window query

2009-07-16 Thread Hitoshi Harada
2009/7/16 Greg Stark :
> On Wed, Jul 15, 2009 at 11:18 AM, Pavel Stehule 
> wrote:
>> postgres=# select avg(a) from (select a, row_number() over (order by
>> a) as r, count(*) over () as rc from x ) p where r in
>> ((rc+1)/2,(rc+2)/2) ;
>
> How does this compare to the plain non-windowing SQL implementation:
>
> select a from x order by a offset (select trunc(count(*)/2) from x) limit 1
>
> (except that that only works if count(*) is odd).
>
> Interestingly finding the median is actually O(n) using Quickselect.
> Maybe we should provide a C implementation of quickselect as a window
> function. I'm not sure how to wedge in the concept that the sort is
> unnecessary even though the ORDER BY is specified.

median() should be an aggregate, not a window function, shouldn't it?

>
> I'm also not sure how to handle this if the set has to be spooled to
> disk. Quicksort and Quickselect do a lot of scans through the data and
> wouldn't perform well on disk.

The WindowAgg spools rows into the tuplestore, which holds the data in
memory as long as it fits. Do you have any idea how it could store
millions and millions of rows without the tuplestore?

Regards,


-- 
Hitoshi Harada



Re: [HACKERS] [GENERAL] pg_migrator not setting values of sequences?

2009-07-16 Thread Tom Lane
Peter Eisentraut  writes:
> Which leads me to a related question: Do you plan to maintain one
> version of pg_migrator that can upgrade any version to any other
> version (within reason), or will there be separate binaries, say
> pg_migrator-8.4 and pg_migrator-8.5, that each can only upgrade from
> $selfversion-1 to $selfversion?

I think we should plan on the latter, at least until we have a few
versions under our belts and can see what we might be letting ourselves
in for.  My guess is that n-1 to n will be plenty challenging enough.

regards, tom lane


