Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?

2011-10-29 Thread Joshua D. Drake


On 10/29/2011 03:26 PM, Eric Ridge wrote:
> Would y'all accept a patch that extended the "SELECT *" syntax to let
> you list fields to exclude from the A_Star?
>
> Quite regularly I'll be testing queries via psql and want to see all
> the columns from a fairly wide table except maybe a giant text or xml
> column.  A syntax like:
>
>   SELECT * EXCLUDING (big_col1, big_col2) FROM foo;

If it is quite regular, I would actually argue two things:

1. Use a view
2. You haven't normalized correctly

I am not trying to be a pedantic zealot or anything, but those would be
my arguments against.


Sincerely,

Joshua D. Drake
--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?

2011-10-29 Thread Pavel Stehule
2011/10/30 Darren Duncan :
> I agree that this feature would be quite useful and should be included in
> SQL. The exact syntax is less of an issue, but just the ability to cleanly
> say "select all columns except for these".  I have in fact argued for the
> same feature in the past.
>
> If you want to and can implement this feature then more power to you.  I'll
> look forward to it being in Pg 9.2.
>
> I think then the only discussion point should be what (terse) syntax to use
> for it, not whether the feature should exist at all.
>
> Arguing against this feature is like arguing against supporting "where
> not()" or "except" or "not in".  One should be able to do complements not
> only of rows but of columns too.  Basic good language design.

My experience says this is not true - the only projects I know of that
need this are badly designed or very badly designed ones.

I don't see any reason to do it at the SQL level.

It could make sense only in psql, as some special filter - if we wanted
to enhance the reporting features there.

Regards

Pavel Stehule

>
> -- Darren Duncan
>
> Eric Ridge wrote:
>>
>> Would y'all accept a patch that extended the "SELECT *" syntax to let
>> you list fields to exclude from the A_Star?
>>
>> Quite regularly I'll be testing queries via psql and want to see all
>> the columns from a fairly wide table except maybe a giant text or xml
>> column.  A syntax like:
>>
>>     SELECT * EXCLUDING (big_col1, big_col2) FROM foo;
>>
>> would be pretty handy.  It would definitely save some typing in
>> certain cases.  It seems like such a syntax would better document the
>> intent of a query too, rather than leaving one wondering if "big_col1"
>> was supposed to be omitted from the target list or not.
>>
>> Anyways, I just wanted to run the idea by youse guys before I put too
>> much more effort into it.  I've already made what appear to be the
>> minimum necessary changes to gram.y, and a few quick greps through the
>> code make me think the rest will be pretty easy.
>>
>> Maybe the SQL spec says something about this and nobody's done the work
>> yet?
>>
>> Thanks for your input!
>>
>> eric
>>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-29 Thread Shigeru Hanada
2011/10/30 Tom Lane :
> Hitoshi Harada  writes:
>> On Sat, Oct 29, 2011 at 8:13 AM, Tom Lane  wrote:
>>> I have not looked at the code, but ISTM the way that this has to work is
>>> that you set up a portal for each active scan.  Then you can fetch a few
>>> rows at a time from any one of them.
>
>> Hmm, true. Looking back at the original proposal (I didn't look at
>> the code either), there seems to be a cursor mode. ISTM it is hard for an
>> FDW to know how the whole plan tree looks, so consequently do we always
>> use a cursor regardless of estimated row counts?
>
> I think we have to.  Even if we estimate that a given scan will return
> only a few rows, what happens if we're wrong?  We don't want to blow out
> memory on the local server by retrieving gigabytes in one go.

Oh, I overlooked the possibility of a wrong estimate.  Older PostgreSQL
versions use 1000 as the default row estimate, so a big table which has not
been analyzed might crash the backend.

To make data retrieval safe, we would need to get the actual size of the
result, maybe by executing SELECT COUNT(*) in the planning phase.  That
sounds too heavy to do for every scan, and it still lacks the actual row
width.

One possible idea is to change the default value of the min_cursur_rows
option to 0 so that pgsql_fdw uses a CURSOR by default, but that seems
insufficient.  I'll drop the simple SELECT mode from the first version of
pgsql_fdw for safety.

>> I haven't had much experience with cursors myself, but are they as
>> efficient as non-cursor access?
>
> No, but if you need max efficiency you shouldn't be using foreign tables
> in the first place; they're always going to be expensive to access.
>
> It's likely that making use of native protocol portals (instead of
> executing a lot of FETCH commands) would help.  But I think we'd be well
> advised to do the first pass with just the existing libpq facilities,
> and then measure to see where to improve performance.

I long for protocol-level cursors. :)

-- 
Shigeru Hanada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-29 Thread Shigeru Hanada
2011/10/29 Hitoshi Harada :
> I have a doubt here, on sharing a connection for each server. What if
> there are simultaneous scans in the same plan? Say,
>
> -> Nested Loop
>  -> Foreign Scan to table T1 on server A
>  -> Foreign Scan to table T2 on server A
>
> Okay, you are thinking about Foreign Join, so the example above is too
> simple. But it is always possible to execute such a query if the foreign
> scan nodes are far apart, isn't it? As far as I can see from your
> explanation, scan T1 and scan T2 share the same connection. Now the join
> node scans one row from the left (T1) while asking for rows from the right
> (T2) without fetching all the rows from the left. If T2 sends a request to
> server A, the connection's result (of T1) is discarded. Am I understanding
> correctly?

I think that sharing a connection doesn't cause any problems.

In cursor mode, using multiple cursors concurrently through one connection
is OK.  In SELECT mode, pgsql_fdw executes the SELECT statement with
PQexecParams and retrieves the whole result *inside* the first Iterate call
for an outer tuple.  So the libpq connection is already available when
another scan needs to call its Iterate function.
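
To illustrate, here is a minimal libpq sketch of the cursor mode
described above: each scan gets its own cursor inside a transaction and
fetches a bounded batch at a time, so neither scan's pending result is
clobbered by the other.  This is only a sketch; the cursor names, the
batch size, and the omission of error handling are illustrative
assumptions, not actual pgsql_fdw code.

#include <libpq-fe.h>

static void
interleaved_scans(PGconn *conn)
{
    PGresult   *res;

    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn, "DECLARE t1_cur CURSOR FOR SELECT * FROM t1"));
    PQclear(PQexec(conn, "DECLARE t2_cur CURSOR FOR SELECT * FROM t2"));

    /* the outer scan pulls a bounded batch... */
    res = PQexec(conn, "FETCH 100 FROM t1_cur");
    /* ... consume PQntuples(res) rows here ... */
    PQclear(res);

    /* ... and the inner scan can fetch on the same connection any time */
    res = PQexec(conn, "FETCH 100 FROM t2_cur");
    PQclear(res);

    PQclear(PQexec(conn, "COMMIT"));
}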

-- 
Shigeru Hanada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-29 Thread Robert Haas
On Sat, Oct 29, 2011 at 4:36 PM, Tom Lane  wrote:
>> Well, it might not be strictly necessary for pg_utf8_increment() and
>> pg_eucjp_increment(), but it's clearly necessary for the generic
>> incrementer function for exactly the same reason it was needed in the
>> old coding.  I suppose we could weaken the rule to "you must leave a
>> valid character behind rather than a bunch of bytes that doesn't
>> encode to a character", but the cycle savings are negligible and the
>> current rule seems both simpler and more bullet-proof.
>
> No, it's *not* necessary any more, AFAICS.  make_greater_string discards
> the data and overwrites it with a null instantly upon getting a failure
> return from the incrementer.  The reason we used to need it was that we
> did pg_mbcliplen after failing to increment, but now we do that before
> we ever increment anything, so we already know the length of the last
> character.  It doesn't matter whether those bytes are still valid or
> contain garbage.

Oh, dude.  I think you are right.  I guess we can rip that crap out
then.  That's much cleaner.

>>> I'm also quite distressed that you ignored my advice to limit the number
>>> of combinations tried.  This patch could be horribly slow when dealing
>>> with wide characters, eg think what will happen when starting from
>>> U+10000.
>
>> Uh, I think it will try at most one character in that position and
>> then truncate away that character entirely, per my last email on this
>> topic (to which you never responded):
>> http://archives.postgresql.org/pgsql-hackers/2011-09/msg01195.php
>
> Oh!  You are right, I was expecting it to try multiple characters at the
> same position before truncating the string.  This change seems to have
> lobotomized things rather thoroughly.  What is the rationale for that?
> As an example, when dealing with a single-character string, it will fail
> altogether if the next code value sorts out-of-order, so this seems to
> me to be a rather large step backwards.
>
> I think we ought to go back to the previous design of incrementing till
> failure and then truncating, which puts the onus on the incrementer to
> make a reasonable tradeoff of how many combinations to try per character
> position.  There's a simple tweak we could make to the patch to limit
> that: once we've maxed out a lower-order byte of a multibyte char,
> *don't reset it to minimum*, just move on to incrementing the next
> higher byte.  This preserves the old property that the maximum number of
> combinations tried is bounded by 256 * string's length in bytes.

On this point I believe you are still confused.  The old code tried
one character per position, and the new code tries one character per
position.  Nothing has been lobotomized in any way.  The difference is
that the old code used a "guess and check" approach to generate the
character, so there was an inner loop that was trying to generate a
character (possibly generating various garbage strings that did not
represent a character along the way) and then, upon success, checked
the sort order of that single string before truncating and retrying.
The new code does exactly the same thing in the outer loop - i.e.
truncate one character per iteration - but the inner loop has, at
least for UTF-8 and EUC-JP, been replaced with an algorithm that is
guaranteed to produce a valid character without needing to loop.

Now having said that, I think there is a possibility for some
improvement here.  If we know we're not going to spend a lot of time
uselessly screwing around trying to get something that will pass
pg_verifymbstr(), then we could probably afford to call ltproc several
times per position, rather than just once.  But that's not restoring
the behavior of the old algorithm; that's improving on the old
algorithm.  And before we do that, we need to think about a couple of
things: first, that silly looping behavior is still there for anything
other than UTF-8 and EUC-JP.  Until we have constant-time increment
functions for every encoding we support, we probably don't want to get
too jiggy with it; second, we need to convince ourselves that this
will succeed in a meaningful number of cases where the current
algorithm fails.  I played around a bit with the UTF-8 case (with
collation = en_US.UTF-8) before committing this and I suspect that
trying 4 or 5 characters per position could be a win - you might for
example be looking at something like an accented E, and you might have
several different versions of E in a row before you get to something
that's no longer E-like.  If you get to that point and still don't
find something that compares favorably, it's probably time to throw in
the towel.  Another idea would be to do the first increment by one,
and then increment by two, four, eight before giving up, or something
like that.  It probably needs some research, and frankly I'm happy to
leave it to someone who is having a real-world problem with it.  The
fact that we haven't gotten any complaints be

Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?

2011-10-29 Thread Darren Duncan
I agree that this feature would be quite useful and should be included in SQL. 
The exact syntax is less of an issue, but just the ability to cleanly say 
"select all columns except for these".  I have in fact argued for the same 
feature in the past.


If you want to and can implement this feature then more power to you.  I'll look 
forward to it being in Pg 9.2.


I think then the only discussion point should be what (terse) syntax to use for 
it, not whether the feature should exist at all.


Arguing against this feature is like arguing against supporting "where not()" or 
"except" or "not in".  One should be able to do complements not only of rows but 
of columns too.  Basic good language design.


-- Darren Duncan

Eric Ridge wrote:
> Would y'all accept a patch that extended the "SELECT *" syntax to let
> you list fields to exclude from the A_Star?
>
> Quite regularly I'll be testing queries via psql and want to see all
> the columns from a fairly wide table except maybe a giant text or xml
> column.  A syntax like:
>
>  SELECT * EXCLUDING (big_col1, big_col2) FROM foo;
>
> would be pretty handy.  It would definitely save some typing in
> certain cases.  It seems like such a syntax would better document the
> intent of a query too, rather than leaving one wondering if "big_col1"
> was supposed to be omitted from the target list or not.
>
> Anyways, I just wanted to run the idea by youse guys before I put too
> much more effort into it.  I've already made what appear to be the
> minimum necessary changes to gram.y, and a few quick greps through the
> code make me think the rest will be pretty easy.
>
> Maybe the SQL spec says something about this and nobody's done the work yet?
>
> Thanks for your input!
>
> eric

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?

2011-10-29 Thread Eric Ridge
On Sat, Oct 29, 2011 at 6:35 PM, Stephen Frost  wrote:
>> Maybe the SQL spec says something about this and nobody's done the work yet?
>
> I don't know of anything like this in the spec.  Also, there would be
> concern about this possibly going against spec, breaking possibly valid
> queries, promoting keywords to reserved words, and maybe ending up in a
> bad situation if the SQL committee decides to support that kind of
> syntax for something completely different.

At least concerning breaking valid queries and promoting keywords, I
don't think the former can happen (they'd fail to parse today) and the
latter doesn't seem necessary as "EXCLUDING"'s use in this case
appears to be completely unambiguous.

However, I realize there's no second-guessing what the SQL committee
might do in the future.

> In general, I doubt this is something we'd implement, but others may
> feel differently.

I hope so.  :)

> What might be interesting to consider is how hard it
> would be to make psql smarter when it comes to line editing in this
> regard.  Maybe if there was a way to easily expand the '*' from psql and
> then you could remove the columns from the list easily..?

Probably really dang hard, especially when you consider a "SELECT *"
involving lots of joins.  And even if it turned out to be easy, it
would be limited to psql.

Anyways, it's just something I've wanted for quite a while and thought
I'd actually do the work to make it happen, *if* y'all would take it.

eric

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?

2011-10-29 Thread Stephen Frost
Eric,

* Eric Ridge (eeb...@gmail.com) wrote:
> It seems like such a syntax would better document the
> intent of a query too, rather than leaving one wondering if "big_col1"
> was supposed to be omitted from the target list or not.

Well, I expect most here would say that any application query should be
specific about exactly what columns it wants (iow- don't use select * in
your apps :).  As for it being useful in psql, I could see that.

> Anyways, I just wanted to run the idea by youse guys before I put too
> much more effort into it.  I've already made what appear to be the
> minimum necessary changes to gram.y, and a few quick greps through the
> code make me think the rest will be pretty easy.
> 
> Maybe the SQL spec says something about this and nobody's done the work yet?

I don't know of anything like this in the spec.  Also, there would be
concern about this possibly going against spec, breaking possibly valid
queries, promoting keywords to reserved words, and maybe ending up in a
bad situation if the SQL committee decides to support that kind of
syntax for something completely different.

In general, I doubt this is something we'd implement, but others may
feel differently.  What might be interesting to consider is how hard it
would be to make psql smarter when it comes to line editing in this
regard.  Maybe if there was a way to easily expand the '*' from psql and
then you could remove the columns from the list easily..?

Thanks,

Stephen




[HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?

2011-10-29 Thread Eric Ridge
Would y'all accept a patch that extended the "SELECT *" syntax to let
you list fields to exclude from the A_Star?

Quite regularly I'll be testing queries via psql and want to see all
the columns from a fairly wide table except maybe a giant text or xml
column.  A syntax like:

 SELECT * EXCLUDING (big_col1, big_col2) FROM foo;

would be pretty handy.  It would definitely save some typing in
certain cases.  It seems like such a syntax would better document the
intent of a query too, rather than leaving one wondering if "big_col1"
was supposed to be omitted from the target list or not.

Anyways, I just wanted to run the idea by youse guys before I put too
much more effort into it.  I've already made what appear to be the
minimum necessary changes to gram.y, and a few quick greps through the
code make me think the rest will be pretty easy.

Maybe the SQL spec says something about this and nobody's done the work yet?

Thanks for your input!

eric

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange code in array_in

2011-10-29 Thread Jeff Davis
On Sat, 2011-10-29 at 15:13 -0400, Tom Lane wrote:
> What other lookups?

I just meant anything after that point in the function would surely fail
(get_type_io_data).

> array_out, and I believe a bunch of other places, use the same trick.

OK. In retrospect it is a very simple trick, but at the time I was
slightly confused by it. I guess I just haven't seen that idiom before.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-29 Thread Tom Lane
Hitoshi Harada  writes:
> On Sat, Oct 29, 2011 at 8:13 AM, Tom Lane  wrote:
>> I have not looked at the code, but ISTM the way that this has to work is
>> that you set up a portal for each active scan.  Then you can fetch a few
>> rows at a time from any one of them.

> Hmm, true. Looking back at the original proposal (I didn't look at
> the code either), there seems to be a cursor mode. ISTM it is hard for an
> FDW to know how the whole plan tree looks, so consequently do we always
> use a cursor regardless of estimated row counts?

I think we have to.  Even if we estimate that a given scan will return
only a few rows, what happens if we're wrong?  We don't want to blow out
memory on the local server by retrieving gigabytes in one go.

> I haven't had much experience with cursors myself, but are they as
> efficient as non-cursor access?

No, but if you need max efficiency you shouldn't be using foreign tables
in the first place; they're always going to be expensive to access.

It's likely that making use of native protocol portals (instead of
executing a lot of FETCH commands) would help.  But I think we'd be well
advised to do the first pass with just the existing libpq facilities,
and then measure to see where to improve performance.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-29 Thread Hitoshi Harada
On Sat, Oct 29, 2011 at 8:13 AM, Tom Lane  wrote:
> Hitoshi Harada  writes:
>> I have a doubt here, on sharing a connection for each server. What if
>> there are simultaneous scans in the same plan? Say,
>
>> -> Nested Loop
>>   -> Foreign Scan to table T1 on server A
>>   -> Foreign Scan to table T2 on server A
>
>> Okay, you are thinking about Foreign Join, so the example above is too
>> simple. But it is always possible to execute such a query if the foreign
>> scan nodes are far apart, isn't it? As far as I can see from your
>> explanation, scan T1 and scan T2 share the same connection. Now the join
>> node scans one row from the left (T1) while asking for rows from the right
>> (T2) without fetching all the rows from the left. If T2 sends a request to
>> server A, the connection's result (of T1) is discarded. Am I understanding
>> correctly?
>
> I have not looked at the code, but ISTM the way that this has to work is
> that you set up a portal for each active scan.  Then you can fetch a few
> rows at a time from any one of them.

Hmm, true. Looking back at the original proposal (I didn't look at
the code either), there seems to be a cursor mode. ISTM it is hard for an
FDW to know how the whole plan tree looks, so consequently do we always
use a cursor regardless of estimated row counts? I haven't had much
experience with cursors myself, but are they as efficient as
non-cursor access?

Regards,
-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-29 Thread Tom Lane
Robert Haas  writes:
> On Sat, Oct 29, 2011 at 3:35 PM, Tom Lane  wrote:
>> Ummm ... why do the incrementer functions think they need to restore the
>> previous value on failure?  AFAICS that's a waste of code and cycles,
>> since there is only one caller and it doesn't care in the least.

> Well, it might not be strictly necessary for pg_utf8_increment() and
> pg_eucjp_increment(), but it's clearly necessary for the generic
> incrementer function for exactly the same reason it was needed in the
> old coding.  I suppose we could weaken the rule to "you must leave a
> valid character behind rather than a bunch of bytes that doesn't
> encode to a character", but the cycle savings are negligible and the
> current rule seems both simpler and more bullet-proof.

No, it's *not* necessary any more, AFAICS.  make_greater_string discards
the data and overwrites it with a null instantly upon getting a failure
return from the incrementer.  The reason we used to need it was that we
did pg_mbcliplen after failing to increment, but now we do that before
we ever increment anything, so we already know the length of the last
character.  It doesn't matter whether those bytes are still valid or
contain garbage.

>> I'm also quite distressed that you ignored my advice to limit the number
>> of combinations tried.  This patch could be horribly slow when dealing
>> with wide characters, eg think what will happen when starting from
>> U+10000.

> Uh, I think it will try at most one character in that position and
> then truncate away that character entirely, per my last email on this
> topic (to which you never responded):
> http://archives.postgresql.org/pgsql-hackers/2011-09/msg01195.php

Oh!  You are right, I was expecting it to try multiple characters at the
same position before truncating the string.  This change seems to have
lobotomized things rather thoroughly.  What is the rationale for that?
As an example, when dealing with a single-character string, it will fail
altogether if the next code value sorts out-of-order, so this seems to
me to be a rather large step backwards.

I think we ought to go back to the previous design of incrementing till
failure and then truncating, which puts the onus on the incrementer to
make a reasonable tradeoff of how many combinations to try per character
position.  There's a simple tweak we could make to the patch to limit
that: once we've maxed out a lower-order byte of a multibyte char,
*don't reset it to minimum*, just move on to incrementing the next
higher byte.  This preserves the old property that the maximum number of
combinations tried is bounded by 256 * string's length in bytes.
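
As a rough illustration of that tweak (a sketch only, not committed
code; char_is_valid() is an assumed stand-in for the real encoding
verification), the carry loop would look something like this, trying at
most 256 values per byte position because a maxed-out byte is never
reset to its minimum:

#include <stdbool.h>

extern bool char_is_valid(const unsigned char *charptr, int charlen);

static bool
increment_char_bounded(unsigned char *charptr, int charlen)
{
    int     pos;

    for (pos = charlen - 1; pos >= 0; pos--)
    {
        while (charptr[pos] < 0xFF)
        {
            charptr[pos]++;
            if (char_is_valid(charptr, charlen))
                return true;
        }
        /* this byte is maxed out; deliberately leave it as-is and
         * carry on to the next higher-order byte */
    }
    return false;
}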

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add socket dir to pg_config..?

2011-10-29 Thread Mr. Aaron W. Swenson
On Fri, Oct 28, 2011 at 06:33:39PM +0200, Dimitri Fontaine wrote:
> Andrew Dunstan  writes:
> > Er, which distros other than debian/ubuntu?
> 
> Well, any and all derivatives I guess, to begin with.
> 
>   http://distrowatch.com/dwres.php?resource=independence#debian
>   Based on Debian GNU/Linux: 129 Distributions
> 
> More seriously, I'm not sure I understand why some people will both
> frown upon distributions allowing themselves to patch the version of
> PostgreSQL they are packaging, and vote against making their life
> easier.
> 
> If /tmp is the only decent place to put the socket file on Unix when
> security and other concerns are considered, then sure, making distro
> life difficult is a good thing to do. But then let's take it to the
> FHS that debian and ubuntu are implementing, AFAIUI.
> 
> I'm puzzled, maybe I'm not understanding a key point here though.
> 
> Regards,
> -- 
> Dimitri Fontaine
> http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

In Gentoo, we change the socket directory to /var/run/postgresql via
pg_config_manual.h. However, I'm not too terribly interested in pg_config
outputting the directory location.

We inform users at the end of every install where the default location
is. Further, all of the packages we maintain build against the sources so
the packages automatically know where the socket directory is located.

-- 
Mr. Aaron W. Swenson
Gentoo Linux Developer
Email: titanof...@gentoo.org
GnuPG FP : 2C00 7719 4F85 FB07 A49C  0E31 5713 AA03 D1BB FDA0
GnuPG ID : D1BBFDA0




Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-29 Thread Robert Haas
On Sat, Oct 29, 2011 at 3:35 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I've committed this, after a good deal of hacking on the comments,
>> some coding style cleanup, and one bug fix:
>
> Ummm ... why do the incrementer functions think they need to restore the
> previous value on failure?  AFAICS that's a waste of code and cycles,
> since there is only one caller and it doesn't care in the least.

Well, it might not be strictly necessary for pg_utf8_increment() and
pg_eucjp_increment(), but it's clearly necessary for the generic
incrementer function for exactly the same reason it was needed in the
old coding.  I suppose we could weaken the rule to "you must leave a
valid character behind rather than a bunch of bytes that doesn't
encode to a character", but the cycle savings are negligible and the
current rule seems both simpler and more bullet-proof.

> I'm also quite distressed that you ignored my advice to limit the number
> of combinations tried.  This patch could be horribly slow when dealing
> with wide characters, eg think what will happen when starting from
> U+10000.

Uh, I think it will try at most one character in that position and
then truncate away that character entirely, per my last email on this
topic (to which you never responded):

http://archives.postgresql.org/pgsql-hackers/2011-09/msg01195.php

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-29 Thread Robert Haas
On Fri, Oct 28, 2011 at 9:22 PM, Bruce Momjian  wrote:
> OK, the attached, applied patch removes the pg_upgrade dependency on the
> 'postgres' database existing in the new cluster.  However, vacuumdb,
> used by pg_upgrade, still has this dependency:
>
>            conn = connectDatabase("postgres", host, port, username,
>                prompt_password, progname);
>
> In fact, all the /scripts binaries use the postgres database, except for
> createdb/dropdb, which has this heuristic:
>
>    /*
>     * Connect to the 'postgres' database by default, except have the
>     * 'postgres' user use 'template1' so he can create the 'postgres'
>     * database.
>     */
>    conn = connectDatabase(strcmp(dbname, "postgres") == 0 ?
>                           "template1" : "postgres",
>                           host, port, username, prompt_password, progname);
>
> This makes sense because you might be creating or dropping the postgres
> database.  Do we want these to have smarter database selection code?

Well, I suppose as long as we're cleaning this up, we might as well be
thorough, so, sure, why not?  I think the algorithm pg_dumpall uses is
pretty sensible: let the user specify the database to use if they so
desire; if not, try postgres first and then template1.  I think we
could stick some logic for that in common.c which could be shared by
clusterdb, createdb, dropdb, dropuser, reindexdb, and vacuumdb.

However, we need to rethink the flag to be used for this: pg_dumpall
uses -l, but many of the other utilities already use that for some
other purpose, and it's not exactly mnemonic anyway.  "-d" for
database could work, but that's also in use in some places, and
furthermore somewhat confusing since many if not all of these
utilities have an option to operate on a single database only, and you
might think that -d would specify the database to operate on, rather
than the one to be used to get the list of databases.  pgAdmin uses
the term "maintenance database" to refer to a database to be used when
none is explicitly specified, and I think that's fairly clear
terminology.  So I propose that we add a --maintenance-db option (with
no short form, since this is a relatively obscure need) to the tools
listed above.  The tools will pass the associated value (or NULL if
the option is not specified) to the above-mentioned routine in
common.c, which will do the rest.
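
A hedged sketch of that common.c routine (the helper names and the
signature are assumptions for illustration, not a committed interface;
connect_database_if_possible() stands in for a presumed non-fatal
variant of the existing connectDatabase()):

#include <stdbool.h>
#include <libpq-fe.h>

extern PGconn *connectDatabase(const char *dbname, const char *host,
                               const char *port, const char *username,
                               bool prompt_password, const char *progname);
extern PGconn *connect_database_if_possible(const char *dbname,
                               const char *host, const char *port,
                               const char *username, bool prompt_password,
                               const char *progname);

static PGconn *
connect_maintenance_db(const char *maintenance_db, const char *host,
                       const char *port, const char *username,
                       bool prompt_password, const char *progname)
{
    PGconn     *conn;

    /* a user-specified --maintenance-db always wins */
    if (maintenance_db)
        return connectDatabase(maintenance_db, host, port, username,
                               prompt_password, progname);

    /* otherwise try "postgres" first, then fall back to "template1" */
    conn = connect_database_if_possible("postgres", host, port, username,
                                        prompt_password, progname);
    if (conn == NULL)
        conn = connectDatabase("template1", host, port, username,
                               prompt_password, progname);
    return conn;
}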

If nobody objects, I'll go do that.  Hopefully that should be enough
to put this problem to bed more or less permanently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-29 Thread Tom Lane
Robert Haas  writes:
> I've committed this, after a good deal of hacking on the comments,
> some coding style cleanup, and one bug fix:

Ummm ... why do the incrementer functions think they need to restore the
previous value on failure?  AFAICS that's a waste of code and cycles,
since there is only one caller and it doesn't care in the least.

I'm also quite distressed that you ignored my advice to limit the number
of combinations tried.  This patch could be horribly slow when dealing
with wide characters, eg think what will happen when starting from
U+10000.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange code in array_in

2011-10-29 Thread Tom Lane
Jeff Davis  writes:
> In array_in(), I see the following code:
>   my_extra->element_type = ~element_type;

> It seems like it was explicitly changed from InvalidOid to
> ~element_type. At first I thought it was a mistake, but then I thought
> maybe it was to ensure that the next branch was taken even if
> element_type == InvalidOid.

Exactly.

> But the rest of the lookups will surely fail
> on InvalidOid, and it seems like it would be easy to test for InvalidOid
> at the beginning if we wanted to fail.

What other lookups?  We do need to protect ourselves against
element_type being InvalidOid, and the problem is that if we just set
my_extra->element_type to InvalidOid and press on, the lookup won't be
made and we'll do something bizarre (possibly even crash) using bogus
cached information.  On the other hand, element_type *shouldn't* be
InvalidOid here, so adding an explicit test for that seemed like a waste
of code.  It's sufficient to let get_type_io_data complain about the
case.

array_out, and I believe a bunch of other places, use the same trick.
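
To make the trick concrete, here is a condensed sketch (simplified
stand-ins, not the actual array_in code): the complement of any key is
guaranteed unequal to the key itself, even for InvalidOid, so the
lookup branch must be taken on first use.

typedef unsigned int Oid;       /* simplified stand-in */

typedef struct CacheState
{
    Oid         element_type;   /* key the cached data below belongs to */
    /* ... cached type I/O information would live here ... */
} CacheState;

/* first-time setup, as in array_in():
 *     my_extra->element_type = ~element_type;
 * ~element_type != element_type for every possible value, so the
 * comparison below cannot spuriously match on the first call. */

static void
lookup_if_needed(CacheState *my_extra, Oid element_type)
{
    if (my_extra->element_type != element_type)
    {
        /* do the catalog lookups here; get_type_io_data() itself
         * errors out on InvalidOid, so no separate check is needed */
        my_extra->element_type = element_type;
    }
}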

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS]

2011-10-29 Thread Erik Rijkers
Would it be possible to remove the double quotes in the daterange display of
BC dates?

select '[0001-10-29 BC,2011-10-29)'::daterange;
          daterange
------------------------------
 ["0001-10-29 BC",2011-10-29)
(1 row)


after all, it's also:

select '0001-10-29 BC'::date;
     date
---------------
 0001-10-29 BC
(1 row)

without double quotes.  It's not important, perhaps; it just looks ugly to me.

Do they serve a purpose?


Erik Rijkers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-29 Thread Robert Haas
On Sat, Oct 29, 2011 at 1:16 PM,   wrote:
> Hello, I feel at a loss as to what to do...
>
>> I thought that code was looking for 0xED/0xF4 in the second position,
>> but it's actually looking for them in the first position, which makes
>> vastly more sense.  Whee!
>
> Anyway, I'll try to describe another aspect of this code at present.

I've committed this, after a good deal of hacking on the comments,
some coding style cleanup, and one bug fix:

+   workstr[len] = '\0';

Without that, when we start fiddling with the second-to-last
character, the last one is still hanging around, which is different
from the old behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [v9.2] make_greater_string() does not return a string in some cases

2011-10-29 Thread horiguchi . kyotaro
Hello, I feel at a loss as to what to do...

> I thought that code was looking for 0xED/0xF4 in the second position,
> but it's actually looking for them in the first position, which makes
> vastly more sense.  Whee!

Anyway, I'll try to describe another aspect of this code at present.

The switch block in pg_utf8_increment folds together five individual
manipulations according to the byte length of the sequence. The
separation presupposes that the input bytes and length form a valid
UTF-8 sequence.

For a character of 5 or more bytes in length, it returns false.

For 4 bytes, the sequence ranges between U+10000 and U+10ffff.

  If charptr[3] is less than 0xbf, increment it and return true.

  Else assign 0x80 to charptr[3] and then if charptr[2] is less
  than 0xbf increment it and return true.

  Else assign 0x80 to charptr[2] and then,
if (charptr[1] is less than 0x8f when charptr[0] == 0xf4) or
   (charptr[1] is less than 0xbf when charptr[0] != 0xf4)
  increment it and return true.

  Else assign 0x80 to charptr[1] and then if charptr[0] is not
  0xf4 increment it and return true.

  Else the input sequence must be 0xf4 0x8f 0xbf 0xbf which
  represents U+10 and this is the upper limit of UTF-8
  representation. Restore the sequnce and return false.


For 3 bytes, the sequence ranges between U+800 and U+ffff.

  If charptr[2] is less than 0xbf, increment it and return true.

  Else assign 0x80 to charptr[2] and then,
if (charptr[1] is less than 0x9f when charptr[0] == 0xed) or
   (charptr[1] is less than 0xbf when charptr[0] != 0xed) 
  increment it and return true.

The sequence 0xed 0x9f 0xbf, which represents U+d7ff, will be
incremented to 0xee 0x80 0x80 (U+e000) in the end, skipping over
the surrogate range.

  Else assign 0x80 to charptr[1] and then if charptr[0] is not
  0xef increment it and return true.

  Else the input sequence must be 0xef 0xbf 0xbf which represents
  U+ffff and the next UTF-8 sequence has a length of 4. Restore
  the sequence and return false.


For 2 bytes, the sequence ranges between U+80 and U+7ff.

  If charptr[1] is less than 0xbf, increment it and return true.

  Else assign 0x80 to charptr[1] and then if charptr[0] is not
  0xdf increment it and return true.

  Else the input sequence must be 0xdf 0xbf which represents
  U+7ff and the next UTF-8 sequence has a length of 3.  Restore the
  sequence and return false.


For 1 byte, the byte ranges between U+0 and U+7f.

  If charptr[0] is less than 0x7f, increment it and return true.

  Else the input sequence must be 0x7f which represents U+7f and the
  next UTF-8 sequence has a length of 2. Restore the sequence
  and return false.
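
For reference, the 1- and 2-byte cases just described can be rendered
in C roughly as follows (a condensed sketch of the logic above, not the
committed function; the 3- and 4-byte cases extend the same carry
pattern):

#include <stdbool.h>

static bool
utf8_increment_sketch(unsigned char *charptr, int length)
{
    unsigned char save;

    switch (length)
    {
        case 2:                 /* U+80 .. U+7ff */
            save = charptr[1];
            if (charptr[1] < 0xBF)
            {
                charptr[1]++;
                return true;
            }
            charptr[1] = 0x80;  /* carry into the lead byte */
            if (charptr[0] < 0xDF)
            {
                charptr[0]++;
                return true;
            }
            charptr[1] = save;  /* 0xdf 0xbf == U+7ff: restore, fail */
            return false;
        case 1:                 /* U+0 .. U+7f */
            if (charptr[0] < 0x7F)
            {
                charptr[0]++;
                return true;
            }
            return false;       /* 0x7f: the next character is 2 bytes */
        default:
            return false;
    }
}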

-- 
Kyotaro Horiguchi



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] strange code in array_in

2011-10-29 Thread Jeff Davis
In array_in(), I see the following code:

  my_extra->element_type = ~element_type;

It seems like it was explicitly changed from InvalidOid to
~element_type. At first I thought it was a mistake, but then I thought
maybe it was to ensure that the next branch was taken even if
element_type == InvalidOid. But the rest of the lookups will surely fail
on InvalidOid, and it seems like it would be easy to test for InvalidOid
at the beginning if we wanted to fail.

Can someone please explain, and perhaps include a comment indicating
what's going on? Or is it just too early and I missed something simple?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-29 Thread Tom Lane
Hitoshi Harada  writes:
> I have a doubt here, on sharing a connection for each server. What if
> there are simultaneous scans in the same plan? Say,

> -> Nested Loop
>   -> Foreign Scan to table T1 on server A
>   -> Foreign Scan to table T2 on server A

> Okay, you are thinking about Foreign Join, so the example above is too
> simple. But it is always possible to execute such a query if the foreign
> scan nodes are far apart, isn't it? As far as I can see from your
> explanation, scan T1 and scan T2 share the same connection. Now the join
> node scans one row from the left (T1) while asking for rows from the right
> (T2) without fetching all the rows from the left. If T2 sends a request to
> server A, the connection's result (of T1) is discarded. Am I understanding
> correctly?

I have not looked at the code, but ISTM the way that this has to work is
that you set up a portal for each active scan.  Then you can fetch a few
rows at a time from any one of them.

If you're doing this through libpq, it'd be necessary to implement each
scan using a cursor.  I'm not sure whether it'd be worth our time to
add more functions to libpq to allow more-direct access to the protocol
portal feature.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] So where are we on the open commitfest?

2011-10-29 Thread Greg Stark
On Fri, Oct 28, 2011 at 8:50 PM, Tom Lane  wrote:
> * Non-inheritable check constraints
>
> Greg Stark claimed this one for committing a few weeks ago, but has
> not done anything visible since then.  Greg?
>

Sorry, I had hoped to look at it during pgconfeu but found Amsterdam a
bit too distracting.

I'm looking at it now.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add socket dir to pg_config..?

2011-10-29 Thread Greg Stark
On Fri, Oct 28, 2011 at 5:33 PM, Dimitri Fontaine
 wrote:
> I'm puzzled, maybe I'm not understanding a key point here though.
>

I think the only thing you're missing here is that, despite
appearances some days, Postgres is not run by a single hive mind. Tom
thinks there needs to be a single location for all builds since it's part
of the protocol, others agree with the approach Debian took, and we
haven't really resolved that discrepancy. So Postgres's treatment of
this configuration is naturally a bit schizophrenic.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-29 Thread Martijn van Oosterhout
On Sat, Oct 29, 2011 at 12:25:46AM -0700, Hitoshi Harada wrote:
> I have a doubt here, on sharing a connection for each server. What if
> there are simultaneous scans in the same plan? Say,
> 
> -> Nested Loop
>   -> Foreign Scan to table T1 on server A
>   -> Foreign Scan to table T2 on server A
> 
> Okay, you are thinking about Foreign Join, so the example above is too
> simple. But it is always possible to execute such a query if the foreign
> scan nodes are far apart, isn't it? As far as I can see from your
> explanation, scan T1 and scan T2 share the same connection. Now the join
> node scans one row from the left (T1) while asking for rows from the right
> (T2) without fetching all the rows from the left. If T2 sends a request to
> server A, the connection's result (of T1) is discarded. Am I understanding
> correctly?

This would need to be factored in in the cost calculations. For remote
servers there is an overhead per tuple transmitted.  So in the above
case it might actually be quicker to do the nested loop locally.

To handle the parallel case you might need to materialise the inner
loop; that would avoid the double scan. Or we could fix the protocol so
you can stream multiple queries at once.

Actually, you can already do this if you use DECLARE CURSOR for all the
queries upfront and then FETCH as needed.  That way you can do it all
over one connection.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] unite recovery.conf and postgresql.conf

2011-10-29 Thread Simon Riggs
On Fri, Sep 9, 2011 at 10:56 AM, Fujii Masao  wrote:

> In previous discussion, we've reached the consensus that we should unite
> recovery.conf and postgresql.conf. The attached patch does that. The
> patch is WIP, I'll have to update the document, but if you notice something,
> please feel free to comment.

My short summary of the thread is

Fujii proposes we allow parameters currently in recovery.conf to be
specified in postgresql.conf. All agree to that. Fujii suggests that
if we have both postgresql.conf and recovery.conf then recovery.conf
should contain overrides.

Fujii then suggests that if such an override exists, then SHOW would
not work properly. Magnus is rightly horrified and many speak against
allowing recovery.conf to continue to exist for this reason.  I note
that if recovery.conf is an include file of postgresql.conf then the
overrides would work correctly, just as if postgresql.conf had
multiple settings for that parameter. So the premise is incorrect, so
the conclusion is not relevant.

Simon, JD, Greg Stark speak in favour of the usefulness of having a
recovery.conf separate from postgresql.conf. Tatsuo confirms pgpool
uses this.

Simon, Fujii, Peter agree an automatic include of recovery.conf would be useful

Robert points out that pg_ctl promote was a good feature

Simon, JD say that backwards compatibility is important

Everybody agrees a neater way of invoking standby mode would be good.

Peter points out that including recovery target parameters in
postgresql.conf would be difficult and require manual editing, and
also that pg_ctl -o is not a suitable interface.

The thread also includes a variety of other alternate ideas,
misunderstandings and other commentary.

- - -

My thoughts on how to resolve this are...

Everybody agrees these two points:

* Allow recovery parameters to be handled same way as other GUCs, and
specified in postgresql.conf if desired.

* Allow parameters to be reloaded at SIGHUP and visible using SHOW.

Those two things do not themselves force us to break backwards compatibility.

We also agree that we want a neater way to startup in standby mode.

In 9.1 we added "pg_ctl promote" as a better way of indicating
failover/switchover. When we did that we kept the trigger_file
parameter added in 9.0, which shows it is possible to add a new API
without breaking backwards compatibility.

We should add a "pg_ctl standby" command as a better way of indicating
starting up (also described as triggering) standby mode. We keep the
standby_mode parameter.  There is no difference here between file-based
and stream-based replication: you can have file, stream, or both file
and stream (as intended).
In this mode the recovery target parameters are *ignored* even if
specified (explained below).
http://developer.postgresql.org/pgdocs/postgres/recovery-target-settings.html

In 9.2 the presence of recovery.conf can and therefore should continue
to act as it does in 9.1. This should be automatically included at the
end of postgresql.conf, which naturally and with no additional code
allows us to override settings, with overrides visible by SHOW. We
don't make any specific checks to see if someone has added a
postgresql.conf parameter in there. If there is a recovery target
parameter in recovery.conf we enter recovery, otherwise we operate as
a standby. recovery.conf is no longer *required* for standby modes.

These things are announced as deprecated and will be removed when we
go to release 10.0
* trigger_file
* standby_mode
* recovery.conf indicates standby

recovery.conf should continue to be required to perform a PITR. If we
place the recovery_target parameters into postgresql.conf we will have
no way to differentiate between (1) a recovery that has successfully
completed then crashed and (2) a user-specified recovery, which was
the original rationale for its use. This is OK, since we now encourage
people to enter a recovery by creating recovery.conf and for entering
a standby to use a new cleaner API without the confusing use of the
word "recovery".

I think that meets all requirements, as far as technically possible.

Best Regards

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] So where are we on the open commitfest?

2011-10-29 Thread Simon Riggs
On Fri, Oct 28, 2011 at 9:50 PM, Simon Riggs  wrote:

>> * unite recovery.conf and postgresql.conf
>>
>> This one also seems to be lacking consensus more than anything else.
>> What do we do about that?
>
> I'll re-read the thread in detail to see if I can break impasse.

I've added a new comment onto the OP to summarise this.

ISTM there is a good way forward.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] So where are we on the open commitfest?

2011-10-29 Thread Simon Riggs
On Sat, Oct 29, 2011 at 2:21 AM, Fujii Masao  wrote:
> On Sat, Oct 29, 2011 at 5:50 AM, Simon Riggs  wrote:
>> On Fri, Oct 28, 2011 at 8:50 PM, Tom Lane  wrote:
>>
>>> * Separating bgwriter and checkpointer
>>>
>>> Same for this one.
>>
>> Will commit by end of Monday
>
> There are plenty of source comments (and probably documents) describing
> that checkpointing is performed by the bgwriter, but the patch that you
> posted didn't correct them. Are you going to include those changes in
> the patch? Or commit them separately?

Yes, I will resolve all your comments either in the first commit, or in
ones immediately following. Thank you again for making those observations.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] So where are we on the open commitfest?

2011-10-29 Thread Heikki Linnakangas

On 29.10.2011 06:40, Tom Lane wrote:
> Robert Haas  writes:
>> On Fri, Oct 28, 2011 at 3:50 PM, Tom Lane  wrote:
>>> * Range Types
>>>
>>> This has certainly had plenty of work done too.  If it's not committable
>>> yet, I think we should mark it Returned With Feedback for now.
>>
>> I have been thinking about looking at committing at least part of
>> this, but thought Heikki might be planning to pick it up.
>
> Yeah, this one is in Heikki's court AFAIK.

I'm waiting for Jeff to fix the caching of type metadata in range type
functions to work like the corresponding caching in array functions. I
believe that's the last outstanding item on that patch. Jeff, can you
get around to doing that in the next few days? If not, I might go and do it
myself, but I'm lazy and would prefer not to ;-).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-29 Thread Hitoshi Harada
2011/10/25 Shigeru Hanada :
>
> Connection management
> =====================
> The pgsql_fdw establishes a new connection when a foreign server is
> accessed for the first time in the local session.  The established
> connection is shared between all foreign scans in the local query, and
> even between scans in subsequent queries.  Connections are discarded when
> the current transaction aborts so that an unexpected failure won't cause
> a connection leak.  This is implemented with the resource owner mechanism.
>

I have a doubt here, on sharing a connection for each server. What if
there are simultaneous scans in the same plan? Say,

-> Nested Loop
  -> Foreign Scan to table T1 on server A
  -> Foreign Scan to table T2 on server A

Okay, you are thinking about Foreign Join, so the example above is too
simple. But it is always possible to execute such a query if the foreign
scan nodes are far apart, isn't it? As far as I can see from your
explanation, scan T1 and scan T2 share the same connection. Now the join
node scans one row from the left (T1) while asking for rows from the right
(T2) without fetching all the rows from the left. If T2 sends a request to
server A, the connection's result (of T1) is discarded. Am I understanding
correctly?

Regards,
-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers