Re: [HACKERS] Fwd: What do these terms mean in the SOURCE CODE?

2010-11-21 Thread Leonardo Francalanci
 Here's my single biggest tip for newcomers to the Postgres  source:
 if you don't use ctags, glimpse, or some other tool that can  quickly
 show you all references to a given identifier, go out and get  one.
 It's one of the easiest ways to learn about  things.


I guess Eclipse is the best tool for the newcomer as it gives
you all the references very easily and shows the functions/structs
declaration when you pass with the mouse over one...

http://wiki.postgresql.org/wiki/Working_with_Eclipse




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: patch: format function - fixed oid

2010-11-21 Thread Robert Haas
On Nov 21, 2010, at 1:09 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 
 OK, I've committed this, after a fairly heavy rewrite.
 
 thank you very much

Ah, nuts. I forgot to bump catversion.

...Robert

Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-21 Thread Robert Haas
On Sat, Nov 20, 2010 at 4:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 So what DO we need to guard against here?

 I think the general problem can be stated as process A changes two or
 more values in shared memory in a fairly short span of time, and process
 B, which is concurrently examining the same variables, sees those
 changes occur in a different order than A thought it made them in.

 In practice we do not need to worry about changes made with a kernel
 call in between, as any sort of context swap will cause the kernel to
 force cache synchronization.

 Also, the intention is that the locking primitives will take care of
 this for any shared structures that are protected by a lock.  (There
 were some comments upthread suggesting maybe our lock code is not
 bulletproof; but if so that's something to fix in the lock code, not
 a logic error in code using the locks.)

 So what this boils down to is being an issue for shared data structures
 that we access without using locks.  As, for example, the latch
 structures.

So is the problem case a race involving owning/disowning a latch vs.
setting that same latch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER OBJECT any_name SET SCHEMA name

2010-11-21 Thread Robert Haas
On Sat, Nov 20, 2010 at 11:23 PM, Robert Haas robertmh...@gmail.com wrote:
 Ah, nuts.  I see now there's a v7.  Never mind...

OK.  I looked at the right version, now.  Hopefully.

It seems we have no regression tests at all for any of the existing
SET SCHEMA commands.  This seems like a good time to correct that
oversight, and also add some for the new commands you're adding here.
(It might be helpful to submit the regression tests for the existing
commands as a separate patch.) Also, you're missing psql tab
completion support, which would be nice to have.

In CheckSetNamespace() you have the message 'already exists in schema'
there where the existing, similar checks say 'is already in schema',
which is a bit confusing.  But that code looks useful, and in fact I
think we should use it for the existing object types also to avoid
code duplication.  This is technically a regression in terms of
translatability, since instead of a single string that says something
like 'function %s is already in schema %s', you'll have '%s is already
in schema %s', and where the first %s is provided by
getObjectDescription().  But that doesn't seem like a problem, because
(1) we're already doing it that way for dependency error messages
anyway and (2) as far as I can tell from a visual scan and some
hacking with Google Translate, all of the languages for which we have
backend translations put the object type next to the object name
anyway.

So, attached is a proposed patch that just adds CheckSetNamespace()
and makes the existing SET SCHEMA commands use it.  Barring
objections, I'll go ahead and commit this part.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


check_set_namespace.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Greg Smith

Jeff Janes wrote:

And for very large memory
systems, even 1% may be too much to cache (dirty*_ratio can only be
set in integer percent points), so recent kernels introduced
dirty*_bytes parameters.  I like these better because they do what
they say.  With the dirty*_ratio, I could never figure out what it was
a ratio of, and the results were unpredictable without extensive
experimentation.
  


Right, you can't set dirty_background_ratio low enough to make this 
problem go away.  Even attempts to set it to 1%, back when that that was 
the right size for it, seem to be defeated by other mechanisms within 
the kernel.  Last time I looked at the related source code, it seemed 
the congestion control logic that kicks in to throttle writes was a 
likely suspect.  This is why I'm not real optimistic about newer 
mechanism like the dirty_background_bytes added 2.6.29 to help here, as 
that just gives a mapping to setting lower values; the same basic logic 
is under the hood.


Like Jeff, I've never seen dirty_expire_centisecs help at all, possibly 
due to the same congestion mechanism. 


Yes, but how much work do we want to put into redoing the checkpoint
logic so that the sysadmin on a particular OS and configuration and FS
can avoid having to change the kernel parameters away from their
defaults?  (Assuming of course I am correctly understanding the
problem, always a dangerous assumption.)
  


I've been trying to make this problem go away using just the kernel 
tunables available since 2006.  I adjusted them carefully on the server 
that ran into this problem so badly that it motivated the submitted 
patch, months before this issue got bad.  It didn't help.  Maybe if they 
were running a later kernel that supported dirty_background_bytes that 
would have worked better.  During the last few years, the only thing 
that has consistently helped in every case is the checkpoint spreading 
logic that went into 8.3.  I no longer expect that the kernel developers 
will ever make this problem go away the way checkpoints are written out 
right now, whereas the last good PostgreSQL work in this area definitely 
helped.


The basic premise of the current checkpoint code is that if you write 
all of the buffers out early enough, by the time syncs execute enough of 
the data should have gone out that those don't take very long to 
process.  That was usually true for the last few years, on systems with 
a battery-backed cache; the amount of memory cached by the OS was 
relatively small relative to the RAID cache size.  That's not the case 
anymore, and that divergence is growing bigger.


The idea that the checkpoint sync code can run in a relatively tight 
loop, without stopping to do the normal background writer cleanup work, 
is also busted by that observation.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-11-21 Thread David E. Wheeler
On Nov 20, 2010, at 9:31 PM, Terry Laurenzo wrote:

 Assuming that the JSON datatype (at a minimum) normalizes text for storage, 
 then the text storage option accounts for about the most expensive path but 
 with none of the benefits of an internal binary form (smaller size, ability 
 to cheaply perform non-trivial manipulation within the database server).
 
 Of course, just having a JSON datatype that blindly stores text will beat 
 everything, but I'm getting closer to thinking that the binary option is 
 worth the tradeoff.
 
 Comments?

benchmarks++

Nice to have some data points for this discussion.

Best,

David, still hoping for the JSON data type in 9.1…
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER OBJECT any_name SET SCHEMA name

2010-11-21 Thread David Fetter
On Sun, Nov 21, 2010 at 07:53:57AM -0500, Robert Haas wrote:
 On Sat, Nov 20, 2010 at 11:23 PM, Robert Haas robertmh...@gmail.com wrote:
  Ah, nuts.  I see now there's a v7.  Never mind...
 
 OK.  I looked at the right version, now.  Hopefully.
 
 It seems we have no regression tests at all for any of the existing
 SET SCHEMA commands.  This seems like a good time to correct that
 oversight, and also add some for the new commands you're adding here.
 (It might be helpful to submit the regression tests for the existing
 commands as a separate patch.) Also, you're missing psql tab
 completion support, which would be nice to have.
 
 In CheckSetNamespace() you have the message 'already exists in schema'
 there where the existing, similar checks say 'is already in schema',
 which is a bit confusing.  But that code looks useful, and in fact I
 think we should use it for the existing object types also to avoid
 code duplication.

Should this really error out?  It's just a NOOP, so perhaps a NOTICE
would be more appropriate.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for view triggers in psql

2010-11-21 Thread David Fetter
On Fri, Oct 29, 2010 at 08:33:00AM -0700, David Fetter wrote:
 On Tue, Oct 26, 2010 at 11:55:07AM +0900, Itagaki Takahiro wrote:
  On Tue, Oct 26, 2010 at 11:34 AM, David Fetter da...@fetter.org wrote:
   Do we need to 'add' it?
   Possibly.  My understanding is that it couldn't really replace it.
  
  Ah, I see.  I was wrong.  We can have modification privileges for
  views even if they have no INSTEAD OF triggers.
 
 Right.
 
  So, I think your original patch is the best solution.  We could use
  has_table_privilege() additionally, but we need to consider any
  other places if we use it.  For example, DROP privileges, etc.
 
 That seems like a matter for a separate patch.  Looking this over, I
 found I'd created a query that can never get used, so please find
 enclosed the next version of the patch :)

Could someone please commit this? :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER OBJECT any_name SET SCHEMA name

2010-11-21 Thread Robert Haas
On Nov 21, 2010, at 1:03 PM, David Fetter da...@fetter.org wrote:
 Should this really error out?  It's just a NOOP, so perhaps a NOTICE
 would be more appropriate.

Perhaps, but the purpose of this patch is to streamline the code, not change 
the behavior.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER OBJECT any_name SET SCHEMA name

2010-11-21 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 So, attached is a proposed patch that just adds CheckSetNamespace()
 and makes the existing SET SCHEMA commands use it.  Barring
 objections, I'll go ahead and commit this part.

Thank you for applying the new function to the existing code paths, that
was needed as soon as the new function would get acceptance! :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 12:31 AM, Terry Laurenzo t...@laurenzo.org wrote:
 What we see from this is that parsing JSON text and generating a binary
 representation is cheap, representing approximately 10% of the base case
 time.  Conversely, anything that involves generating JSON text is expensive,
 accounting for 30-40% of the base case time.  Some incidental profiling
 shows that while the entire operation is expensive, the process of
 generating string literals dominates this time.  There is likely room for
 optimization in this method, but it should be noted that most of these
 documents are lightly escaped (if escaped at all) which represents the happy
 path through the string literal output function.

Ouch!  That's kind of painful.  But certainly for some use cases it
will work out to a huge speedup, if you're doing subscripting or
similar.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving prep_buildtree used in VPATH builds

2010-11-21 Thread Robert Haas
On Fri, Nov 19, 2010 at 7:50 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Greg Smith's message of vie nov 19 01:52:34 -0300 2010:

 I'd think that if configure takes
 longer than it has to because the system is heavily loaded, the amount
 compilation time is going to suffer from that would always dwarf this
 component of total build time.  But if this was slow enough at some
 point to motivate you to write a patch for it, maybe that assumption is
 wrong.

 What if instead of -depth you do something like
 find the_args | sort -r
 ?  If you find a way to filter out the parents that you know have
 already been created, you could also cut down on the number of mkdir -p
 calls, which could result in a larger speedup.  And maybe we should
 remove the test -d.  Also, the `expr` call could be substituted by
 ${item##$sourcedir}, which is supposed to be a POSIX shell feature
 according to
 http://www.unix.org/whitepapers/shdiffs.html and
 http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html

 In short, there are plenty of optimization opportunities for this script
 without having to involve nonstandard constructs.

It seems that we have a general consensus that, aside from any
portability concerns (which so far seem to be mostly theoretical),
there's little to no evidence that it is a consistent win from a
performance standpoint.  Alvaro wasn't able to demonstrate a win at
all, Tom theorized - albeit without evidence - that it might be a loss
under some circumstances, and Gurjeet (the OP) could only reproduce
about a ~4% speedup, amounting to 500 ms (although he did see an ~11%
speedup, amounting to 5 s, on one occasion).  So I agree with Greg
Smith's comments a couple of days ago - it seems like this may not be
worth worrying about.  I'm going to mark this Returned with Feedback
for now, though of course it can come back to life if more evidence
that this is the right thing to do comes to life.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for view triggers in psql

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 1:07 PM, David Fetter da...@fetter.org wrote:
 On Fri, Oct 29, 2010 at 08:33:00AM -0700, David Fetter wrote:
 On Tue, Oct 26, 2010 at 11:55:07AM +0900, Itagaki Takahiro wrote:
  On Tue, Oct 26, 2010 at 11:34 AM, David Fetter da...@fetter.org wrote:
   Do we need to 'add' it?
   Possibly.  My understanding is that it couldn't really replace it.
 
  Ah, I see.  I was wrong.  We can have modification privileges for
  views even if they have no INSTEAD OF triggers.

 Right.

  So, I think your original patch is the best solution.  We could use
  has_table_privilege() additionally, but we need to consider any
  other places if we use it.  For example, DROP privileges, etc.

 That seems like a matter for a separate patch.  Looking this over, I
 found I'd created a query that can never get used, so please find
 enclosed the next version of the patch :)

 Could someone please commit this? :)

Eh... was there some reason you didn't add it to the CommitFest app?
Because that's what I work from.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a new problem in MERGE

2010-11-21 Thread Robert Haas
On Sun, Nov 14, 2010 at 2:41 PM, Greg Smith g...@2ndquadrant.com wrote:
 Boxuan Zhai wrote:

 I have plan to fix the above two bugs together. (in fact, I have already
 started coding in merge_v202 edition). My question is how should I make my
 update be consistent with yours. Is it possible for you to give me an
 edition that I can work on?

 I just got this reconciled with HEAD again.  There have been two changes I
 made in the code you'll eventually want in your working copy:

Boxuan, are you still working on this patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for view triggers in psql

2010-11-21 Thread David Fetter
On Sun, Nov 21, 2010 at 03:36:58PM -0500, Robert Haas wrote:
 On Sun, Nov 21, 2010 at 1:07 PM, David Fetter da...@fetter.org wrote:
  On Fri, Oct 29, 2010 at 08:33:00AM -0700, David Fetter wrote:
  On Tue, Oct 26, 2010 at 11:55:07AM +0900, Itagaki Takahiro wrote:
   On Tue, Oct 26, 2010 at 11:34 AM, David Fetter da...@fetter.org wrote:
Do we need to 'add' it?
Possibly.  My understanding is that it couldn't really replace it.
  
   Ah, I see.  I was wrong.  We can have modification privileges for
   views even if they have no INSTEAD OF triggers.
 
  Right.
 
   So, I think your original patch is the best solution.  We could use
   has_table_privilege() additionally, but we need to consider any
   other places if we use it.  For example, DROP privileges, etc.
 
  That seems like a matter for a separate patch.  Looking this over, I
  found I'd created a query that can never get used, so please find
  enclosed the next version of the patch :)
 
  Could someone please commit this? :)
 
 Eh... was there some reason you didn't add it to the CommitFest app?

I forgot.

 Because that's what I work from.

It's pretty trivial, but I don't feel comfortable adding it
after the close. :/

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-21 Thread Robert Haas
On Wed, Jul 7, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Rajanikant Chirmade rajanikant.chirm...@enterprisedb.com writes:
 Every identifier is downcase  truncated by function
 downcase_truncate_identifier()
 before using it.

 But since the function downcase_truncate_identifier() is not
 multibyte-charecter aware,
 it is not able to downcase some of special charecters in identifier like
 my_SchemÄ.

 IIRC this is intentional.  Please consult the archives for previous
 discussions.

Why would this be intentional?

One concern I have about this approach is that I am guessing that the
current implementation of str_tolower() is a lot slower than the
current implementation of downcase_truncate_identifier().  It would be
nice to have an implementation that is capable of handling wide
characters but doesn't actually incur the speed penalty unless a wide
character is actually present.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-11-21 Thread Andrew Dunstan



On 11/21/2010 12:31 AM, Terry Laurenzo wrote:


I copied the 5 sample documents from json.org http://json.org's 
example section for these tests.  These are loaded into a table with a 
varchar column 1000 times each (so the test table has 5000 rows in 
it).  In all situations, the binary encoding was smaller than the 
normalized text form (between 9 and 23% smaller).  I think there are 
cases where the binary form will be larger than the corresponding text 
form, but I don't think they would be very common.




Is that a pre-toast or post-toast comparison?

Even if it's post-toast, that doesn't seem like enough of a saving to 
convince me that simply storing as text, just as we do for XML, isn't a 
sensible way to go, especially when the cost of reproducing the text for 
delivery to clients  (including, say, pg_dump) is likely to be quite high.


cheers

andrew


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jul 7, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 IIRC this is intentional.  Please consult the archives for previous
 discussions.

 Why would this be intentional?

Well, it's intentional for lack of any infrastructure that would allow
a more spec-compliant approach.  As you say, calling str_tolower here
is probably a non-starter for performance reasons.  Another big problem
is that str_tolower produces a locale-specific downcasing conversion.
This (a) is going to create portability headaches of the first magnitude,
and (b) is not really an advance in terms of spec compliance.  The SQL
spec says that identifier case folding should be done according to the
Unicode standard, but it's not safe to assume that any random
platform-specific locale is going to act that way.  A specific example
of a locale that is known to NOT behave acceptably is Turkish: they have
weird ideas about i versus I, which in fact broke things back when we
used to use tolower for this purpose.  See the archives from early 2004,
and in particular commit 59f9a0b9df0d224bb62ff8ec5b65e0b187655742, which
removed the exact same logic (though not wide-character-aware) that this
patch proposes to put back.

I think the given patch can be rejected out of hand.  If the OP has any
ideas about doing non-locale-dependent case folding at an acceptable
speed, I'm happy to listen.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER OBJECT any_name SET SCHEMA name

2010-11-21 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 On Sat, Nov 20, 2010 at 11:23 PM, Robert Haas robertmh...@gmail.com wrote:
 Ah, nuts.  I see now there's a v7.  Never mind...

 OK.  I looked at the right version, now.  Hopefully.

Yeah, that was the most recent one and I linked it in the commit fest
application. Given the very fast feedback I got, there has been a lot of
activity and patches versions produced, so that's easy to get confused.

 It seems we have no regression tests at all for any of the existing
 SET SCHEMA commands.  This seems like a good time to correct that
 oversight, and also add some for the new commands you're adding here.

Yeah, it's time for me to have a look at regression tests :)

Please find attached set_schema.v8.patch with tests for the added
commands in the patch.

 (It might be helpful to submit the regression tests for the existing
 commands as a separate patch.) Also, you're missing psql tab
 completion support, which would be nice to have.

Do you still want me to prepare another patch for adding in the tests
the set schema variants that already existed but are not yet covered?
Which are the one you did spot, btw?

Completion support for psql. Isn't that stepping on David's toes? :)
I'll see about that later if needed, maybe sometime tomorrow…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/doc/src/sgml/ref/alter_conversion.sgml
--- b/doc/src/sgml/ref/alter_conversion.sgml
***
*** 23,28  PostgreSQL documentation
--- 23,29 
  synopsis
  ALTER CONVERSION replaceablename/replaceable RENAME TO replaceablenew_name/replaceable
  ALTER CONVERSION replaceablename/replaceable OWNER TO replaceablenew_owner/replaceable
+ ALTER CONVERSION replaceablename/replaceable SET SCHEMA replaceablenew_schema/replaceable
  /synopsis
   /refsynopsisdiv

***
*** 75,80  ALTER CONVERSION replaceablename/replaceable OWNER TO replaceablenew_owner
--- 76,90 
   /para
  /listitem
 /varlistentry
+ 
+varlistentry
+ termreplaceable class=parameternew_schema/replaceable/term
+ listitem
+  para
+   The new schema of the conversion.
+  /para
+ /listitem
+/varlistentry
/variablelist
   /refsect1
  
*** a/doc/src/sgml/ref/alter_opclass.sgml
--- b/doc/src/sgml/ref/alter_opclass.sgml
***
*** 23,28  PostgreSQL documentation
--- 23,29 
  synopsis
  ALTER OPERATOR CLASS replaceablename/replaceable USING replaceable class=parameterindex_method/replaceable RENAME TO replaceablenew_name/replaceable
  ALTER OPERATOR CLASS replaceablename/replaceable USING replaceable class=parameterindex_method/replaceable OWNER TO replaceablenew_owner/replaceable
+ ALTER OPERATOR CLASS replaceablename/replaceable USING replaceable class=parameterindex_method/replaceable SET SCHEMA replaceablenew_schema/replaceable
  /synopsis
   /refsynopsisdiv

***
*** 85,90  ALTER OPERATOR CLASS replaceablename/replaceable USING replaceable class=p
--- 86,100 
   /para
  /listitem
 /varlistentry
+ 
+varlistentry
+ termreplaceable class=parameternew_schema/replaceable/term
+ listitem
+  para
+   The new schema for the operator class.
+  /para
+ /listitem
+/varlistentry
   /variablelist
   /refsect1
  
*** a/doc/src/sgml/ref/alter_operator.sgml
--- b/doc/src/sgml/ref/alter_operator.sgml
***
*** 22,27  PostgreSQL documentation
--- 22,28 
   refsynopsisdiv
  synopsis
  ALTER OPERATOR replaceablename/replaceable ( { replaceableleft_type/replaceable | NONE } , { replaceableright_type/replaceable | NONE } ) OWNER TO replaceablenew_owner/replaceable
+ ALTER OPERATOR replaceablename/replaceable ( { replaceableleft_type/replaceable | NONE } , { replaceableright_type/replaceable | NONE } ) SET SCHEMA replaceablenew_schema/replaceable
  /synopsis
   /refsynopsisdiv

***
*** 85,90  ALTER OPERATOR replaceablename/replaceable ( { replaceableleft_type/repla
--- 86,100 
   /para
  /listitem
 /varlistentry
+ 
+varlistentry
+ termreplaceable class=parameternew_schema/replaceable/term
+ listitem
+  para
+   The new schema of the operator.
+  /para
+ /listitem
+/varlistentry
/variablelist
   /refsect1
  
*** a/doc/src/sgml/ref/alter_opfamily.sgml
--- b/doc/src/sgml/ref/alter_opfamily.sgml
***
*** 31,36  ALTER OPERATOR FAMILY replaceablename/replaceable USING replaceable class=
--- 31,37 
} [, ... ]
  ALTER OPERATOR FAMILY replaceablename/replaceable USING replaceable class=parameterindex_method/replaceable RENAME TO replaceablenew_name/replaceable
  ALTER OPERATOR FAMILY replaceablename/replaceable USING replaceable class=parameterindex_method/replaceable OWNER TO replaceablenew_owner/replaceable
+ ALTER OPERATOR FAMILY replaceablename/replaceable USING replaceable class=parameterindex_method/replaceable SET SCHEMA 

[HACKERS] Experimental JDBC driver

2010-11-21 Thread Radosław Smogura
Hello,

I last time added some features to JDBC driver meking in some way experimental 
driver, it could be quite usefull, but if it's possible I would like to ask 
for some reviews and tests. Maybe some ideas from this will be handly. 
Beacause the chageset is huge, I don't give patch set, but only URL where You 
can find sources and compiled binary 
http://www.rsmogura.net/pgsql/ 

(drivers are called pgjdbc_exp).

Below is some info about changes, in TODO and TO_THINK is set of free ideas 
about this.

=== JDBC 4 Exceptions ===
1. Driver can throw JDBC 4 exceptions.
2. The PSQLException is now interface, to keep as far as possible backward 
compatibility, read TO_THINK
3. TODO: In many static methods still exists 
PgSqlExceptionFactory.getDefault() 
this should be removed.
4. TODO: Add functionality to allow user to choose between throwin 
PSQL*Exceptions, or plain SQL*Exception.
5. TODO: Check if decision block about exception type is fully proper and 
there is no missign states, eventualy try to decide to throw Trnasient / Non 
Transient exceptions.
6. Currently we throw only Non Transient exception, as any error in server 
only allows to commit or rollback transaction, non future data read or update 
is possible.
7. TO_THINK: Change PSQLException to interface should make no differences when 
client application will be fully rebuilded, but if we have some 3rd party 
binary library its execution can be break (different VM opcode for calling 
methods). Because we have exception factory we can
revoke PSQLException to class, create PSQLExceptionInterface and add 
additional connection parameter 
useOldException default true - if true then PSQLException will be thrown on 
error, false SQLException implementing PSQLExceptionInterface. 

=== Binary Read and Write ===
Thanks to Mikko for some concepts from his patch (especialy about integer/float 
dates).
1.Few basic data types has been written (with server types)
 - read: bool, double, int, short, decimal, varchar, text; date, timestamp 
(with / without time zone)
 - write: int, decimal, date, timestamp
 - casting from server type to Java type is currently supported for few  
classes and types, need to check this step by step with JDBC spec
 - dates / times are limited to integer dates server, currently
2.The reader / writer is configured by connection basis with factories so if in 
future PostgreSQL will change the binary representation only the appropriate 
reader / writer implementation will change.
3. TODO: Use simillar to binary factory machanism to read / write character 
protocol.
4. TODO: Add binary modes for PGType classes.
5. TODO: Coopy all statics converters form PGType to factory, and mark methods 
as deprecated (statics disallow to use custom exception factories). 
5a. Subtask create casting facility extracting PGTypes cast methods to this 
facility, connect it with current if..else trees in ResultSet.get()
6. TODO: Plugability model to internal reading and writing other server types 
in binary / text mode.
7. Add support for UUID Java class and server type.
8. TODO: Binary reader always return byte[], think about directly writting to 
stream (less GC overhead).
9.Hack JDK MutuableInteger, less GC overhead and more preformance when parsing 
DECIMAL.
10. TODO: Make additional test about timestamps (dates, etc.) with / without 
time zone when server is in different timezone then driver, and what will hapen 
on TZ change in client. Subject to ask on pg-hackers.
11. TO_THINK: It's good point to think about changing OID to JDBC RowId, just 
sugar for sugar.
12. No describe or lazy decision to read given value as binary or character.
13. If you would like to test binary for all JUnit test call 
ant -DbinaryTransfer=true 

=== Other changes ===
1. setStatementTimeout - uses Java Timer. It's not fully secure, as timer 
will abort statement when statement is executing, long Java processing, or 
some sub statements can break this soultion.

=== JDBC 4.1 ===
1. build.xml changes, and basic implementation (with 
UnsupportedOperaionException) of some JDBC interfaces to use JDK 7 JDBC 4.1

Kind regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Greg Smith

Robert Haas wrote:

Doing all the writes and then all the fsyncs meets this requirement
trivially, but I'm not so sure that's a good idea.  For example, given
files F1 ... Fn with dirty pages needing checkpoint writes, we could
do the following: first, do any pending fsyncs for files not among F1
.. Fn; then, write all pages for F1 and fsync, write all pages for F2
and fsync, write all pages for F3 and fsync, etc.  This might seem
dumb because we're not really giving the OS a chance to write anything
out before we fsync, but think about the ext3 case where the whole
filesystem cache gets flushed anyway.


I'm not horribly interested in optimizing for the ext3 case per se, as I 
consider that filesystem fundamentally broken from the perspective of 
its ability to deliver low-latency here.  I wouldn't want a patch that 
improved behavior on filesystem with granular fsync to make the ext3 
situation worst.  That's as much as I'd want design to lean toward 
considering its quirks.  Jeff Janes made a case downthread for why not 
make it the admin/OS's job to worry about this?  In cases where there 
is a reasonable solution available, in the form of switch to XFS or 
ext4, I'm happy to take that approach.


Let me throw some numbers out to give a better idea of the shape and 
magnitude of the problem case I've been working on here.  In the 
situation that leads that the near hour-long sync phase I've seen, 
checkpoints will start with about a 3GB backlog of data in the kernel 
write cache to deal with.  That's about 4% of RAM, just under the 5% 
threshold set by dirty_background_ratio.  Whether or not the 256MB write 
cache on the controller is also filled is a relatively minor detail I 
can't monitor easily.  The checkpoint itself?  250MB each time. 

This proportion is why I didn't think to follow the alternate path of 
worrying about spacing the write and fsync calls out differently.  I 
shrunk shared_buffers down to make the actual checkpoints smaller, which 
helped to some degree; that's what got them down to smaller than the 
RAID cache size.  But the amount of data cached by the operating system 
is the real driver of total sync time here.  Whether or not you include 
all of the writes from the checkpoint itself before you start calling 
fsync didn't actually matter very much; in the case I've been chasing, 
those are getting cached anyway.  The write storm from the fsync calls 
themselves forcing things out seems to be the driver on I/O spikes, 
which is why I started with spacing those out.


Writes go out at a rate of around 5MB/s, so clearing the 3GB backlog 
takes a minimum of 10 minutes of real time.  There are about 300 1GB 
relation files involved in the case I've been chasing.  This is where 
the 3 second delay number came from; 300 files, 3 seconds each, 900 
seconds = 15 minutes of sync spread.  You can turn that math around to 
figure out how much delay per relation you can afford while still 
keeping checkpoints to a planned end time, which isn't done in the patch 
I submitted yet.


Ultimately what I want to do here is some sort of smarter write-behind 
sync operation, perhaps with a LRU on relations with pending fsync 
requests.  The idea would be to sync relations that haven't been touched 
in a while in advance of the checkpoint even.  I think that's similar to 
the general idea Robert is suggesting here, to get some sync calls 
flowing before all of the checkpoint writes have happened.  I think that 
the final sync calls will need to get spread out regardless, and since 
doing that requires a fairly small amount of code too that's why we 
started with that.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Martijn van Oosterhout
On Sun, Nov 21, 2010 at 04:54:00PM -0500, Greg Smith wrote:
 Ultimately what I want to do here is some sort of smarter write-behind  
 sync operation, perhaps with a LRU on relations with pending fsync  
 requests.  The idea would be to sync relations that haven't been touched  
 in a while in advance of the checkpoint even.  I think that's similar to  
 the general idea Robert is suggesting here, to get some sync calls  
 flowing before all of the checkpoint writes have happened.  I think that  
 the final sync calls will need to get spread out regardless, and since  
 doing that requires a fairly small amount of code too that's why we  
 started with that.

For a similar problem we had (kernel buffering too much) we had success
using the fadvise and madvise WONTNEED syscalls to force the data to
exit the cache much sooner than it would otherwise. This was on Linux
and it had the side-effect that the data was deleted from the kernel
cache, which we wanted, but probably isn't appropriate here.

There is also sync_file_range, but that's linux specific, although
close to what you want I think. It would allow you to work with blocks
smaller than 1GB.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature


Re: [HACKERS] knngist - 0.8

2010-11-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 2010/11/12 Teodor Sigaev teo...@sigaev.ru:
 My variants informs GiST by SK_ORDER flags and consistentFn looks at
 strategy number (strategy numbers are different for different purposes).

 Yeah.  At ten thousand feet, I think the open design question here is
 to what extent it's OK to rely on the fact that the ORDER BY clauses
 we wish to optimize happen to look a lot like the WHERE clauses we
 already know how to optimize: namely, they're both binary opclauses of
 the form indexed-column op constant.  Your patch manages to
 reuse a LOT of existing machinery by shoving ordering expressions
 through the same code paths that quals take.  Code reuse is generally
 a good thing, but here's we're forming RestrictInfo and ScanKey
 objects out of things that are neither restrictions nor keys, which
 might lead to maintainability problems down the road.  I'd like to get
 some input from Tom on how he feels about that, and any alternatives
 he sees.

I haven't spent any time on this patch yet (hope to start looking at it
next week).  As for your specific question above, I don't have a big
problem with reusing ScanKey this way, but I do agree that using
RestrictInfo for this would be a crock.  ISTM what we ought to have is
just the ability to match PathKeys with expressions of the form
indexedcol op constant to an index.

I'm undecided about the big-picture question of how much extra
generality ought to be put into the system along with this patch.
The argument not to is that with no candidate uses of additional
generality on the horizon, it's a waste of time to design something
more general, because we'll probably get it wrong anyway.  I'm not
a fan of designing APIs without use-cases in mind.  On the other hand,
there's an argument *for* doing something more general, which is
basically Polya's paradox: the more general problem may be easier to
solve.  To support that argument, we'd need a design that is clearly
cleaner than bolting KNNGIST on according to the current patch.
AIUI we don't have that at the moment, but I still think it's worth
spending a bit more time looking for one.

 It seems to me that our concept of ScanDirection is really woefully
 under-expressive.  For example, given:

 CREATE TABLE foo (
 id integer NOT NULL,
 name character varying NOT NULL,
 PRIMARY KEY (id)
 );

 We use the index for the first of these but not the second:

 select * from foo order by id nulls last;
 select * from foo order by id nulls first;

 In an ideal world, we'd like to handle the second one by finding the
 first non-NULL entry in the index, scanning away from the NULLs, and
 then, when we run off the end, looping back around to spit out the
 NULL entries.

This example leaves me totally cold, not least because it assumes a
specific storage implementation for nulls in an index.  It is also,
I think, misunderstanding what ScanDirection is for.  That's only
intended to allow executor plans to be run forward and then backed up
in the same fashion that fetching backwards from a cursor would do;
which is not a btree-specific concept, indeed not even index-specific.
If there is sufficient interest in doing what you suggest, what we'd
want to do is pass the PathKey representation to the index and let the
index AM figure out what it has to do to produce that sort order.  But
that is way way down my priority list.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: What do these terms mean in the SOURCE CODE?

2010-11-21 Thread Vaibhav Kaushal
Thanks for that informative article. :) Taught me a few new points about
Eclipse :)

On Sun, Nov 21, 2010 at 2:29 PM, Leonardo Francalanci m_li...@yahoo.itwrote:

  Here's my single biggest tip for newcomers to the Postgres  source:
  if you don't use ctags, glimpse, or some other tool that can  quickly
  show you all references to a given identifier, go out and get  one.
  It's one of the easiest ways to learn about  things.


 I guess Eclipse is the best tool for the newcomer as it gives
 you all the references very easily and shows the functions/structs
 declaration when you pass with the mouse over one...

 http://wiki.postgresql.org/wiki/Working_with_Eclipse






Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Andres Freund
On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote:
 For a similar problem we had (kernel buffering too much) we had success
 using the fadvise and madvise WONTNEED syscalls to force the data to
 exit the cache much sooner than it would otherwise. This was on Linux
 and it had the side-effect that the data was deleted from the kernel
 cache, which we wanted, but probably isn't appropriate here.
Yep, works fine. Although it has the issue that the data will get read again if 
archiving/SR is enabled.

 There is also sync_file_range, but that's linux specific, although
 close to what you want I think. It would allow you to work with blocks
 smaller than 1GB.
Unfortunately that puts the data under quite high write-out pressure inside 
the kernel - which is not what you actually want because it limits reordering 
and such significantly.

It would be nicer if you could get a mix of both semantics (looking at it, 
depending on the approach that seems to be about a 10 line patch to the 
kernel). I.e. indicate that you want to write the pages soonish, but don't put 
it on the head of the writeout queue.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Josh Berkus
On 11/20/10 6:11 PM, Jeff Janes wrote:
 True, but I think that changing these from their defaults is not
 considered to be a dark art reserved for kernel hackers, i.e they are
 something that sysadmins are expected to tweak to suite their work
 load, just like the shmmax and such. 

I disagree.  Linux kernel hackers know about these kinds of parameters,
and I suppose that Linux performance experts do.  But very few
sysadmins, in my experience, have any idea.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 4:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jul 7, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 IIRC this is intentional.  Please consult the archives for previous
 discussions.

 Why would this be intentional?

 Well, it's intentional for lack of any infrastructure that would allow
 a more spec-compliant approach.  As you say, calling str_tolower here
 is probably a non-starter for performance reasons.  Another big problem
 is that str_tolower produces a locale-specific downcasing conversion.
 This (a) is going to create portability headaches of the first magnitude,
 and (b) is not really an advance in terms of spec compliance.  The SQL
 spec says that identifier case folding should be done according to the
 Unicode standard, but it's not safe to assume that any random
 platform-specific locale is going to act that way.  A specific example
 of a locale that is known to NOT behave acceptably is Turkish: they have
 weird ideas about i versus I, which in fact broke things back when we
 used to use tolower for this purpose.  See the archives from early 2004,
 and in particular commit 59f9a0b9df0d224bb62ff8ec5b65e0b187655742, which
 removed the exact same logic (though not wide-character-aware) that this
 patch proposes to put back.

 I think the given patch can be rejected out of hand.  If the OP has any
 ideas about doing non-locale-dependent case folding at an acceptable
 speed, I'm happy to listen.

I think that's fair.  It actually doesn't seem like it should be that
hard if we knew that the server encoding were UTF8 - it's just a big
translation table somewhere, no?  We use heuristics to copy as many
characters as possible without detailed examination and consult the
lookup table for the rest.  However, that's not very practical in the
face of more than one encoding that must be handled.  What sort of
infrastructure would actually be useful for dealing with this problem?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 5:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I haven't spent any time on this patch yet (hope to start looking at it
 next week).  As for your specific question above, I don't have a big
 problem with reusing ScanKey this way, but I do agree that using
 RestrictInfo for this would be a crock.  ISTM what we ought to have is
 just the ability to match PathKeys with expressions of the form
 indexedcol op constant to an index.

That doesn't seem very hard on its face.  The trick is what to do with
that information once you've got it.  As far as I can tell, you need
to drill some kind of hole that lets you pass additional details
about the desired sort order to the index AM.  What I'd sort of like
to be able to do is throw the PathKeys at the index AM and say you
want these?.  Short of that, we're probably going to have to resign
ourselves to the core code basically knowing exactly what the
capabilities of KNNGIST are, making the index API pretty porous - not
that it already isn't.  There's really nothing special about the
subset of the problem space KNNGIST happens to attack except that it
makes the GIS guys drool; the next problem someone wants to attack in
this area is as likely as not to look completely different.

 This example leaves me totally cold, not least because it assumes a
 specific storage implementation for nulls in an index.  It is also,
 I think, misunderstanding what ScanDirection is for.  That's only
 intended to allow executor plans to be run forward and then backed up
 in the same fashion that fetching backwards from a cursor would do;
 which is not a btree-specific concept, indeed not even index-specific.

Ah, OK.

 If there is sufficient interest in doing what you suggest, what we'd
 want to do is pass the PathKey representation to the index and let the
 index AM figure out what it has to do to produce that sort order.  But
 that is way way down my priority list.

Yeah, this is basically what I'm wondering whether we can reasonably
do for KNNGIST; with hopes of later reuse.  But it may be unworkable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-21 Thread Andrew Dunstan



On 11/21/2010 06:09 PM, Robert Haas wrote:

I think that's fair.  It actually doesn't seem like it should be that
hard if we knew that the server encoding were UTF8 - it's just a big
translation table somewhere, no?


No, it's far more complex. See for example 
http://unicode.org/reports/tr21/tr21-3.html, which says:


   There are a number of complications to case mappings that occur once
   the repertoire of characters is expanded beyond ASCII.

   * Because of the inclusion of certain composite characters for
 compatibility, such as 01F1 DZ /capital dz/, there is a
 third case, called /titlecase/, which is used where the first
 letter of a word is to be capitalized (e.g. Titlecase, vs.
 UPPERCASE, or lowercase).
 o For example, the title case of the example character is
   01F2 Dz /capital d with small z/.
   * Case mappings may produce strings of different length than the
 original.
 o For example, the German character 00DF ß /small letter
   sharp s/ expands when uppercased to the sequence of two
   characters SS. This also occurs where there is no
   precomposed character corresponding to a case mapping,
   such as with 0149 'n /latin small letter n preceded by
   apostrophe./
   * Characters may also have different case mappings, depending on
 the context.
 o For example, 03A3 ? /capital sigma/ lowercases to 03C3
   ? /small sigma/ if it is followed by another letter,
   but lowercases to 03C2 ? /small final sigma/ if it is not.
   * Characters may have case mappings that depend on the locale.
 o For example, in Turkish the letter 0049 I /capital
   letter i/ lowercases to 0131 ? /small dotless i/.
   * Case mappings are not, in general, reversible.
 o For example, once the string McGowan has been
   uppercased, lowercased or titlecased, the original
   cannot be recovered by applying another uppercase,
   lowercase, or titlecase operation.


cheers

andrew





Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 6:22 PM, Andrew Dunstan and...@dunslane.net wrote:


 On 11/21/2010 06:09 PM, Robert Haas wrote:

 I think that's fair.  It actually doesn't seem like it should be that
 hard if we knew that the server encoding were UTF8 - it's just a big
 translation table somewhere, no?

 No, it's far more complex. See for example
 http://unicode.org/reports/tr21/tr21-3.html, which says:

 There are a number of complications to case mappings that occur once the
 repertoire of characters is expanded beyond ASCII.

 Because of the inclusion of certain composite characters for compatibility,
 such as 01F1 DZ capital dz, there is a third case, called titlecase, which
 is used where the first letter of a word is to be capitalized (e.g.
 Titlecase, vs. UPPERCASE, or lowercase).

 For example, the title case of the example character is 01F2 Dz capital d
 with small z.

 Case mappings may produce strings of different length than the original.

 For example, the German character 00DF ß small letter sharp s expands when
 uppercased to the sequence of two characters SS. This also occurs where
 there is no precomposed character corresponding to a case mapping, such as
 with 0149 ʼn latin small letter n preceded by apostrophe.

 Characters may also have different case mappings, depending on the context.

 For example, 03A3 Σ capital sigma lowercases to 03C3 σ small sigma if it
 is followed by another letter, but lowercases to 03C2 ς small final sigma
 if it is not.

 Characters may have case mappings that depend on the locale.

 For example, in Turkish the letter 0049 I capital letter i lowercases to
 0131 ı small dotless i.

 Case mappings are not, in general, reversible.

 For example, once the string McGowan has been uppercased, lowercased or
 titlecased, the original cannot be recovered by applying another uppercase,
 lowercase, or titlecase operation.

Yikes.  So what do people do about this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-11-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 That doesn't seem very hard on its face.  The trick is what to do with
 that information once you've got it.  As far as I can tell, you need
 to drill some kind of hole that lets you pass additional details
 about the desired sort order to the index AM.

We clearly need to add additional information to IndexScan plan nodes
to tell the index AM which sort order is required.  Up to now, an
indexscan has only had one possible resultant sort order (two if you
count backwards scan, but as I said I don't think generalizing that
particular feature is the way to approach this).  I would imagine
that the best way to handle that is to add a PathKey list or something
equivalent to it, and add that to the arguments passed to ambeginscan.

The other issue is how the planner can figure out what the possible
orderings are when it's considering an index.  You seem to be
contemplating adding a new index AM function that the planner would call
at the right point; but I'm not sure that that adds much of anything,
because the index AM can't have hard-wired behavior either.  We really
have to have enough information in the system catalog entries about an
opclass to allow the possible orderings to be determined.  Given that,
I think it makes more sense for the core planner to know what to do than
to put possibly duplicative code into multiple AMs.

I guess a third alternative would be to create per-opclass hook
functions for the planner to call, but I'm not thrilled with that
idea; it would still be largely duplicative code, and in a lot more
places.  I think it would also bind our hands with respect to making
internal planner changes in future, because the data structures
representing pathkeys would be pretty well locked down by such a choice.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-21 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 11/21/2010 06:09 PM, Robert Haas wrote:
 I think that's fair.  It actually doesn't seem like it should be that
 hard if we knew that the server encoding were UTF8 - it's just a big
 translation table somewhere, no?

 No, it's far more complex. See for example 
 http://unicode.org/reports/tr21/tr21-3.html, which says:

Yeah.  I'm actually not sure that the SQL committee has thought very
hard about this, because the spec is worded as though they think that
Unicode case normalization is all they have to say to uniquely define
what to do.  The Unicode guys recognize that case mapping is
locale-specific, which puts us right back at square one.  But leaving
spec compliance aside, we know from bitter experience that we cannot use
a definition that lets the Turkish locale fool with the mapping of i/I.
I suspect that locale-dependent mappings of any other characters are
just as bad, we simply haven't had enough users burnt by such cases to
have an institutional memory of it.  But for example do you really think
it's a good idea if pg_dump and reload into a DB with a different locale
results in changing the normalized form of SQL identifiers?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 4:54 PM, Greg Smith g...@2ndquadrant.com wrote:
 Let me throw some numbers out [...]

Interesting.

 Ultimately what I want to do here is some sort of smarter write-behind sync
 operation, perhaps with a LRU on relations with pending fsync requests.  The
 idea would be to sync relations that haven't been touched in a while in
 advance of the checkpoint even.  I think that's similar to the general idea
 Robert is suggesting here, to get some sync calls flowing before all of the
 checkpoint writes have happened.  I think that the final sync calls will
 need to get spread out regardless, and since doing that requires a fairly
 small amount of code too that's why we started with that.

Doing some kind of background fsyinc-ing might indeed be sensible, but
I agree that's secondary to trying to spread out the fsyncs during the
checkpoint itself.  I guess the question is what we can do there
sensibly without an unreasonable amount of new code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 6:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 On 11/21/2010 06:09 PM, Robert Haas wrote:
 I think that's fair.  It actually doesn't seem like it should be that
 hard if we knew that the server encoding were UTF8 - it's just a big
 translation table somewhere, no?

 No, it's far more complex. See for example
 http://unicode.org/reports/tr21/tr21-3.html, which says:

 Yeah.  I'm actually not sure that the SQL committee has thought very
 hard about this, because the spec is worded as though they think that
 Unicode case normalization is all they have to say to uniquely define
 what to do.  The Unicode guys recognize that case mapping is
 locale-specific, which puts us right back at square one.  But leaving
 spec compliance aside, we know from bitter experience that we cannot use
 a definition that lets the Turkish locale fool with the mapping of i/I.
 I suspect that locale-dependent mappings of any other characters are
 just as bad, we simply haven't had enough users burnt by such cases to
 have an institutional memory of it.  But for example do you really think
 it's a good idea if pg_dump and reload into a DB with a different locale
 results in changing the normalized form of SQL identifiers?

No, especially if it results in queries that used to work breaking,
which it well could.  But I'm not sure where to go with it from there,
beyond throwing up my hands.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for view triggers in psql

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 4:05 PM, David Fetter da...@fetter.org wrote:
  Could someone please commit this? :)

 Eh... was there some reason you didn't add it to the CommitFest app?

 I forgot.

A fair excuse.  :-)

 Because that's what I work from.

 It's pretty trivial, but I don't feel comfortable adding it
 after the close. :/

So add it to the next one, and we'll get it then if nobody picks it up sooner...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for view triggers in psql

2010-11-21 Thread David Fetter
On Sun, Nov 21, 2010 at 07:09:08PM -0500, Robert Haas wrote:
 On Sun, Nov 21, 2010 at 4:05 PM, David Fetter da...@fetter.org wrote:
   Could someone please commit this? :)
 
  Eh... was there some reason you didn't add it to the CommitFest app?
 
  I forgot.
 
 A fair excuse.  :-)
 
  Because that's what I work from.
 
  It's pretty trivial, but I don't feel comfortable adding it
  after the close. :/
 
 So add it to the next one, and we'll get it then if nobody picks it up 
 sooner...

Given its small and isolated nature, I was hoping we could get this in
sooner rather than later.  As I understand it, CFs are there to review
patches that take significant effort for even a committer to
understand, so this doesn't really fit that model.

Cheers,
David (refraining from mentioning anything about the time taken today
to discuss this vs. the time it would have taken to push the thing)
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Nov 21, 2010 at 6:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 spec compliance aside, we know from bitter experience that we cannot use
 a definition that lets the Turkish locale fool with the mapping of i/I.
 I suspect that locale-dependent mappings of any other characters are
 just as bad, we simply haven't had enough users burnt by such cases to
 have an institutional memory of it.  But for example do you really think
 it's a good idea if pg_dump and reload into a DB with a different locale
 results in changing the normalized form of SQL identifiers?

 No, especially if it results in queries that used to work breaking,
 which it well could.  But I'm not sure where to go with it from there,
 beyond throwing up my hands.

Well, that's why there's been no movement on this since 2004 :-(.  The
amount of work needed for a better solution seems far out of proportion
to the benefits.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-11-21 Thread Itagaki Takahiro
On Wed, Oct 20, 2010 at 01:36, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Ah yes, thinking it's an easy patch is not helping. Please find attached
 a revised version of it.

I checked cfparser.v2.patch.

It exports the static parseRecoveryCommandFileLine() in xlog.c
as the global cfParseOneLine() in cfparser.c without modification.

It generates one warning, but it can be easily fixed.
  cfparser.c:34: warning: no previous prototype for 'cfParseOneLine'

Some discussions about the patch:

* Is cf the best name for the prefix? Less abbreviated forms might
  be less confusable. Personally, I prefer conf.

* Can we export ParseConfigFile() in guc-file.l rather than
  parseRecoveryCommandFileLine()? It can solve the issue that unquoted
  parameter values in recovery.conf are not recognized. Even if we
  won't merge them, just allowing unquoted values would be useful.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql: Add \dL to show languages

2010-11-21 Thread Josh Kupershmidt
Hi all,

I'd like to revive Fernando Ike's patch implementing the \dL command
for psql to list available languages, last version here:
  http://archives.postgresql.org/pgsql-hackers/2009-07/msg01092.php

The original patch produced columns Name, Owner, Procedural
Language, Trusted, Call Handler, and Validator. I propose
simplifying the non-verbose output of \dL to look like this:

Name | Owner | Trusted
-+---+-
 plperl  | josh  | t
 plpgsql | josh  | t
 plpythonu   | josh  | f
(3 rows)

since the rest of the columns in the original patch seem like they
would be distracting noise the majority of the time[2]. I've kept most
of the original columns in the verbose output.

Tom Lane and Peter Eisentraut gave feedback on the original patch. I
think these concerns raised by Peter should now be addressed:

 1) This is obviously wrong:

 CASE WHEN l.lanispl = 't' THEN 'Trusted' WHEN l.lanispl = 'f' THEN 
 'Untrusted' END

I ripped out this Procedural Language column[1].

 2) It may be better to use lanispl to determine whether a language is a system
 object or not.  It's kind of obscure, but pg_dump does it that way, so it'd at
 least be consistent.

I added a System Object column in the verbose output with this information.

 3) Your code does processSQLNamePattern(), but neither the help nor the
 documentation mention that \dL accepts a pattern.  A pattern for listing
 languages might be overkill, but at least the documentation needs to match
 what the code attempts to do.

I added a note to the psql-ref.sgml documentation that \dL accepts a
pattern. I agree it's probably overkill to support pattern matching
when most folks will have maybe 1-3 additional languages installed,
but it's easy enough to add in, and similar psql functions support
patterns as well.

 4) Instead of LEFT JOIN pg_catalog.pg_proc p on l.lanplcallfoid = p.oid etc,
 just cast the oid field to regprocedure.  See examples elsewhere in
 describe.c.

Done, though I didn't see anything else in describe.c using casts to
regprocedure. Maybe there's a better way?

I've also fixed the tab-completion for \dL's pattern input. I haven't
yet test backwards compatibility with older server versions, though it
looks like this patch should work fine by not querying for lanowner
on 8.2 and earlier; I didn't see any other columns missing in
pg_language back to at least 8.1.

Josh

--
[1] I'm not sure what Fernando intended the original Procedural
Language column to be, but that column displayed Trusted or
Untrusted in addition to the Trusted column. Maybe this was a typo
in the patch? In any event, I don't think it's useful to have a
separate Name and Procedural Language column. If we did want to
include a Procedural Language column in addition to the Name, I'm not
sure offhand where to get this information, e.g. how to get the string
PL/pgSQL given pg_language.lanname = 'plpgsql'

[2]  For example, the command droplang --list only prints out Name
and Trusted? columns.


psql_languages.v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for view triggers in psql

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 7:17 PM, David Fetter da...@fetter.org wrote:
 On Sun, Nov 21, 2010 at 07:09:08PM -0500, Robert Haas wrote:
 On Sun, Nov 21, 2010 at 4:05 PM, David Fetter da...@fetter.org wrote:
   Could someone please commit this? :)
 
  Eh... was there some reason you didn't add it to the CommitFest app?
 
  I forgot.

 A fair excuse.  :-)

  Because that's what I work from.
 
  It's pretty trivial, but I don't feel comfortable adding it
  after the close. :/

 So add it to the next one, and we'll get it then if nobody picks it up 
 sooner...

 Given its small and isolated nature, I was hoping we could get this in
 sooner rather than later.  As I understand it, CFs are there to review
 patches that take significant effort for even a committer to
 understand, so this doesn't really fit that model.

Well, then add it to this one if you think that's more appropriate.
My point is simple: I review patches because they are in the CF queue.
 Your point seems to be: put mine ahead of the others, and review it
immediately.  Someone else may very well be willing to do that; I'm
not.

 David (refraining from mentioning anything about the time taken today
 to discuss this vs. the time it would have taken to push the thing)

Mention anything you want.  My guess is it would take me an hour.
You're certainly right that this discussion is a waste of time, but
possibly not for the reasons you are supposing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 8:10 PM, Itagaki Takahiro
itagaki.takah...@gmail.com wrote:
 On Wed, Oct 20, 2010 at 01:36, Dimitri Fontaine dimi...@2ndquadrant.fr 
 wrote:
 Ah yes, thinking it's an easy patch is not helping. Please find attached
 a revised version of it.

 I checked cfparser.v2.patch.

 It exports the static parseRecoveryCommandFileLine() in xlog.c
 as the global cfParseOneLine() in cfparser.c without modification.

 It generates one warning, but it can be easily fixed.
  cfparser.c:34: warning: no previous prototype for 'cfParseOneLine'

 Some discussions about the patch:

 * Is cf the best name for the prefix? Less abbreviated forms might
  be less confusable. Personally, I prefer conf.

 * Can we export ParseConfigFile() in guc-file.l rather than
  parseRecoveryCommandFileLine()? It can solve the issue that unquoted
  parameter values in recovery.conf are not recognized. Even if we
  won't merge them, just allowing unquoted values would be useful.

I'd really like to see postgresql.conf and recovery.conf parsing
merged, and I suspect, as Itagaki-san says, that postgresql.conf
parsing is the better model for any new code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for view triggers in psql

2010-11-21 Thread David Fetter
On Sun, Nov 21, 2010 at 08:27:34PM -0500, Robert Haas wrote:
 On Sun, Nov 21, 2010 at 7:17 PM, David Fetter da...@fetter.org wrote:
  On Sun, Nov 21, 2010 at 07:09:08PM -0500, Robert Haas wrote:
  On Sun, Nov 21, 2010 at 4:05 PM, David Fetter da...@fetter.org wrote:
Could someone please commit this? :)
  
   Eh... was there some reason you didn't add it to the
   CommitFest app?
  
   I forgot.
 
  A fair excuse.  :-)
 
   Because that's what I work from.
  
   It's pretty trivial, but I don't feel comfortable adding it
   after the close. :/
 
  So add it to the next one, and we'll get it then if nobody picks
  it up sooner...
 
  Given its small and isolated nature, I was hoping we could get
  this in sooner rather than later.  As I understand it, CFs are
  there to review patches that take significant effort for even a
  committer to understand, so this doesn't really fit that model.
 
 Well, then add it to this one if you think that's more appropriate.

Done.  :)

 My point is simple: I review patches because they are in the CF
 queue.  Your point seems to be: put mine ahead of the others, and
 review it immediately.  Someone else may very well be willing to do
 that; I'm not.

Fair enough.

  David (refraining from mentioning anything about the time taken
  today to discuss this vs. the time it would have taken to push the
  thing)
 
 Mention anything you want.  My guess is it would take me an hour.
 You're certainly right that this discussion is a waste of time, but
 possibly not for the reasons you are supposing.

LOL!

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: Add \dL to show languages

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 8:18 PM, Josh Kupershmidt schmi...@gmail.com wrote:
 I'd like to revive Fernando Ike's patch implementing the \dL command
 for psql to list available languages, last version here:
  http://archives.postgresql.org/pgsql-hackers/2009-07/msg01092.php

Please add this patch to the currently open CommitFest:

https://commitfest.postgresql.org/action/commitfest_view/open

And please also help with review of patches from the current CommitFest:

https://commitfest.postgresql.org/action/commitfest_view/inprogress

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Instrument checkpoint sync calls

2010-11-21 Thread Robert Haas
On Mon, Nov 15, 2010 at 3:09 PM, Greg Smith g...@2ndquadrant.com wrote:
 So my task list is:

 0) Rebase against the HEAD that just code related to this touched today

 1) Assume that log_checkpoints is sufficient control over whether the timing
 overhead added is worth collecting, and therefore remove the half-baked idea
 of also wrapping with a compile-time option.

 2) Have the sync summary returned upwards, so it can be put onto the same
 line as the rest of the rest of the log_checkpoint info.

 All seems reasonable to me.  Will rev a new patch by tomorrow.

Did this get eaten by the email goblin, or you're still working on it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-21 Thread Robert Haas
On Fri, Nov 19, 2010 at 9:55 AM, Shigeru HANADA
han...@metrosystems.co.jp wrote:
 [ new SQL/MED patch ]

I can't help noticing that this patch adds 8,982 lines and removes
408, making it far larger any other patch I've ever seen on this list.
 And what that means is that committing all of this in one go is going
to be very, very difficult.  Now, on the plus side, as 9000+ line
patches go, this one looks pretty well-written, at least after ten
seconds of looking at it, which is great as far as it goes, but the
sheer size is still going to make it just about impossible for anyone
to review it effectively and have real confidence that the whole thing
is commit-quality.  To have a chance of getting a significant portion
of this into PostgreSQL 9.1, it really needs to be broken up into
INDEPENDENTLY COMMITTABLE SUB-PATCHES.

The key words here are independently committable.  Breaking up a
patch into sub-patches by directory, for example, is completely
useless - we're not, for example, going to commit the code first and
the docs separately.  Let me say that again - the ONLY useful way of
breaking up a patch is to divide it into pieces such that EACH piece,
by itself, would represent a credible commit.  Each piece should be
posted to a separate thread and a separate discussion should be had
about the merits and demerits of each one.  Each should have a
separate CommitFest entry and, ideally, a separate reviewer.  Of
course, it may not be possible to fully evaluate a given patch without
looking at the other ones, but the extent to which this is necessary
should be minimized; otherwise you haven't really broken it up
usefully.

Ultimately, we probably want and need to get this patch down to chunks
of less than 2000 lines each.  But for starters, it looks quite simple
to break this into three chunks: one for the PostgreSQL FDW, one for
the CSV FDW, and one for the core functionality.  I think that the CSV
FDW, like the PG FDW, should be a loadable module.  (I wonder if it
would be more sensible to name all the FDWs as fdw_foo rather than
foo_fdw, so that they alphabetize together, but I believe that Tom
has shot down similar suggestions in the past, so maybe it's not a
good idea after all.)  So let's do that and then see if we can find
anything that we can either simplify (so it takes fewer lines of code)
or pull out and commit separately (because, for example, it's some
kind of refactoring that is a good idea independently of this patch).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: Add \dL to show languages

2010-11-21 Thread Josh Kupershmidt
On Sun, Nov 21, 2010 at 8:39 PM, Robert Haas robertmh...@gmail.com wrote:
 Please add this patch to the currently open CommitFest:

Added to 2011-01.

 https://commitfest.postgresql.org/action/commitfest_view/open

 And please also help with review of patches from the current CommitFest:

 https://commitfest.postgresql.org/action/commitfest_view/inprogress

Yeah, I know I need to help out on reviews more. I signed on as an
additional reviewer for Thom Brown's Aditional docs index entries and
table sorting. I'll try to at least take a look at one or two more
without a Reviewer listed (maybe Tab completion in psql for triggers
on views or parallel pg_dump) as time permits, though I'm probably
not qualified to be the only reviewer for either of those.

Josh

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: Add \dL to show languages

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 9:44 PM, Josh Kupershmidt schmi...@gmail.com wrote:
 On Sun, Nov 21, 2010 at 8:39 PM, Robert Haas robertmh...@gmail.com wrote:
 Please add this patch to the currently open CommitFest:

 Added to 2011-01.

 https://commitfest.postgresql.org/action/commitfest_view/open

 And please also help with review of patches from the current CommitFest:

 https://commitfest.postgresql.org/action/commitfest_view/inprogress

 Yeah, I know I need to help out on reviews more. I signed on as an
 additional reviewer for Thom Brown's Aditional docs index entries and
 table sorting. I'll try to at least take a look at one or two more
 without a Reviewer listed (maybe Tab completion in psql for triggers
 on views or parallel pg_dump) as time permits, though I'm probably
 not qualified to be the only reviewer for either of those.

Anything you can do is great.  We always seem to have more patches
than reviewers

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation

2010-11-21 Thread Itagaki Takahiro
On Tue, Nov 16, 2010 at 04:42, Peter Eisentraut pete...@gmx.net wrote:
 On mån, 2010-11-15 at 11:34 +0100, Pavel Stehule wrote:
 I am checking a patch. I found a problem with initdb
 Ah, late night brain farts, it appears.  Here is a corrected version.

This version cannot be applied cleanly any more. Please update it.
(I think you don't have to include changes for catversion.h)
./src/backend/optimizer/util/plancat.c.rej
./src/backend/optimizer/plan/createplan.c.rej
./src/backend/optimizer/path/indxpath.c.rej
./src/include/catalog/catversion.h.rej

I didn't compile nor run the patched server, but I found a couple of
issues in the design and source code:

* COLLATE information must be explicitly passed by caller in the patch,
but we might forgot the handover when we write new codes. Is it possible
to pass it automatically, say using a global variable? If we could do so,
existing extensions might work with collation without rewritten.

* Did you check the regression test on Windows? We probably cannot use
en_US.utf8 on Windows. Also, some output of the test includes non-ASCII
characters. How will we test COLLATE feature on non-UTF8 databases?

[src/test/regress/sql/collate.sql]
+CREATE TABLE collate_test1 (
+a int,
+b text COLLATE en_US.utf8 not null
+);

* Did you see any performance regression by collation?
I found a bug in lc_collate_is_c(); result = 0 should be
checked before any other checks. SearchSysCache1() here
would be a performance regression.

[src/backend/utils/adt/pg_locale.c]
-lc_collate_is_c(void)
+lc_collate_is_c(Oid collation)
 {
...
+   tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collation));
...
HERE = if (result = 0)
return (bool) result;

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security hooks on object creation

2010-11-21 Thread Robert Haas
2010/11/12 KaiGai Kohei kai...@ak.jp.nec.com:
 (2010/11/12 19:34), KaiGai Kohei wrote:
 I revised my patch according to the prior suggestions.

 I'm sorry. I revised my patch, but not attached.

 Please see this attached one.

I'm satisfied with this approach, although I intend to change
InvokeObjectAccessHook0 to simply InvokeObjectAccessHook before
committing it; and correct your use of AttributeRelationId to
RelationRelationId for consistency with the rest of the code.  What
I'm not quite sure about is where to put the definitions you've added
to a new file utils/hooks.h; I don't feel that's a very appropriate
location.  It's tempting to put them in utils/acl.h just because this
is vaguely access-control related and that header is already included
in most of the right places, but maybe that's too much of a stretch;
or perhaps catalog/catalog.h, although that doesn't feel quite right
either.  If we are going to add a new header file, I still don't like
utils/hooks.h much - it's considerably more generic than can be
justified by its contents.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-21 Thread Itagaki Takahiro
On Mon, Nov 22, 2010 at 11:16, Robert Haas robertmh...@gmail.com wrote:
 To have a chance of getting a significant portion
 of this into PostgreSQL 9.1, it really needs to be broken up into
 INDEPENDENTLY COMMITTABLE SUB-PATCHES.

Did we discuss about syntax-only patch is not acceptable because
it makes the head broken state at the previous commit-fest?
I think that's why the patch becomes so large.

So, our guideline to submit a large patch would be:
  * Split patch into commitable sub-patches (2000 lines each),
  * But submit a series of patches at once.

Am I understanding correctly?

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 10:14 PM, Itagaki Takahiro
itagaki.takah...@gmail.com wrote:
 On Mon, Nov 22, 2010 at 11:16, Robert Haas robertmh...@gmail.com wrote:
 To have a chance of getting a significant portion
 of this into PostgreSQL 9.1, it really needs to be broken up into
 INDEPENDENTLY COMMITTABLE SUB-PATCHES.

 Did we discuss about syntax-only patch is not acceptable because
 it makes the head broken state at the previous commit-fest?
 I think that's why the patch becomes so large.

Right, I remember that discussion.  Hopefully the distinction between
that conversation and this one is clear.

 So, our guideline to submit a large patch would be:
  * Split patch into commitable sub-patches (2000 lines each),

It's not a hard number - it's more important that the patch *make
sense* than what the exact line count is.  But I think that's a
reasonable guideline to shoot for.  Ideally, smaller still would
probably be even better, but sometimes it just can't be done.  Also,
note that pulling off small chunks is a valuable way to make progress.
 For example, if we notice that there's a 100-line refactoring in the
FDW patch that stands on its own, by all means let's pull it out and
commit it.

  * But submit a series of patches at once.

When necessary, yes.  Of course, the best thing is if you can make
them truly independent and submit the one after another.  Get one
committed, move on to the next.  But if you can't, then you can't.  In
this case, there's not much help for the fact that to decide whether
the FDW patch is a good idea you're probably going to at least want to
glance at the PGFDW and CSVFDW patches -- but it's possible we could
decide to commit the core support first, and then work on getting the
implementations committed afterwards, if we're confident that the
basic design is all right but more work is needed down in the details.

 Am I understanding correctly?

I think so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unlogged tables

2010-11-21 Thread Andy Colson


I have done a bunch of benchmarking.  It was not easy to find consistent 
numbers, so I picked a job and ran the same thing over and over.

I'm running Slackware 13.1 on a desktop computer.

Linux storm 2.6.35.7-smp #1 SMP Sun Oct 10 21:43:07 CDT 2010 i686 AMD 
Athlon(tm) 7850 Dual-Core Processor AuthenticAMD GNU/Linux

Database on:
/dev/sda2 on /pub type ext4 (rw,noatime)



I started with stock, unpatched, pg 9.1, and ran pg_bench.  I used several scale's and 
always set the # connections at half the scale. (so scale 20 used 10 connections).  I ran 
all tests for 180 seconds.  autovacuum was always off, and I ran vacuum -z 
between each pg_bench.

each block of numbers has these columns: scale, test 1, test 2, test 3, avg
So the first line below: 6, 96, 105, 102, 101
means:
pg_becnh -i -s 6
pg_bench -c 3 -T 180
vacuum -z
pg_bench -c 3 -T 180
vacuum -z
pg_bench -c 3 -T 180

result times for the three runs 96, 105 and 102 seconds, with average 101 
seconds.

The LOGS test is importing 61+ million rows of apache logs.  Its a perl script, 
uses COPY over many many files.  Each file is commit separate.


checkpoint_segments = 7
shared_buffers = 512MB
effective_cache_size = 1024MB
autovacuum off


fsync on
synchronous_commit on
full_page_writes on
bgwriter_lru_maxpages 100
180 second tests

scale, test 1, test 2, test 3, avg
6, 96,  105, 102, 101
20, 120, 82, 76, 93
40, 73, 42, 43, 53
80, 50, 29, 35, 38


synchronous_commit off
6, 239, 676, 614, 510
20, 78, 47, 56, 60
40, 59, 35, 41, 45
80, 53, 30, 35, 39

LOGS: ~ 3,900 ins/sec (I didnt record this well, its sort of a guess)


synchronous_commit off
full_page_writes off
6, 1273, 1344, 1287, 1301
20, 1323, 1307, 1313, 1314
40, 1051, 872, 702, 875
80, 551, 206, 245, 334

LOGS  (got impatient and killed it)
Total rows: 20,719,095
Total Seconds: 5,279.74
Total ins/sec: 3,924.25


fsync off
synchronous_commit off
full_page_writes off
bgwriter_lru_maxpages 0
6, 3622, 2940, 2879, 3147
20, 2860, 2952, 2939, 2917
40, 2204, 2143, 2349, 2232
80, 1394, 1043, 1085, 1174

LOG (this is a full import)
Total rows: 61,467,489
Total Seconds: 1,240.93
Total ins/sec: 49,533.37

--- Apply unlogged patches and recompile, re-initdb ---
I patched pg_bench to run with either normal or unlogged tables

fsync on
synchronous_commit on
full_page_writes on
bgwriter_lru_maxpages 100
180 second tests

normal tables
6, 101, 102, 108, 103
20, 110, 71, 90, 90
40, 83, 45, 49, 59
80, 50, 34, 30, 38

LOGS (partial import)
Total rows: 24,754,871
Total Seconds: 6,058.03
Total ins/sec: 4,086.28

unlogged tables
6, 2966, 3047, 3007, 3006
20, 2767, 2515, 2708, 2663
40, 1933, 1311, 1464, 1569
80, 837, 552, 579, 656

LOGS (full import)
Total rows: 61,467,489
Total Seconds: 1,126.75
Total ins/sec: 54,552.60


After all this... there are too many numbers for me.  I have no idea what this 
means.

-Andy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-21 Thread Fujii Masao
On Sat, Nov 20, 2010 at 6:46 AM, Joachim Wieland j...@mcknight.de wrote:
 I still have the server, if you want me to debug anything or send a
 patch against 9.0.1 that gives more output, just let me know.

Per previous Simon's comment, the following information would be useful.
http://archives.postgresql.org/pgsql-general/2010-10/msg00154.php

--
If you suspect a bug in Hot Standby, please set
trace_recovery_messages = DEBUG2
in postgresql.conf and repeat the action

Always useful to know
* max_connections
* current number of sessions
* whether we have two phase commits happening
--

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-11-21 Thread Itagaki Takahiro
On Fri, Oct 15, 2010 at 03:40, Radosław Smogura
rsmog...@softperience.eu wrote:
 Regarding JDBC in the CF process -- other interfaces are handled
 there.  I haven't seen one patch this size for JDBC since I've been
 involved, let alone two competing patches to implement the same
 feature.  Small patches which can be quickly handled don't make sense
 to put into the process, but it seemed reasonable for these.

 In any way I'm sending this patch, and I will put this under Miscellaneous in
 CF. This cleared patch takes only 47k (in uncleared was some binary read
 classes) and about 50% it's big test case.

I changed the patch's topic to JDBC.
https://commitfest.postgresql.org/action/patch_view?id=399

Patch reviewers are still wanted.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers