Re: [HACKERS] DO ... RETURNING
On 06/11/2013 06:17 AM, Pavel Stehule wrote:
2013/6/10 Hannu Krosing ha...@2ndquadrant.com:
On 06/10/2013 09:45 PM, Pavel Stehule wrote:
2013/6/10 David Fetter da...@fetter.org:
On Mon, Jun 10, 2013 at 09:23:19PM +0200, Pavel Stehule wrote:
2013/6/10 Hannu Krosing ha...@2ndquadrant.com:

Hello everybody,

As far as I can see, you currently cannot return anything out of a DO (anonymous code) block. Something like:

DO LANGUAGE plpythonu RETURNS TABLE (name text, uid int, gid int) $$
with open('/etc/passwd') as f:
    fields = f.readline().split(':')
    while fields:
        name, uid, gid = fields[0], int(fields[2]), int(fields[3])
        yield name, uid, gid
        fields = f.readline().split(':')
$$;

As I did not pay attention when DO was introduced, I thought it would be faster to ask here than to read all the possibly relevant mails in the archives. So: has there been a discussion on extending the DO construct with the ability to return data out of it, similar to what named functions do? If there was, what were the arguments against doing this? Or was it just not thought important at the time?

I don't like this idea. I know that DO is more or less a function, but it is too restrictive. I hope we will eventually have procedures with the possibility of unbound queries, and then you can do:

DO $$
  SELECT * FROM pg_class;
  SELECT * FROM pg_proc;
  ...
$$ LANGUAGE SQL;

and you don't need to define an output structure - which is much more user friendly.

If I understand the proposal correctly, the idea is only to try to return something when DO is invoked with RETURNING. 1. Did I understand correctly, Hannu? 2. If I did, does this alleviate your concerns, Pavel?

Not much. Two different concepts in one statement is not a good idea.

What two different concepts do you mean?

What about using a cursor as a temporary solution?

BEGIN;
DO $$
BEGIN
  OPEN mycursor AS SELECT * FROM blablabla;
END $$;
FETCH FROM mycursor;
COMMIT;

How would this work in an SQL query?

SELECT * FROM (FETCH FROM mycursor) mc;

We don't support that, but Oracle and DB2 allow SELECT * FROM TABLE(cursorname). Still I don't like this idea, because you would then have to support DO RETURNING in other statements as well - like INSERT INTO ... DO RETURNING???

Yes, I really would like DO to be a full set-returning construct similar to SELECT or INSERT/UPDATE/DELETE ... RETURNING. The syntax should be either RETURNS (as in function definitions) or RETURNING as for I/U/D. I actually like RETURNING better, as it really does return immediately and does not just define a function that returns something.

What about local temporary functions?

CREATE TEMPORARY FUNCTION xx(a int) RETURNS TABLE (xxx) SELECT * FROM xxx;

You mean that we define and use it in the same statement, and after ';' ends the statement it disappears from scope? This would probably still bloat the pg_function table?

It is the same hard issue as TEMPORARY TABLES.

Hannu, what is the motivation for your proposal?

To be able to use pl/xxx languages in (almost) the same way we use SQL now. The current situation is akin to not being able to use queries directly, but always requiring you to create a view first and then do SELECT ... FROM myview. Think of DO ... RETURNING as an inline function definition and usage in one go.

I have two objections:

* it is not too user friendly - you have to specify the returns list every time, which is not comfortable for very short-lived objects.

If you can somehow avoid it for functions, then we could use the same techniques for anonymous in-line functions as well. And it is still more user friendly than creating a temporary function for exactly the same purpose.
* it is one way to introduce lots of non-ANSI-SQL extensions that are not in other databases.

The whole pluggable-language and CREATE FUNCTION machinery is a non-ANSI-SQL extension, so why not make it easier to use?

* it doesn't really carry new functionality.

Yes - as pl/pgsql is (presumably :) ) Turing complete, no other PLs carry any new functionality either. What this does is make existing functionality more usable. Again, if you have some good way to avoid specifying the returns list and deduce it from the contents, I would be very interested. This seems relatively easy to do for SQL and PL/pgSQL functions but much harder to achieve for a PL language in general.

Almost the same functionality and syntax could be achieved if we assume DO always returns SETOF RECORD and require the return definition at the point of use, as we do for functions returning SETOF RECORD. So instead of

hannu=# CREATE OR REPLACE FUNCTION testf(numeric) RETURNS SETOF RECORD AS $$
return (('a','b','c'),('d','e','f'))
$$ LANGUAGE plpythonu;
CREATE FUNCTION
hannu=# SELECT * FROM testf(1) AS f(a text, b text, c text);
 a | b | c
---+---+---
 a | b | c
 d | e | f
(2 rows)

one could write directly

DO LANGUAGE plpythonu $$
return (('a','b','c'),('d','e','f'))
$$ AS f(a text, b text, c text);

and get the same result. The reason ...
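For reference, here is a runnable form of the cursor workaround sketched earlier in the thread (table and cursor names are illustrative). A plpgsql refcursor opened under a fixed name inside DO leaves a portal that plain SQL can FETCH from for the rest of the transaction:

BEGIN;

DO $$
DECLARE
    c refcursor := 'mycursor';   -- fixes the portal name FETCH will use
BEGIN
    OPEN c FOR SELECT relname FROM pg_class;
END
$$;

FETCH 5 FROM mycursor;  -- reads rows produced inside the DO block
COMMIT;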
Re: [HACKERS] Parallel Optimizer
On 11 June 2013 01:45, Tatsuo Ishii is...@postgresql.org wrote:
On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
On 7 June 2013 20:23, Tom Lane t...@sss.pgh.pa.us wrote:

As for other databases, I suspect that ones that have parallel execution are probably doing it with a thread model, not a process model.

Separate processes are more common because they cover the general case where query execution is spread across multiple nodes. Threads don't work across nodes, and parallel queries predate (working) threading models.

Indeed. Parallelism based on processes would be more convenient for master-master types of applications. Even if no master-master feature is implemented directly in core, at least a parallelism infrastructure based on processes could be used for this purpose. As long as true synchronous replication is not implemented in core, I am not sure there is value in parallel execution spreading across multiple nodes, because of the delay of data-update propagation.

Please explain what you mean by the word true used here.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Proposal for CSN based snapshots
Ants, the more I think about this, the more I start to like it.

On 06/07/2013 02:50 PM, Ants Aasma wrote: On Fri, Jun 7, 2013 at 2:59 PM, Markus Wanner mar...@bluegap.ch wrote: Agreed. Postgres-R uses a CommitOrderId, which is very similar in concept, for example. Do you think having this snapshot scheme would be helpful for Postgres-R?

Yeah, it could help to reduce patch size, after a rewrite to use such a CSN.

Or why do you need to tell apart aborted from in-progress transactions by CSN? I need to detect aborted transactions so they can be discarded during the eviction process; otherwise the sparse array will fill up. They could also be filtered out by cross-referencing uncommitted slots with the procarray. Having the abort case do some additional work to make xid assignment cheaper looks like a good tradeoff.

I see.

The sparse buffer needs to be at least big enough to fit CSN slots for the xids of all active transactions and non-overflowed subtransactions. At the current level PGPROC_MAX_CACHED_SUBXIDS=64, the minimum comes out at 16 bytes * (64 + 1) slots * 100 backends = 101.6KB per buffer, or 203KB total in the default configuration.

A CSN is 8 bytes, the XID 4, resulting in 12 bytes per slot. So I guess the given 16 bytes includes alignment to 8 byte boundaries. Sounds good.

8 byte alignment for CSNs is needed for atomic writes, if not for something else.

Oh, right, atomic writes.

I think the size could be cut in half by using a base value for CSNs if we assume that no xid is active for longer than 2B transactions, as is currently the case. I didn't want to include the complication in the first iteration, so I didn't verify whether that would have any gotchas.

In Postgres-R, I effectively used a 32-bit order id which wraps around. In this case, I guess adjusting the base value will get tricky. Wrapping could probably be used as well, instead.

The number of times each cache line can be invalidated is bounded by 8.

Hm.. good point.

Regards Markus Wanner
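A quick sanity check of the sizing arithmetic above (assuming the stated 16-byte aligned slots and the default max_connections of 100; the 203KB total implies two such buffers):

SELECT 16 * (64 + 1) * 100                        AS bytes_per_buffer, -- 104000
       round(16 * (64 + 1) * 100 / 1024.0, 1)     AS kib_per_buffer,   -- 101.6
       round(2 * 16 * (64 + 1) * 100 / 1024.0, 1) AS kib_total;        -- 203.1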
Re: [HACKERS] how to find out whether a view is updatable
On 11 June 2013 01:03, Michael Paquier michael.paqu...@gmail.com wrote:

Sorry for my late reply.

On Sun, Jun 9, 2013 at 6:45 PM, Dean Rasheed dean.a.rash...@gmail.com wrote: I called it updatable rather than writable or read-only because it might perhaps be extended in the future with separate options for insertable and deletable. It could also be extended to give column-level control over updatability, or something like use_remote_updatability could be added, but that all feels like 9.4 material.

Yes, this is definitely material for 9.4. You should add this patch to the 1st commit fest. I'll add myself as a reviewer. Thanks,

Thanks. Arguably though, the API changes are something that should be sorted out in 9.3, but I'm not sure how much of an appetite there is for that, or whether it's too late.

pg_view_is_updatable() and pg_view_is_insertable() are both new to 9.3. They were designed purely to support the information schema views, but are inadequate for most other practical purposes. Once 9.3 is out, we'll be stuck with them - although of course that doesn't stop us adding more functions, I think it would be better to replace them now. Likewise the writable FDW API is new to 9.3, so I think 9.3 should at least decide on the API for an FDW to specify whether a foreign table is updatable.

Regards, Dean
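For context, the per-view answer these functions were built to compute is what the information schema exposes - for example (schema name illustrative):

SELECT table_name, is_updatable, is_insertable_into
FROM   information_schema.views
WHERE  table_schema = 'public';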
Re: [HACKERS] Parallel Optimizer
On 06/10/2013 10:37 PM, FredDaniPandoraAquiles wrote:

Hi, I asked a while ago in this group about the possibility of implementing a parallel planner in a multithreaded way, and the replies were that the proposed approach couldn't be implemented, because postgres is not thread-safe. With the new Background Worker Processes feature, would such an implementation be possible? Well, there are versions of genetic algorithms that use the concept of islands, in which the populations evolve in parallel on the different islands, with interaction between the islands and so on. I'm working on an algorithm based on multiagent systems. At the present moment, I mean in H2, the agents are threads; there are a few locks related to agent solutions, and a few locks for the best current solution in the environment where the agents are 'running'. The agents can exchange messages with a purpose. The environment is shared by all the agents, and they use the environment to get information from other agents (their current solution, for example), try to update the best current solution, and so on.

If you are doing this as an academic exercise, then I'd recommend thinking in messages only. Separate the message delivery entirely from your core design. This makes the whole concept much simpler and more generic. Message delivery can be made almost instantaneous in the case of threads, or take from a few tens of microseconds to several seconds between different physical nodes.

Which speed is fast enough depends entirely on your query - for a query running 5 hours on a single CPU and 5 minutes on a cluster, a message delay of 50 ms is entirely acceptable.

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] DO ... RETURNING
Hello

The current situation is akin to not being able to use queries directly, but always requiring you to create a view first and then do SELECT ... FROM myview.

ok - probably we can allow using DO in a CTE without impact on other SQL statements, and for this purpose we need to know the returned TupleDescriptor early. So I am able to accept it, although I think we are going in a strange direction. We are not able to do simple tasks simply (we cannot simply execute an SQL script on the server side) :(. But that is not a problem of Hannu's design.

Other question - can we find some readable and intuitive syntax for DO parametrization?

Regards Pavel
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 09:30:32AM +0200, Pavel Stehule wrote: Other question - can we find some readable and intuitive syntax for DO parametrization?

The thing I'm thinking of where this does something we don't yet do is in the realm of access control, e.g. allowing people to use DO rather than giving them DDL permission to create temporary functions. Is this what you have in mind? What other things?

Cheers, David.

-- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Batch API for After Triggers
On 10 June 2013 22:50, Robert Haas robertmh...@gmail.com wrote: On Sat, Jun 8, 2013 at 5:00 PM, Simon Riggs si...@2ndquadrant.com wrote:

While fiddling with FK tuning, Noah suggested batching trigger executions together to avoid execution overhead. It turns out there is no easy way to write triggers that can take advantage of the knowledge that they are being executed as a set of trigger executions. Some API is required to allow a trigger to understand that there may be other related trigger executions in the very near future, so it can attempt to amortise call overhead across many invocations (batching). The attached patch adds two fields to the TriggerDesc that trigger functions are handed, allowing them to inspect (if they choose) the additional fields and thus potentially use some form of batching.

I'm unclear how this could be used in practice. As described, you can use it to prepare batches. Are the events in a batch guaranteed to, say, all be related to the same relation?

Good point. I was too focused on the single large statements I was considering. There would need to be some logic to look at relations as well. For individual statements it can work well, since all, or at least the majority, of events are for a particular relation and we can know which one that is. It probably wouldn't work that well for deferred trigger events that covered a spread of different relations. Perhaps that's an argument for a final function after all.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] DO ... RETURNING
2013/6/11 David Fetter da...@fetter.org: The thing I'm thinking of where this does something we don't yet do is in the realm of access control, e.g. allowing people to use DO rather than giving them DDL permission to create temporary functions. Is this what you have in mind? What other things?

no - I don't see any difference between allowing temporary functions and allowing a DO statement. I am missing some simple tool for scripting administration tasks - cleaning schemas, cleaning databases, user management, report calculation. For this, some procedural functionality is necessary. Now I use a combination of bash/psql/SQL and PL/pgSQL functions. It works, it works well, but it is hard to learn and hard to maintain.

I had an idea - a procedural enhancement of psql. While working on a prototype I abandoned this idea. We could probably implement something like C macros in psql, but that is a dead end. So my second idea is plpgsql server-side scripting. PL/pgSQL is readable and strong - we just need to evaluate our procedural code outside a query. For these tasks performance is not paramount, because the hard bottleneck is IO - and the slow bash/psql combination worked without performance problems. Server-side scripting can be consistent - probably all PL languages could be used, and that would be enough for typical administrative tasks.

Regards Pavel
Re: [HACKERS] erroneous restore into pg_catalog schema
Stephen Frost sfr...@snowman.net writes: What happens with the default settings when you try to install two extensions that have overlapping function signatures..? I can't imagine it 'just works'.. And then what? Is there a way that an admin can set up search paths for individual users which provide the 'right' function and work even when the user decides to change their search_path?

That depends entirely on how the extension script is written. Making it possible to have two versions concurrently installed requires a non-trivial amount of effort, but I don't think the extension facility gets in the way at all, currently.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] erroneous restore into pg_catalog schema
On 2013-06-11 10:33:29 +0200, Dimitri Fontaine wrote: That entirely depends on how the extension script is written. Making it possible to have two versions concurrently installed requires a non-trivial amount of effort, but I don't think the extension facility gets in the way at all, currently.

It does. We only allow an extension to be installed once, regardless of schema...

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
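A minimal demonstration of that restriction (extension and schema names illustrative):

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE EXTENSION hstore SCHEMA a;   -- ok
CREATE EXTENSION hstore SCHEMA b;   -- ERROR:  extension "hstore" already exists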
Re: [HACKERS] JSON and unicode surrogate pairs
On 2013-06-10 13:01:29 -0400, Andrew Dunstan wrote:

It's legal, is it not, to just write the equivalent Unicode character in the JSON string and not use the escapes? If so, I would think that that would be the most common usage. If someone's writing an escape, they probably had a reason for doing it that way, and might not appreciate our overriding their decision. We never store the converted values in the JSON object, nor do we return them from functions that return JSON. But many of the functions and operators that process the JSON have variants that return text instead of JSON, and in those cases, when the value returned is a JSON string, we do the following to it:

I have just realized that the problem is actually quite a lot bigger than that. We also use this value for field name comparison. So, let us suppose that we have a LATIN1 database and a piece of JSON with a field name containing the Euro sign (\u20ac), a character that is not in LATIN1. Making that processable so it doesn't blow up would be mighty tricky and error prone. The non-orthogonality I suggested as a solution upthread is, by contrast, very small and easy to manage, and not terribly hard to explain - see attached.

I think this all shows pretty clearly that it was a mistake to allow json data into the database that we cannot entirely display in the database's encoding. All the proposed ugly workarounds are only necessary because we don't throw an error when originally validating the json. Even in a utf-8 database you can get errors due to \u unescaping (at attribute access time, *NOT* at json_in() time) due to invalid surrogate pairs. I think this goes contrary to the normal postgres approach of validating data as strictly as necessary. And I think we are going to regret not fixing this while there are still relatively few users out there.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
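A sketch of the failure mode being described, assuming a LATIN1 server encoding (the error wording is indicative, not exact):

SELECT '{"name":"\u20ac"}'::json;            -- accepted: validation keeps the escape as-is
SELECT '{"name":"\u20ac"}'::json ->> 'name';
-- fails only here, where the escape must become a real character
-- that LATIN1 cannot represent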
Re: [HACKERS] DO ... RETURNING
Hi,

That topic apparently comes up each year and rehashes the same points.

Pavel Stehule pavel.steh...@gmail.com writes: probably we can allow using DO in a CTE without impact on other SQL statements, and for this purpose we need to know the returned TupleDescriptor early.

I still think that, DO being a utility statement, having it take parameters and return data is going to be a wart in a part of the system that already has only too many of them. My thinking revolves around CTE support for functions:

WITH FUNCTION name(param, ...)
  RETURNS type
  LANGUAGE plpgsql
AS ($$ function body here $$)
SELECT name(x, ...) FROM ...;

So I am able to accept it, although I think we are going in a strange direction. We are not able to do simple tasks simply (we cannot simply execute an SQL script on the server side) :(. But that is not a problem of Hannu's design.

With the DO utility command you can already execute an SQL script on the server quite simply. After all your proposals it's still unclear to me where you want to process which data? (I admit this time I didn't pay much attention, sorry about that)

Other question - can we find some readable and intuitive syntax for DO parametrization?

See above.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
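To make the proposal concrete, here is what the suggested syntax might look like in use (hypothetical - this is a proposal, not implemented; the function name and body are made up):

WITH FUNCTION double_it(x int) RETURNS int LANGUAGE plpgsql AS ($$
BEGIN
    RETURN x * 2;
END
$$)
SELECT double_it(f.x) FROM generate_series(1, 3) AS f(x);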
Re: [HACKERS] DO ... RETURNING
2013/6/11 Dimitri Fontaine dimi...@2ndquadrant.fr: With the DO utility command you can already execute an SQL script on the server quite simply. After all your proposals it's still unclear to me where you want to process which data?

There is a significant limit - you cannot simply change databases when you collect statistics over databases, you cannot drop a database ... and you cannot return multiple result sets - showing info about tables, schemas and indexes in one call.

What I would like (pseudocode - CONNECT and multi-recordset output do not exist today):

DO $$
BEGIN
  FOR r IN pg_databases LOOP
    CONNECT r.dbname;
    FOR table_name IN SELECT * FROM pg_class ... WHERE owner = 'GD' AND table_name LIKE 'xx%' LOOP
      IF pg_relsize(table_name) > xxx AND ... THEN
        -- show info about the dropped table
        SELECT xx FROM pg_class, pg_attribute  -- SHOW STRUCTURE OF ANY INTERESTING TABLE
                                               -- multirecordset output
        EXECUTE format('DROP TABLE %I', table_name);
        ...

Regards Pavel
Re: [HACKERS] DO ... RETURNING
Pavel Stehule pavel.steh...@gmail.com writes: FOR r IN pg_databases LOOP CONNECT r.dbname;

Do you mean that you want to run this DO block on the client side?

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] DO ... RETURNING
On 06/11/2013 11:30 AM, Pavel Stehule wrote: 2013/6/11 Dimitri Fontaine dimi...@2ndquadrant.fr: My thinking revolves around CTE support for functions: WITH FUNCTION name(param, ...) RETURNS type LANGUAGE plpgsql AS ($$ function body here $$) SELECT name(x, ...) FROM ...;

Yes, this would be another way to do in-line PL functions. I do not think that DO ... RETURNING and WITH FUNCTION are mutually exclusive. Rather, effort put into implementing one would also be useful for the other.

There is a significant limit - you cannot simply change databases when you collect statistics over databases, you cannot drop a database

I can do this easily in pl/python. Actually this is how statistics are collected in https://github.com/postsql/pgmon_zabbix

You can also do it using pl/proxy, or in plpgsql using dblink. I have seen quite complicated data-analysing utilities - with process control running 1 to N backends depending on load - written in pl/pgsql using dblink.

... you cannot return multiple result sets

The inability to return multiple result sets from a query is something that was introduced at some point after/during the move from PostQuel to SQL. It is still there on the server side at the protocol level, but no client I know of supports it, and as far as I know it is also hard to generate on the server side.

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
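For reference, the dblink approach mentioned above might look roughly like this (a minimal sketch; the connection-string handling and the counted query are illustrative):

CREATE EXTENSION IF NOT EXISTS dblink;

DO $$
DECLARE
    db text;
    n  bigint;
BEGIN
    FOR db IN SELECT datname FROM pg_database WHERE datallowconn LOOP
        SELECT cnt INTO n
        FROM   dblink(format('dbname=%s', db),
                      'SELECT count(*) FROM pg_class') AS t(cnt bigint);
        RAISE NOTICE 'database %: % relations', db, n;
    END LOOP;
END
$$;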
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 10:47 AM, Andres Freund wrote: I think this all shows pretty clearly that it was a mistake to allow json data into the database that we cannot entirely display in the database's encoding. All the proposed ugly workarounds are only necessary because we don't throw an error when originally validating the json. [...] And I think we are going to regret not fixing this while there are still relatively few users out there.

Exactly -

* allow in only valid JSON.
* Validate all utf8 strings for valid unicode.
* Have one canonical way of outputting unicode - utf8 for utf8 databases, escaped for all other encodings.
* If you need to store anything else, use text.

Requiring that the original text be preserved in a json data field is Not Good! I fully expect '{a:1, a:none, a:true, a:b}'::json to come out as '{a:b}'. (I know that currently this is not true, and it will happen only once I read the json value into the client.)

For anything else - don't use json, use any text type. If you really need a simple text-validated-as-valid-json-input datatype then add this. Call it jsontext or somesuch :)

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
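What 9.2/9.3 actually do today, for contrast: the json type stores the input text verbatim, duplicate keys and all (psql output abbreviated):

SELECT '{"a":1, "a":"b"}'::json;
       json
------------------
 {"a":1, "a":"b"}     -- the original text is kept; nothing is canonicalized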
Re: [HACKERS] DO ... RETURNING
2013/6/11 Dimitri Fontaine dimi...@2ndquadrant.fr: Pavel Stehule pavel.steh...@gmail.com writes: FOR r IN pg_databases LOOP CONNECT r.dbname; Do you mean that you want to run this DO block on the client side?

no, really no. I am thinking about some outer server-side process where these scripts would be executed. Maybe another use for background worker processes.

Pavel
[HACKERS] Current CLOBBER_CACHE_ALWAYS failures
The CLOBBER_CACHE_ALWAYS animals (friarbird and jaguarundi) have been failing persistently for about 36 hours now. The error is in a test added by Tom's recent commit a4424c5:

Expected:

-- Check row comparisons with IN
select * from int8_tbl i8 where i8 in (row(123,456)); -- fail, type mismatch
ERROR:  cannot compare dissimilar column types bigint and integer at record column 1

Actual:

select * from int8_tbl i8 where i8 in (row(123,456)); -- fail, type mismatch
connection to server was lost

Backtrace, as good as I can make it:

(gdb) bt
#0  0x00743fcf in cache_record_field_properties (typentry=0x8031c06f8) at typcache.c:658
#1  0x00744053 in record_fields_have_compare (typentry=0x8031c06f8) at typcache.c:625
#2  0x00743983 in lookup_type_cache (type_id=16446, flags=8) at typcache.c:375
#3  0x0074287c in op_mergejoinable (opno=Variable opno is not available.) at lsyscache.c:1188
#4  0x00605b0c in check_mergejoinable (restrictinfo=0x803064e18) at initsplan.c:1876
#5  0x006062d8 in distribute_qual_to_rels (root=0x8031ce4e0, clause=0x8031cea48, is_deduced=Variable is_deduced is not available.) at initsplan.c:1364
#6  0x00606aa4 in deconstruct_recurse (root=0x8031ce4e0, jtnode=0x803063d30, below_outer_join=0 '\0', qualscope=0x7fffbbb0, inner_join_rels=0x7fffbba8) at initsplan.c:645
#7  0x00607429 in deconstruct_jointree (root=Variable root is not available.) at initsplan.c:547
#8  0x00608554 in query_planner (root=0x8031ce4e0, tlist=0x8031ce470, tuple_fraction=0, limit_tuples=-1, qp_callback=0x60c570 <standard_qp_callback>, qp_extra=0x7fffbe50, cheapest_path=0x7fffbe90, sorted_path=0x7fffbe88, num_groups=0x7fffbea0) at planmain.c:176
#9  0x00609edc in grouping_planner (root=0x8031ce4e0, tuple_fraction=0) at planner.c:1203
#10 0x0060c014 in subquery_planner (glob=0x803064f30, parse=0x803054338, parent_root=0x803054450, hasRecursion=Variable hasRecursion is not available.) at planner.c:558
#11 0x0060c318 in standard_planner (parse=0x803054338, cursorOptions=0, boundParams=0x0) at planner.c:209
#12 0x0067ce0d in pg_plan_query (querytree=0x803054338, cursorOptions=0, boundParams=0x0) at postgres.c:753
#13 0x0067ceee in pg_plan_queries (querytrees=Variable querytrees is not available.) at postgres.c:812
#14 0x0067d1ca in exec_simple_query (query_string=0x803053038 "select * from int8_tbl i8 where i8 in (row(123,456));") at postgres.c:977
#15 0x0067e88a in PostgresMain (argc=1, argv=Variable argv is not available.) at postgres.c:3985
#16 0x0063137c in ServerLoop () at postmaster.c:3987
#17 0x00634702 in PostmasterMain (argc=6, argv=0x7fffd728) at postmaster.c:1246
#18 0x005ca98f in main (argc=6, argv=0x7fffd728) at main.c:196

(gdb) l
653                                        TCFLAGS_HAVE_FIELD_COMPARE);
654            for (i = 0; i < tupdesc->natts; i++)
655            {
656                TypeCacheEntry *fieldentry;
657
658                if (tupdesc->attrs[i]->attisdropped)
659                    continue;
660
661                fieldentry = lookup_type_cache(tupdesc->attrs[i]->atttypid,
662                                               TYPECACHE_EQ_OPR |
(gdb) p i
$5 = 1
(gdb) p typentry
$6 = (TypeCacheEntry *) 0x8031c06f8
(gdb) p typentry->tupDesc
$7 = 0x0
(gdb) p *typentry
$8 = {type_id = 16446, typlen = -1, typbyval = 0 '\0', typalign = 100 'd', typstorage = 120 'x', typtype = 99 'c', typrelid = 16429, btree_opf = 2994, btree_opintype = 2249, hash_opf = 0, hash_opintype = 0, eq_opr = 0, lt_opr = 0, gt_opr = 0, cmp_proc = 0, hash_proc = 0,
  eq_opr_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  cmp_proc_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  hash_proc_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  tupDesc = 0x0, rngelemtype = 0x0, rng_collation = 0,
  rng_cmp_proc_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  rng_canonical_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  rng_subdiff_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  flags = 0, enumData = 0x0}

type_id 16446
Re: [HACKERS] erroneous restore into pg_catalog schema
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote: That entirely depends on how the extension script is written. Making it possible to have two versions concurrently installed requires a non-trivial amount of effort, but I don't think the extension facility gets in the way at all, currently.

How would you recommend writing an extension script which deals with conflicts? Also, as Andres points out, the current extension system doesn't allow installing multiple versions. It'd be kind of nice if it did, but there are problems in that direction. Extension authors can manage that issue by having differently named extensions (where the name includes some number), similar to libraries.

That isn't the only case where name conflicts can and will occur between extensions though, which is the more general issue that I was pointing out. If there's no knowledge between the extension authors of the other extension (which is likely the case..) then chances are that such a conflict will cause either a failure or incorrect behavior.

Thanks, Stephen
Re: [HACKERS] Server side lo-funcs name
On Tue, Jun 11, 2013 at 1:32 AM, Tatsuo Ishii is...@postgresql.org wrote: Recently we got a complaint about the server-side large object function names described in the doc: http://www.postgresql.org/message-id/51b2413f.8010...@gmail.com In the doc: http://www.postgresql.org/docs/9.3/static/lo-funcs.html "There are server-side functions callable from SQL that correspond to each of the client-side functions described above; indeed, for the most part the client-side functions are simply interfaces to the equivalent server-side functions." From that description it is hard for users to find the server-side functions loread and lowrite, because they are looking for lo_read and lo_write. So I think his complaint is fair. The included patches attempt to fix the problem.

I have committed this. Did you see my email with proposed alternative text? You didn't even fix the whitespace error I pointed out.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
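The naming wrinkle being discussed, shown concretely (the OID 16385 is illustrative - use whatever lo_creat actually returns; 131072 is the INV_WRITE flag):

BEGIN;
SELECT lo_creat(-1);                  -- returns the new large object's OID
-- the server-side read/write functions are spelled loread/lowrite,
-- not lo_read/lo_write as in the client library:
SELECT lowrite(lo_open(16385, 131072), '\x48690a'::bytea);
COMMIT;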
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 06:53 AM, Hannu Krosing wrote: Exactly - * allow in only valid JSON. * Validate all utf8 strings for valid unicode. * Have one canonical way of outputting unicode - utf8 for utf8 databases, escaped for all other encodings. * If you need to store anything else, use text. [...] If you really need a simple text-validated-as-valid-json-input datatype then add this. Call it jsontext or somesuch :)

Most of this is irrelevant to solving the current problem. We don't have a greenfield development of JSON - we have an existing type with which we have to work. Note too that your "utf8 for utf8 databases, escaped for all other encodings" is exactly the opposite of what Noah and Andres and Tom are arguing, namely that the database encoding should not affect the behaviour, and is very close to what I proposed yesterday.

It's a pity that we don't have a non-error-producing conversion function (or if we do, I haven't found it). Then we might adopt a rule for processing unicode escapes that said "convert unicode escapes to the database encoding if possible, and if not then emit them unchanged", which might be a reasonable compromise.

cheers andrew
Re: [HACKERS] JSON and unicode surrogate pairs
On 2013-06-11 12:53 CEST, Hannu Krosing wrote: Requiring that the original text be preserved in a json data field is Not Good! I fully expect '{a:1, a:none, a:true, a:b}'::json to come out as '{a:b}'

ahem, do you mean instead to give (none -> null, and the missing '' quoting inserted in the answer):

=# SELECT '{"a":1, "a":null, "a":true, "a":"b"}'::json;
   json
-----------
 {"a":"b"}

or only when stored in the database and subsequently retrieved? The original text in this case was perfectly valid JSON text.

(I know that currently this is not true and will happen only once I read the json value into the client)

Isn't this a good situation, and doesn't this also depend on the storage representation in the client? What about this:

=# SELECT '{"measure":"seconds", "measure":42}'::json;
      json
----------------
 {"measure":42}

I presume people used to storing metadata in preceding json object members with duplicate names would want to decide in the client requesting the data what to do with the metadata information and at what point to drop it, wouldn't they :-?)

All the best, Stefan.
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 02:41 PM, Andrew Dunstan wrote: Most of this is irrelevant to solving the current problem. We don't have a greenfield development of JSON - we have an existing type with which we have to work.

We do have the current (postgresql v9.2) json data type, which is loosely defined as "a string which can be fed to a JSON deserializer without errors". But then we also have a standard. Sadly the standard is really not about a type, but about serialising a subset of JavaScript structures to a utf8 string. This kind of implies an underlying type - a structure consisting of a few basic types plus lists and dictionaries/classes built from them - but does not really specify any type.

Note too that your "utf8 for utf8 databases, escaped for all other encodings" is exactly the opposite of what Noah and Andres and Tom are arguing, namely that the database encoding should not affect the behaviour,

It does not affect behaviour. It makes it possible to encode valid JSON (utf8) in a client encoding which may not have the full set of unicode characters. We can do it here, as we know that by definition JSON _is_ unicode, so clients have to be able to process any unicode character. We cannot do this for plain text, which has no defined charset outside what client_encoding postulates.

and is very close to what I proposed yesterday. It's a pity that we don't have a non-error-producing conversion function (or if we do, I haven't found it). Then we might adopt a rule for processing unicode escapes that said "convert unicode escapes to the database encoding"

Only when extracting JSON keys or values to text does it make sense to unescape to the database encoding. Strings inside JSON itself are by definition utf8.

"if possible, and if not then emit them unchanged", which might be a reasonable compromise.

I'd opt for "... and if not then emit them quoted". The default should be not losing any data.

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 03:08 PM, Stefan Drees wrote: ahem, do you mean instead to give (none -> null, and the missing '' quoting inserted in the answer):

yeah, mixed up none and null

=# SELECT '{"a":1, "a":null, "a":true, "a":"b"}'::json;
   json
-----------
 {"a":"b"}

or only when stored in the database and subsequently retrieved? The original text in this case was perfectly valid JSON text. Isn't this a good situation, and doesn't this also depend on the storage representation in the client? What about this:

=# SELECT '{"measure":"seconds", "measure":42}'::json;
      json
----------------
 {"measure":42}

I presume people used to storing metadata in preceding json object members with duplicate names would want to decide in the client requesting the data what to do with the metadata information and at what point to drop it, wouldn't they :-?)

Seems like blatant misuse of the JSON format :) I assume that as JSON is a _serialisation_ format, it should represent a data structure, not processing instructions. I can see no possible JavaScript structure which could produce a duplicate key when serialised. And I don't think that any standard JSON reader supports this either. If you want to store JavaScript snippets in the database, use text. Or perhaps pl/v8 :)

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 09:16 AM, Hannu Krosing wrote: It's a pity that we don't have a non-error-producing conversion function [...] Only when extracting JSON keys or values to text does it make sense to unescape to the database encoding.

That's exactly the scenario we are talking about. When emitting JSON the functions have always emitted unicode escapes as they are in the text, and will continue to do so.

Strings inside JSON itself are by definition utf8.

We have deliberately extended that to allow JSON strings to be in any database server encoding. That was argued back in the 9.2 timeframe and I am not interested in re-litigating it. The only issue at hand is how to handle unicode escapes (which in their string form are pure ASCII) when emitting text strings.

"if possible, and if not then emit them unchanged", which might be a reasonable compromise. I'd opt for "... and if not then emit them quoted". The default should be not losing any data.

I don't know what this means at all. Quoted how? Let's say I have a Latin1 database and have the following JSON string: "\u20AC2.00". In a UTF8 database the text representation of this is "€2.00" - what are you saying it should be in the Latin1 database?

cheers andrew
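For the UTF8 half of that question, this is the behaviour under discussion (works in a UTF8 database; the array wrapper is just a convenient way to reach a scalar string with a text-returning operator):

SELECT '["\u20AC2.00"]'::json ->> 0;
 ?column?
----------
 €2.00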
Re: [HACKERS] how to find out whether a view is updatable
Dean Rasheed dean.a.rash...@gmail.com writes: On 11 June 2013 01:03, Michael Paquier michael.paqu...@gmail.com wrote: Yes, this is definitely material for 9.4. You should add this patch to the 1st commit fest.

Thanks. Arguably though, the API changes are something that should be sorted out in 9.3,

I agree --- I'm planning to look at this in the next few days.

regards, tom lane
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 09:23 AM, Hannu Krosing wrote: I can see no possible JavaScript structure which could produce a duplicate key when serialised. And I don't think that any standard JSON reader supports this either.

You are quite wrong. This was discussed quite recently on -hackers, too. V8 will happily accept a JSON string with duplicate keys via eval() and resolve it in favor of the lexically latest value. I gather most other JSON processors do likewise.

Can we PLEASE PLEASE stop sending this discussion off track and concentrate on the actual problem we have at hand? It's BETA and there is not much time. I get that you don't like how we have implemented JSON. But we're not going back over that ground now. It's done and in use, and LOTS of people are finding it very useful.

cheers andrew
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 03:42 PM, Andrew Dunstan wrote: We have deliberately extended that to allow JSON strings to be in any database server encoding.

Ugh! Does that imply that we do not just allow it, but rather require it? Why are we then arguing about unicode surrogate pairs as a JSON thing? Should it not be a client-to-server encoding conversion thing instead?

That was argued back in the 9.2 timeframe and I am not interested in re-litigating it. The only issue at hand is how to handle unicode escapes (which in their string form are pure ASCII) when emitting text strings.

Unicode escapes in non-unicode strings seem to be something that is ill-defined by nature ;) That is, you can't come up with a good general answer for this.

I don't know what this means at all. Quoted how? Let's say I have a Latin1 database and have the following JSON string: "\u20AC2.00". In a UTF8 database the text representation of this is "€2.00" - what are you saying it should be in the Latin1 database?

utf8-quote the '€' - "\u20AC2.00". That is, convert unicode -> Latin1 for what has a correspondence, and utf8-quote anything that does not. If we allow unicode escapes in non-unicode strings anyway, then this seems the most logical thing to do.

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] JSON and unicode surrogate pairs
On 2013-06-11 15:23 CEST, Hannu Krosing wrote: On 06/11/2013 03:08 PM, Stefan Drees wrote: ... What about this: =# SELECT '{"measure":"seconds", "measure":42}'::json; json -- {"measure":42} I presume people used to storing metadata in preceding json object members with duplicate names would want to decide in the client requesting the data what to do with the metadata information and at what point to drop it, wouldn't they :-?) Seems like blatant misuse of JSON format :) I assume that as JSON is a _serialisation_ format, it should represent a data structure, not processing instructions. I can see no possible JavaScript structure which could produce a duplicate key when serialised. ahem, JSON is a notation that allows an object or an array at top level. If it is an object, this consists of pairs called (name, value). Here value can be any object, array, number, string or the literals null, false or true. The name must be a string. That's it :-) no key **and** also no ordering on these names ;-) and as the RFC does not care where the data came from or how it was represented before it became JSON text (the top-level element of a JSON document), how should the parser know ... but delta notation, commenting, or streaming needs have created many applications that deliver multibags and rely on some ordering conventions in their data-exchanging relations. And I don't think that any standard JSON reader supports this either. Oh yes. The convention is merely: keep all (streaming) or the last (whatever the last may mean must be carefully ensured in the interchange relation). All would like these two scenarios, but the RFC as is does not prevent an early-out (like INSERT OR IGNORE) :-)) If you want to store any JavaScript snippets in the database, use text. JSON is language agnostic. I use JSON more from python and php than from js, but others do so differently ... Or perhaps pl/v8 :) Do you mean the V8 Engine Javascript Procedural Language add-on for PostgreSQL (http://code.google.com/p/plv8js/)? I guess so. I did not want to hijack the thread, as this centered more around escaping where and what in which context (DB vs. client encoding). As the freshly created IETF json working group revamps the JSON RFC on its way to the standards track, there are currently also discussions on what to do with unicode surrogate pairs. See eg. this thread http://www.ietf.org/mail-archive/web/json/current/msg00675.html starting a summarizing effort. Just in case it helps making the fresh JSON feature of PostgreSQL bright, shining and future proof :-) Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Server side lo-funcs name
On Tue, Jun 11, 2013 at 1:32 AM, Tatsuo Ishii is...@postgresql.org wrote: Recently we got a complaint about the server-side large object function names described in the doc: http://www.postgresql.org/message-id/51b2413f.8010...@gmail.com In the doc: http://www.postgresql.org/docs/9.3/static/lo-funcs.html There are server-side functions callable from SQL that correspond to each of the client-side functions described above; indeed, for the most part the client-side functions are simply interfaces to the equivalent server-side functions From the description it is hard for users to find out the server-side functions loread and lowrite because they are looking for lo_read and lo_write. So I think his complaint is fair. The included patches attempt to fix the problem. I have committed this. Did you see my email with proposed alternative text? You didn't even fix the whitespace error I pointed out. I don't know why but I haven't received your email. I just found the email in the web archive and it looks better than what I proposed. Do you want to commit it yourself? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
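The naming mismatch is easy to see from the catalog (a sketch; note the underscore-less loread/lowrite alongside the lo_* names users tend to look for):

    =# SELECT proname FROM pg_proc WHERE proname IN ('lo_creat', 'loread', 'lowrite');
     proname
    ----------
     lo_creat
     loread
     lowrite
    (3 rows)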
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 03:54 PM, Andrew Dunstan wrote: On 06/11/2013 09:23 AM, Hannu Krosing wrote: I can see no possible JavaScript structure which could produce a duplicate key when serialised. And I don't think that any standard JSON reader supports this either. You are quite wrong. This was discussed quite recently on -hackers, too. V8 will happily accept a JSON string with duplicate keys via eval() and resolve it in favor of the lexically latest value. This is what I mean. It is a valid _input_ value, but no existing JavaScript structure serializes to it. In other words - I want us to have in mind some underlying structure, not the text: hannu=# select '1e0'::float; float8 1 (1 row) We are not preserving '1e0' in floats, so why should we preserve it in json? We do imply an internal structured format in several functions operating on json even though we store it as text. I gather most other JSON processors do likewise. Can we PLEASE PLEASE stop sending this discussion off track and concentrate on the actual problem we have at hand? It's BETA and there is not much time. I get that you don't like how we have implemented JSON. The current implementation is a reasonably good compromise, so I can't say I don't like it :) I am here going from the premise that at some point we might implement a json-like binary structured type. If it is left separate from json, I am ok with all kinds of quirks coming from the current vague definition of what a json type is. OTOH, if the idea is to move the json storage format to this binary structured type, we should resolve possible incompatibilities as early as possible and start thinking of the current as-text storage in terms of something that is a real structured type, just stored in text format. This could also mean converting to a canonical format on input. But we're not going back over that ground now. It's done and in use and LOTS of people are finding it very useful. I am one of these people who finds it very useful. I just want to avoid painting ourselves into a corner too early. I would like us to have *one* json type in the future, not a separate json input string and a json-compatible binary structure or some such. cheers andrew -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 04:04 PM, Stefan Drees wrote: On 2013-06-11 15:23 CEST, Hannu Krosing wrote: On 06/11/2013 03:08 PM, Stefan Drees wrote: ... What about this: =# SELECT '{"measure":"seconds", "measure":42}'::json; json -- {"measure":42} I presume people used to storing metadata in preceding json object members with duplicate names would want to decide in the client requesting the data what to do with the metadata information and at what point to drop it, wouldn't they :-?) Seems like blatant misuse of JSON format :) I assume that as JSON is a _serialisation_ format, it should represent a data structure, not processing instructions. I can see no possible JavaScript structure which could produce a duplicate key when serialised. ahem, JSON is a notation that allows an object or an array at top level. If it is an object, this consists of pairs called (name, value). Here value can be any object, array, number, string or the literals null, false or true. The name must be a string. That's it :-) no key **and** also no ordering on these names ;-) and as the RFC does not care where the data came from or how it was represented before it became JSON text (the top-level element of a JSON document), how should the parser know ... but delta notation, commenting, or streaming needs have created many applications that deliver multibags and rely on some ordering conventions in their data-exchanging relations. And I don't think that any standard JSON reader supports this either. Oh yes. The convention is merely: keep all (streaming) or the last (whatever the last may mean must be carefully ensured in the interchange relation). All would like these two scenarios, but the RFC as is does not prevent an early-out (like INSERT OR IGNORE) :-)) I was kind of assuming that JSON is a (JavaScript) Object Serialization Notation, that is, there is a unique implied JavaScript Object which can be serialized into any given JSON string. IOW, that if you serialise an object then this is what JSON should be. The fact that most JSON-to-Object readers support multiple keys is just an implementation artifact and not something that is required by the RFC. If you want to store any JavaScript snippets in the database, use text. JSON is language agnostic. I use JSON more from python and php than from js, but others do so differently ... Agreed. Even the fact that you can define any operations on a JSON string - like extracting the value for a key - is actually non-standard :) Perhaps I should stop thinking of the json type as something that implies any underlying structure ... Or perhaps pl/v8 :) Do you mean the V8 Engine Javascript Procedural Language add-on for PostgreSQL (http://code.google.com/p/plv8js/)? I guess so. I did not want to hijack the thread, as this centered more around escaping where and what in which context (DB vs. client encoding). As the freshly created IETF json working group revamps the JSON RFC on its way to the standards track, there are currently also discussions on what to do with unicode surrogate pairs. See eg. this thread http://www.ietf.org/mail-archive/web/json/current/msg00675.html starting a summarizing effort. Wow. The rabbit hole is much deeper than I thought :) -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 6:07 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2013/6/11 Dimitri Fontaine dimi...@2ndquadrant.fr: Pavel Stehule pavel.steh...@gmail.com writes: FOR r IN pg_databases LOOP CONNECT r.dbname; Do you mean that you want to run this DO block on the client side? no, really no. I am thinking about some outer server-side process where these scripts will be executed. Maybe another usage for a background worker process. +1. I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
* Merlin Moncure (mmonc...@gmail.com) wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Parallell Optimizer
On 11 June 2013 01:45, Tatsuo Ishii is...@postgresql.org wrote: On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs si...@2ndquadrant.com wrote: On 7 June 2013 20:23, Tom Lane t...@sss.pgh.pa.us wrote: As for other databases, I suspect that ones that have parallel execution are probably doing it with a thread model not a process model. Separate processes are more common because it covers the general case where query execution is spread across multiple nodes. Threads don't work across nodes and parallel queries predate (working) threading models. Indeed. Parallelism based on processes would be more convenient for master-master type of applications. Even if no master-master feature is implemented directly in core, at least a parallelism infrastructure based on processes could be used for this purpose. As long as true synchronous replication is not implemented in core, I am not sure there's value in parallel execution spreading across multiple nodes because of the delay of data update propagation. Please explain what you mean by the word true used here. In other words, eager replication. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallell Optimizer
On Tue, Jun 11, 2013 at 9:45 AM, Tatsuo Ishii is...@postgresql.org wrote: On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs si...@2ndquadrant.com wrote: On 7 June 2013 20:23, Tom Lane t...@sss.pgh.pa.us wrote: As for other databases, I suspect that ones that have parallel execution are probably doing it with a thread model not a process model. Separate processes are more common because it covers the general case where query execution is spread across multiple nodes. Threads don't work across nodes and parallel queries predate (working) threading models. Indeed. Parallelism based on processes would be more convenient for master-master type of applications. Even if no master-master feature is implemented directly in core, at least a parallelism infrastructure based on processes could be used for this purpose. As long as true synchronous replication is not implemented in core, I am not sure there's value in parallel execution spreading across multiple nodes because of the delay of data update propagation. True, but we cannot drop the possibility of having such features in the future either, so a process-based model is safer regarding the possible range of features and applications we could gain. I wonder why neither true synchronous replication nor eager replication is in the developer TODO list. If we want them in the future, they should be on it. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
2013/6/11 Stephen Frost sfr...@snowman.net: * Merlin Moncure (mmonc...@gmail.com) wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think it is related - we are talking about the future form of the DO statement - or about the future form of server-side scripting. But it is not important at this moment Pavel Thanks, Stephen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
2013/6/11 Pavel Stehule pavel.steh...@gmail.com: 2013/6/11 Stephen Frost sfr...@snowman.net: * Merlin Moncure (mmonc...@gmail.com) wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think it is related - we are talking about the future form of the DO statement - or about the future form of server-side scripting. But it is not important at this moment As I wrote, I can live with Hannu's proposal. Regards Pavel Thanks, Stephen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 9:45 AM, Stephen Frost sfr...@snowman.net wrote: * Merlin Moncure (mmonc...@gmail.com) wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. It's a complete feature but completely relevant to the discussion -- the behaviors have a lot of overlap and CALL is in the standard whereas the ad hoc feature DO isn't. This comes up in other feature requests too, like psql bash-like scripting features. That said, it would be pretty cool if you could inline DO into a CTE or more generally into a query (is that possible?) -- then you'd have something distinct. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
There hasn't been much activity here recently. I'm curious, then, if there are questions that I can answer. It may be useful to summarize some things here: - the purpose of the patch is to use posix_fallocate when creating new WAL files, because it's (usually) much quicker - using posix_fallocate is *one* system call versus 2048 calls to write(2) (a 16 MB WAL segment written in 8 kB blocks) - additionally, using posix_fallocate /guarantees/ that the filesystem has space for the WAL file (by spec) - reportedly (difficult to test or prove), using posix_fallocate *may* reduce file fragmentation - the (limited) testing I've done bears this out: the more new WAL file creation there is, the greater the improvement. Once the number of WAL files reaches a constant point, there does not appear to be either a positive or a negative performance impact. This is as expected. - a test program (C) was also written and used which creates, allocates, and then writes to files as fast as possible. This test program also shows the expected performance benefits. - the performance benefits range from a few percent up to about 15 percent Concerns: - some were concerned that the spec makes no claims about posix_fallocate being able to guarantee that the space allocated has zeroes in it. This was discussed here and on the Linux Kernel mailing list, wherein the expected behavior is that it does provide zeroes - most systems don't allocate a great many new WAL files, so the performance benefit is small - your concern here Benefits: - new WAL file allocation is much quicker, more efficient (fewer system calls) - the patch is (reportedly - I'm not a good judge here!) quite small -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
On 6/10/13 6:02 PM, Fabien COELHO wrote: - the tps is global, with a mutex to share the global stochastic process - there is an adaptation for the fork emulation - I do not know whether this works with Win32 pthread stuff. Instead of this complexity, can we just split the TPS input per client? That's all I was thinking of here, not adding a new set of threading issues. If 10,000 TPS is requested and there's 10 clients, just set the delay so that each of them targets 1000 TPS. I'm guessing it's more accurate to have them all communicate as you've done here, but it seems like a whole class of new bugs and potential bottlenecks could come out of that. Whenever someone touches the threading model for pgbench it usually gives a stack of build farm headaches. Better to avoid those unless there's really a compelling reason to go through that. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
* Pavel Stehule (pavel.steh...@gmail.com) wrote: 2013/6/11 Stephen Frost sfr...@snowman.net: And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think it is related - we are talking about the future form of the DO statement - or about the future form of server-side scripting. I don't believe there's any intent to ever have DO used for stored procedures. Not only are stored procedures deserving of their own top-level command (eg: CALL, as has been discussed before..), but I believe they would necessarily have different enough semantics that shoe-horning them into DO would end up breaking backwards compatibility. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On 2013-11.06 17:28, Jon Nelson wrote: There hasn't been much activity here recently. I'm curious, then, if there are questions that I can answer. It may be useful to summarize some things here: - the purpose of the patch is to use posix_fallocate when creating new WAL files, because it's (usually) much quicker - using posix_fallocate is *one* system call versus 2048 calls to write(2) - additionally, using posix_fallocate /guarantees/ that the filesystem has space for the WAL file (by spec) - reportedly (difficult to test or prove), using posix_fallocate *may* reduce file fragmentation - the (limited) testing I've done bears this out: the more new WAL file creation there is, the more the improvement. Once the number of WAL files reaches a constant point, there does not appear to be either a positive or a negative performance impact. This is as expected. - a test program (C) was also written and used which creates, allocates, and then writes to files as fast as possible. This test program also shows the expected performance benefits. - the performance benefits range from a few percent up to about 15 percent tried the test program of the patch at least a bit. Retrieved it from: http://www.postgresql.org/message-id/attachment/29088/test_fallocate.c on an oldish linux box (Kernel 2.6.32, x86_64) following $ gcc -o test_fallocate test_fallocate.c a short $ ./test_fallocate foo 1 1 yields: without posix_fallocate: 1 open/close iterations, 1 rewrite in 26.1993s with posix_fallocate: 1 open/close iterations, 1 rewrite in 13.3299s on another box (Kernel 3.2.0, x86_64) same procedure yields: without posix_fallocate: 1 open/close iterations, 1 rewrite in 19.1972s with posix_fallocate: 1 open/close iterations, 1 rewrite in 9.9280s Note, when trying gcc -O2 test_fallocate.c fails to compile with: In file included from /usr/include/fcntl.h:252:0, from test_fallocate.c:3: In function ‘open’, inlined from ‘main’ at test_fallocate.c:68:16: /usr/include/x86_64-linux-gnu/bits/fcntl2.h:51:24: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT in second argument needs 3 arguments Concerns: - some were concerned that the spec makes no claims about posix_fallocate being able to guarantee that the space allocated has zeroes in it. This was discussed here and on the Linux Kernel mailing list, wherein the expected behavior is that it does provide zeroes - most systems don't allocate a great many new WAL files, so the performance benefit is small - your concern here Benefits: - new WAL file allocate is much quicker, more efficient (fewer system calls) - the patch is (reportedly - I'm not a good judge here!) quite small HTH, Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Tue, Jun 11, 2013 at 11:08 AM, Stefan Drees ste...@drees.name wrote: On 2013-11.06 17:28, Jon Nelson wrote: There hasn't been much activity here recently. I'm curious, then, if there are questions that I can answer. It may be useful to summarize some things here: - the purpose of the patch is to use posix_fallocate when creating new WAL files, because it's (usually) much quicker - using posix_fallocate is *one* system call versus 2048 calls to write(2) - additionally, using posix_fallocate /guarantees/ that the filesystem has space for the WAL file (by spec) - reportedly (difficult to test or prove), using posix_fallocate *may* reduce file fragmentation - the (limited) testing I've done bears this out: the more new WAL file creation there is, the more the improvement. Once the number of WAL files reaches a constant point, there does not appear to be either a positive or a negative performance impact. This is as expected. - a test program (C) was also written and used which creates, allocates, and then writes to files as fast as possible. This test program also shows the expected performance benefits. - the performance benefits range from a few percent up to about 15 percent tried the test program of the patch at least a bit. Retrieved it from: http://www.postgresql.org/message-id/attachment/29088/test_fallocate.c on an oldish linux box (Kernel 2.6.32, x86_64) following $ gcc -o test_fallocate test_fallocate.c a short $ ./test_fallocate foo 1 1 yields: without posix_fallocate: 1 open/close iterations, 1 rewrite in 26.1993s with posix_fallocate: 1 open/close iterations, 1 rewrite in 13.3299s on another box (Kernel 3.2.0, x86_64) same procedure yields: without posix_fallocate: 1 open/close iterations, 1 rewrite in 19.1972s with posix_fallocate: 1 open/close iterations, 1 rewrite in 9.9280s Note, when trying gcc -O2 test_fallocate.c fails to compile with: In file included from /usr/include/fcntl.h:252:0, from test_fallocate.c:3: In function ‘open’, inlined from ‘main’ at test_fallocate.c:68:16: /usr/include/x86_64-linux-gnu/bits/fcntl2.h:51:24: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT in second argument needs 3 arguments It's understood that posix_fallocate is faster at this -- the question on the table is 'does this matter in context of postgres?'. Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
2013/6/11 Stephen Frost sfr...@snowman.net: * Pavel Stehule (pavel.steh...@gmail.com) wrote: 2013/6/11 Stephen Frost sfr...@snowman.net: And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think it is related - we are talking about the future form of the DO statement - or about the future form of server-side scripting. I don't believe there's any intent to ever have DO used for stored procedures. Not only are stored procedures deserving of their own top-level command (eg: CALL, as has been discussed before..), but I believe they would necessarily have different enough semantics that shoe-horning them into DO would end up breaking backwards compatibility. At this moment, DO doesn't support any feature that is in conflict with stored procedure functionality, because it is based on functions, and it therefore has limited functionality. The syntax of procedures and functions is relatively well defined: CREATE FUNCTION foo(..) - a SELECT expression contains the foo call; CREATE PROCEDURE foo(..) - CALL foo(). Now the anonymous code block is based on functions, but it can be changed to respect context or usage without loss of compatibility: DO $$ ... $$ -- procedural behavior, just execute server-side scripts; in a CTE, DO RETURNING $$ ... $$ -- functional behavior, functional limits. Pavel Thanks, Stephen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
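A minimal sketch of the invocation distinction Pavel is drawing, with CALL as the SQL-standard form (hypothetical here, since PostgreSQL has no CREATE PROCEDURE or CALL):

    -- a function is called from within an expression or query:
    SELECT foo(42);
    SELECT * FROM t WHERE t.x = foo(42);

    -- a procedure would be invoked as a top-level statement:
    CALL foo(42);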
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 11:00 AM, Stephen Frost sfr...@snowman.net wrote: * Pavel Stehule (pavel.steh...@gmail.com) wrote: 2013/6/11 Stephen Frost sfr...@snowman.net: And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think so it is related - we talk about future form of DO statement - or about future form of server side scripting. I don't believe there's any intent to ever have DO used for stored procedures. Not only are stored procedures deserving of their own top-level command (eg: CALL, as has been discussed before..), but I believe they would necessairly have different enough semantics that shoe-horning them into DO would end up breaking backwards compatibility. I was not arguing to shoe-horn them into DO, but rather that the proposal is shoe-horning into DO what should be in CALL (but I'm having second thoughts about that -- CALL AFAIK can't do in-line code blocks). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 11:26 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2013/6/11 Stephen Frost sfr...@snowman.net: * Pavel Stehule (pavel.steh...@gmail.com) wrote: 2013/6/11 Stephen Frost sfr...@snowman.net: And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think so it is related - we talk about future form of DO statement - or about future form of server side scripting. I don't believe there's any intent to ever have DO used for stored procedures. Not only are stored procedures deserving of their own top-level command (eg: CALL, as has been discussed before..), but I believe they would necessairly have different enough semantics that shoe-horning them into DO would end up breaking backwards compatibility. In this moment, DO doesn't support any feature that is in conflict with stored procedure functionality, because it is based on functions, and then it have to have limited functionality Syntax of procedures and functions is relatively well defined CREATE FUNCTION foo(..) SELECT expression contains foo call CREATE PROCEDURE foo(..) --- CALL foo() Now anonymous code block is based on functions, but it can be changed to respect context or usage without lost of compatibility. DO $$ ... $$ -- procedural behave -- just execute server side scripts CTE DO RETURNING $$ ... $$ -- functional behave, functional limits. why does it have to be CTE? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
On Tuesday, June 11, 2013, Pavel Stehule wrote: I don't believe there's any intent to ever have DO used for stored procedures. Not only are stored procedures deserving of their own top-level command (eg: CALL, as has been discussed before..), but I believe they would necessairly have different enough semantics that shoe-horning them into DO would end up breaking backwards compatibility. In this moment, DO doesn't support any feature that is in conflict with stored procedure functionality, because it is based on functions, and then it have to have limited functionality Let me re-state: I don't see DO, which is entirely function oriented today, ever being redefined as suddenly being stored procedures instead. Claiming that it wouldn't impact existing users of DO is a level of hand-waving that I just can't follow. If nothing else, it would certainly impact external language support. If we're going to continue to hand-wave at this then I would argue that we'd be able to magically make DO .. RETURNING also happily work in stored procedure mode when the time comes without impacting users. I'm done with this thread. For my part- we can and should extend DO to support RETURNING. DO RETURNING support is a big enough and useful enough addition that it can go in by itself. Attempting to feature-creep it to also cover stored procedures will simply end up killing it unfairly and will not move us forward at all. We should also have stored procedures. Until there's an actual patch for stored procedures which has some real conflict with DO RETURNING, I don't see it as being relevant. I look forward to patches for both. Thanks, Stephen
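To make the DO ... RETURNING idea concrete, usage might look something like this (entirely hypothetical syntax - none of this exists today):

    SELECT name, uid
      FROM (DO LANGUAGE plpgsql RETURNING (name text, uid int) $$
            BEGIN
                RETURN QUERY SELECT usename::text, usesysid::int FROM pg_user;
            END;
            $$) AS u
     WHERE uid < 100;

Here the RETURNING column list would play the role of a named function's RETURNS TABLE clause.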
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
* Merlin Moncure (mmonc...@gmail.com) wrote: It's understood that posix_fallocate is faster at this -- the question on the table is 'does this matter in context of postgres?'. Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. I've not had a chance to look at this patch, but I tend to agree with Merlin. My main question is really- would this be useful for extending *relations*? Apologies if it's already been discussed; I do plan to go back and read the threads about this more fully, but I wanted to voice my support for using posix_fallocate, when available, in general. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] DO ... RETURNING
On 06/11/2013 05:27 PM, Merlin Moncure wrote: On Tue, Jun 11, 2013 at 9:45 AM, Stephen Frost sfr...@snowman.net wrote: * Merlin Moncure (mmonc...@gmail.com) wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. It's a complete feature but completely relevant to the discussion -- the behaviors have a lot of overlap and CALL is in the standard whereas the ad hoc feature DO isn't. Could you point to the ISO/ANSI SQL CALL definition? This comes up in other feature requests too, like psql bash-like scripting features. Client-side scripting is distinct from kind-of-server-side multi-database scripting. As I said before, it is doable now using pl/pgsql and dblink / plproxy and is trivial in pl/python. That said, it would be pretty cool if you could inline DO into a CTE or more generally into a query (is that possible?) Currently not possible, especially because DO does not return anything at this moment. Again, you can of course run a DO query from any pl (except maybe SQL and pl/pgsql): hannu=# DO language plpythonu $$ plpy.execute("DO LANGUAGE plpgsql $x$ BEGIN RAISE NOTICE 'hi!'; END;$x$") $$; NOTICE: hi! CONTEXT: SQL statement "DO LANGUAGE plpgsql $x$ BEGIN RAISE NOTICE 'hi!'; END;$x$" PL/Python anonymous code block DO -- then you'd have something distinct. I like the CTE approach better as a general inline-in-query approach, and DO ... RETURNING as a way for ad-hoc procs or direct remoting from the client. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
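The dblink route mentioned above works today for the run-a-command-elsewhere case; a minimal sketch (assumes the dblink extension is installed and a database named otherdb exists):

    CREATE EXTENSION IF NOT EXISTS dblink;
    -- executes in its own connection, i.e. outside the calling transaction:
    SELECT dblink_exec('dbname=otherdb', 'CREATE TABLE audit_log (msg text)');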
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 12:01 PM, Hannu Krosing ha...@2ndquadrant.com wrote: Could you point to the ISO/ANSI SQL CALL definition ? I can't: no one can because the SQL standard is not available online. But you can look at various proxies, for example here: http://farrago.sourceforge.net/design/UserDefinedTypesAndRoutines.html or here: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r897.htm As I said before, it is doable now using pl/pgsql and dblink / plproxy and is trivial in pl/python. That said, it would be pretty cool if you could inline DO into a CTE or more generally into a query (is that possible?) Currently not possible, especially because DO does not return anything at this moment. right, well, I meant hypothetically; I was following along with your thinking and extending inline code blocks to be able to be used in any place where a function is allowed. Specifically I was wondering if there are technical limitations that keep them limited to CTE usage (as is the case with data modifying WITH). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] how to find out whether a view is updatable
On Tue, Jun 11, 2013 at 4:07 PM, Dean Rasheed dean.a.rash...@gmail.comwrote: Thanks. Arguably though, the API changes are something that should be sorted out in 9.3, but I'm not sure how much of an appetite there is for that, or whether it's too late. I see, OK for the API changes on the functions, but I am not sure it is time to add new options in postgres_fdw as you do in your second patch. Unfortunately I will not be able to have a look in detail at your patch, I am sick... -- Michael
Re: [HACKERS] DO ... RETURNING
On 2013-06-11 19:01 CEST, Hannu Krosing wrote: On 06/11/2013 05:27 PM, Merlin Moncure wrote: On Tue, Jun 11, 2013 at 9:45 AM, Stephen Frost ... wrote: * Merlin Moncure ... wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. It's a complete feature but completely relevant to the discussion -- the behaviors have a lot of overlap and CALL is in the standard whereas the ad hoc feature DO isn't. Could you point to the ISO/ANSI SQL CALL definition? On the publicly available standards page of ISO: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html searching in it for SQL shows the link for ISO/IEC 9075-1:2008: http://standards.iso.org/ittf/PubliclyAvailableStandards/c045498_ISO_IEC_9075-1_2008.zip when you follow and go through: http://standards.iso.org/ittf/licence.html i.e. accept the granted license, you may receive a personal single-copy non-distributable PDF version of ISO/IEC 9075-1:2008, 3rd Edition, Information technology -- Database languages -- SQL -- Part 1: Framework (SQL/Framework), of Committee JTC1/SC32. There, at least in section 5.3.4, you find the pointer that, among others, the terms CALL and RETURN are specified in ISO 9075-2. So that's the pointer, though one would have to follow it to the end ... ;-) but at least it is clear from this source that CALL seems to be a statement in SQL to invoke a procedure, or whatever name juggling suits. ... HTH, Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On 6/11/13 12:22 PM, Merlin Moncure wrote: Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. That's nice, but we have this process for validating whether features go in or not that relies on review instead of opinions. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On 2013-06-11 19:45 CEST, Greg Smith wrote: On 6/11/13 12:22 PM, Merlin Moncure wrote: Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. That's nice, but we have this process for validating whether features go in or not that relies on review instead of opinions. ;-) that's why I played with the test_fallocate.c, as it was easy to do and I understood the author (of the patch) wanted to trigger some reviews ... I do not (yet) know anything about the core code, so I leave this to the hackers. My review result was that with newer gcc's you should specify an open mode flag as the third argument of the open call; other than that, playing with the test tool found nothing important. Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Tue, Jun 11, 2013 at 12:49 PM, Stefan Drees ste...@drees.name wrote: On 2013-06-11 19:45 CEST, Greg Smith wrote: On 6/11/13 12:22 PM, Merlin Moncure wrote: Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. That's nice, but we have this process for validating whether features go in or not that relies on review instead of opinions. ;-) that's why I played with the test_fallocate.c, as it was easy to do and I understood, the author (of the patch) wanted to trigger some reviews ... I do not (yet) know anything about the core codes, so I leave this to the hackers. My review result was, that with newer gcc's you should specify an open mode flag as third argument of the fopen call, as only with the test tool nothing important found. If you change line 68 to read: fd = open(filename, O_CREAT | O_EXCL | O_WRONLY, 0600); then it should work just fine (assuming you have not already done so). -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SPGist triple parity concept doesn't work
Anyway I now think that we might be better off with the other idea of abandoning an insertion and retrying if we get a lock conflict. That would at least not create any performance penalty for non-concurrent scenarios; and even in concurrent cases, I suspect you'd have to be rather unlucky to get penalties as bad as the heavyweight-lock solution is showing. Agreed, it would be a better workaround for now. I will be able to do this this Friday. I considered the idea of forbidding placement of a child on the same page as its parent, but this implementation a) could significantly increase the size of the index, b) doesn't solve Greg's point. We definitely need a new idea for the locking protocol and I'll return to this problem in autumn (sorry, I haven't time this summer to do this research). -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/10/2013 11:22 PM, Noah Misch wrote: On Mon, Jun 10, 2013 at 11:20:13AM -0400, Andrew Dunstan wrote: On 06/10/2013 10:18 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: After thinking about this some more I have come to the conclusion that we should only do any de-escaping of \u sequences, whether or not they are for BMP characters, when the server encoding is utf8. For any other encoding, which is already a violation of the JSON standard anyway, and should be avoided if you're dealing with JSON, we should just pass them through even in text output. This will be a simple and very localized fix. Hmm. I'm not sure that users will like this definition --- it will seem pretty arbitrary to them that conversion of \u sequences happens in some databases and not others. Yep. Suppose you have a LATIN1 database. Changing it to a UTF8 database where everyone uses client_encoding = LATIN1 should not change the semantics of successful SQL statements. Some statements that fail with one database encoding will succeed in the other, but a user should not witness a changed non-error result. (Except functions like decode() that explicitly expose byte representations.) Having SELECT '["\u00e4"]'::json ->> 0 emit 'ä' in the UTF8 database and '\u00e4' in the LATIN1 database would move PostgreSQL in the wrong direction relative to that ideal. Then what should we do when there is no matching codepoint in the database encoding? First we'll have to delay the evaluation so it's not done over-eagerly, and then we'll have to try the conversion and throw an error if it doesn't work. The second part is what's happening now, but the delayed evaluation is not. +1 for doing it that way. As a final counter-example, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
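For comparison, PostgreSQL's own lexer behaves as Andrew describes for U& escapes (a sketch; exact error wording may vary by version):

    -- in a UTF8 database:
    utf8db=# SELECT U&'\00E4';
     ?column?
    ----------
     ä
    (1 row)

    -- in a LATIN1 database, escapes above U+007F are rejected:
    latin1db=# SELECT U&'\00E4';
    ERROR:  Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8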
[HACKERS] 9.4 CF1 Starts Saturday: need patches, reviewers, asst. CFM
Hackers, As a reminder, the first 9.4 commitfest starts this Saturday. If you plan to submit patches for CF1, please make sure that they are mailed in and registered with the Commitfest app by Friday, June 14th. Patches added after midnight will be bounced to CF2. We already have 72 patches pending. If you are available to do CF reviews, and want to be assigned a patch instead of choosing one for yourself, then please join the pgsql-rrreviewers mailing list: http://www.postgresql.org/community/lists/subscribe/ (yes, that is 3 r's). Also, I'm still looking for an assistant Commitfest Manager to help with managing patches, authors and reviewers. Thanks! -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 CF1 Starts Saturday: need patches, reviewers, asst. CFM
Josh, * Josh Berkus (j...@agliodbs.com) wrote: We already have 72 patches pending. Fun! If you are available to do CF reviews, and want to be assigned a patch instead of choosing one for yourself, then please join the pgsql-rrreviewers mailing list: http://www.postgresql.org/community/lists/subscribe/ (yes, that is 3 r's). Already on it and am planning to support with reviews and commits. :) Have we picked a date when we're going to branch..? As I recall, it was around this time last year, and it clearly needs to be before we start committing things from 9.4 CF1. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] 9.4 CF1 Starts Saturday: need patches, reviewers, asst. CFM
On Tue, Jun 11, 2013 at 11:46 PM, Stephen Frost sfr...@snowman.net wrote: Josh, * Josh Berkus (j...@agliodbs.com) wrote: We already have 72 patches pending. Fun! If you are available to do CF reviews, and want to be assigned a patch instead of choosing one for yourself, then please join the pgsql-rrreviewers mailing list: http://www.postgresql.org/community/lists/subscribe/ (yes, that is 3 r's). Already on it and am planning to support with reviews and commits. :) Have we picked a date when we're going to branch..? As I recall, it was around this time last year, and clearly needs to be before we start commiting things from 9.4 CF1. Thanks, Stephen Yay! -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Tue, Jun 11, 2013 at 12:45 PM, Greg Smith g...@2ndquadrant.com wrote: On 6/11/13 12:22 PM, Merlin Moncure wrote: Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. That's nice, but we have this process for validating whether features go in or not that relies on review instead of opinions. Sure: I phrased that badly by 'go in' I meant 'go through the review process', not commit out of hand. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
Andrew Dunstan and...@dunslane.net writes: As a final counter example, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. Good point. What if we adopt that same definition for JSON, and get rid of the need to do explicit encoding conversion at all in the JSON code? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
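Under that rule, JSON escapes would mirror the lexer's U& behavior: ASCII escapes de-escape under any server encoding, while anything above U+007F de-escapes only in UTF8 databases (a sketch of the proposed, not current, behavior):

    -- would succeed under any server encoding:
    SELECT '["\u0041BC"]'::json ->> 0;   -- ABC
    -- would de-escape only when the server encoding is UTF8, and error otherwise:
    SELECT '["\u00e4"]'::json ->> 0;     -- ä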
[HACKERS] Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: Should array_length() Return NULL)
On 04/08/2013 07:16 PM, Brendan Jurd wrote: On 9 April 2013 09:24, Josh Berkus j...@agliodbs.com wrote: As much as I have a keen interest in this feature, it isn't (AFAIK) being considered for 9.3. Given that it's generated a fair amount of controversy, could we table it until 9.3 beta? There's still plenty of unresolved 9.3 patches in the queue. No problem. I certainly wasn't expecting it to run for 90 messages when I started out. I'll pipe down for now and resume after the beta. What's the status on this patch and current approach to ZDA? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] request a new feature in fuzzystrmatch
On Sat, May 18, 2013 at 6:38 PM, David Fetter da...@fetter.org wrote: On Fri, May 17, 2013 at 05:50:38PM +0200, Cédric Villemain wrote: Hello Liming, Sounds interesting. How can we build this over our current implementation, or do we need to build it from scratch? I know how to write the code, but just need approval of accepting into the new version. Well, acceptance depends largely on the implementation and actual benefit statistics. I would suggest implementing a basic version and then demonstrating its potential benefits here. It will lead to clearer ideas for us and lead to improvements in the implementation. You can have a look at this page: http://wiki.postgresql.org/wiki/Submitting_a_Patch Also very important to look at for patch implementation and communication: http://wiki.postgresql.org/wiki/Working_with_Git Cheers, David (wondering whether we should provide specific instructions for github, bitbucket, etc.). I have implemented the code according to Joe's suggestion, and put the code at: https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1 Thanks, Liming -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Liming Hu cell: (435)-512-4190 Seattle Washington -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallell Optimizer
On 6/7/13 2:23 PM, Tom Lane wrote: As for other databases, I suspect that ones that have parallel execution are probably doing it with a thread model not a process model. Oracle 9i was multi-process, not multi-threaded. IIRC it actually had dedicated IO processes too; backends didn't do their own IO. We certainly need to protect the use case of queries that run in milliseconds, and clearly parallelism won't help there at all. But we can't ignore the other end of the spectrum; you'd need a LOT of communication overhead to swamp the benefits of parallel execution on a multi-minute, CPU-bound query (or in many cases even IO bound). -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
- the tps is global, with a mutex to share the global stochastic process - there is an adaptation for the fork emulation - I do not know whether this works with Win32 pthread stuff. Instead of this complexity, Well, the mutex impact is very localized in the code. The complexity is more linked to the three thread implementations intermixed there. can we just split the TPS input per client? Obviously it is possible. Note that it is more logical to do that per thread. I did one shared stochastic process because it makes more sense to have just one. That's all I was thinking of here, not adding a new set of threading issues. If 10,000 TPS is requested and there's 10 clients, just set the delay so that each of them targets 1000 TPS. Ok, so I understand that a mutex is too much! I'm guessing it's more accurate to have them all communicate as you've done here, but it seems like a whole class of new bugs and potential bottlenecks could come out of that. I do not think that there is a performance or locking contention issue: it is about getting a mutex for a section which performs one integer add and two integer assignments, that is about 3 instructions, to be compared with the task of performing database operations over the network. There are several orders of magnitude between those tasks. It would need a more than terrible mutex implementation to have any significant impact. Whenever someone touches the threading model for pgbench it usually gives a stack of build farm headaches. Better to avoid those unless there's really a compelling reason to go through that. I agree that the threading model in pgbench is a mess, mostly because of the 3 concurrent implementations intermixed in the code. Getting rid of the fork emulation and win32 special handling and keeping only the pthread implementation, which seems to be available even on windows, would be a relief. I'm not sure if there is still a rationale to have these 3 implementations, but it ensures a maintenance mess :-( I'll submit a version without the mutex, but ISTM that this one is conceptually cleaner, although I'm not sure about what happens on windows. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallell Optimizer
On 11/06/13 19:24, Hannu Krosing wrote: On 06/10/2013 10:37 PM, FredDaniPandoraAquiles wrote: Hi, I asked a while ago in this group about the possibility to implement a parallel planner in a multithread way, and the replies were that the proposed approach couldn't be implemented, because postgres is not thread-safe. With the new feature Background Worker Processes, would such an implementation be possible? Well, there are versions of genetic algorithms that use the concept of islands, in which the populations evolve in parallel on the different islands, with interaction between the islands and so on. I'm working on an algorithm based on multiagent systems. At the present moment, I mean in H2, the agents are threads, there are a few locks related to agent solutions, and a few locks for the best current solution in the environment where the agents are 'running'. The agents can exchange messages with a purpose. The environment is shared by all the agents and they use the environment to get information from other agents (their current solution, for example), try to update the best current solution and so on. If you do this as an academic exercise, then I'd recommend thinking in messages only. Separate out the message delivery entirely from your core design. This makes the whole concept much simpler and more generic. Message delivery can be made almost instantaneous in the case of threads, or can take from a few tens of microseconds to several seconds between different physical nodes. Which speed is fast enough depends entirely on your query - for a query running 5 hours on a single CPU and 5 minutes on a cluster, a message delay of 50 ms is entirely acceptable -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ I suspect (from my position of almost total ignorance of this area!) that once a generic method works independently of how closely coupled the different parallel parts are, then a later optimisation could be added dependent on how the parts were related. So running on a multi-core chip could have a different communication system to that running across multiple computers geographically dispersed. Though in practice, I suspect that the most common use case would involve many processor chips in the same 'box' (even if said box was distributed across a large room!). Anyhow, I think that separating out how to effectively parallelise Postgres from how the parts communicate is a Good Thing (TM). Though knowing Grim Reality, it is bound to be more complicated in Reality! :-( As the useful size of work of the parallel units obviously does relate to the communication overhead. Possibly the biggest challenge will be in devising a planning methodology that can efficiently decide on an appropriate parallel strategy. Maybe a key word to tell the planner that you know this is a very big query and you don't mind it taking a long time to come up with a decent plan? The planner would need to know details of the processing unit topology, communication overheads, and possibly other details - to make a really effective plan in the distributed case. My mind boggles, just thinking of the number of different variables that might be required to create an 'optimal' plan for parallel processing in a distributed system! Cheers, Gavin
Re: [HACKERS] Parallell Optimizer
On 06/11/2013 04:53 PM, Tatsuo Ishii wrote: On 11 June 2013 01:45, Tatsuo Ishii is...@postgresql.org wrote: On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs si...@2ndquadrant.com wrote: On 7 June 2013 20:23, Tom Lane t...@sss.pgh.pa.us wrote: As for other databases, I suspect that ones that have parallel execution are probably doing it with a thread model not a process model. Separate processes are more common because it covers the general case where query execution is spread across multiple nodes. Threads don't work across nodes and parallel queries predate (working) threading models. Indeed. Parallelism based on processes would be more convenient for master-master type of applications. Even if no master-master feature is implemented directly in core, at least a parallelism infrastructure based on processes could be used for this purpose. As long as true synchronous replication is not implemented in core, I am not sure there's value in parallel execution spreading across multiple nodes because of the delay of data update propagation. Please explain what you mean by the word true used here. In other words, eager replication. Do you mean something along these lines : Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. ? IMO it is possible to do this easily once BDR has reached the state where you can do streaming apply. That is, you replay actions on other hosts as they are logged, not after the transaction commits. Doing it this way you can wait for any action to successfully complete a full circle before committing it at the source. Currently the main missing part in doing this is autonomous transactions. It can in theory be done by opening an extra backend for each incoming transaction, but you will need a really big number of backends and you also have extra overhead from interprocess communications. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Submission 10:
- per thread throttling instead of a global process with a mutex. This avoids a mutex, and the stochastic process is shared between the clients of a given thread.
- ISTM that the thread start time should be initialized at the beginning of threadRun instead of in the loop *before* thread creation, otherwise the thread creation delays are incorporated in the performance measure, but ISTM that the point of pgbench is not to measure thread creation performance... I've added a comment suggesting where it should be put instead, first thing in threadRun.
-- Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 8c202bf..8d7979b 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -137,6 +137,12 @@ int	unlogged_tables = 0;
 double		sample_rate = 0.0;
 
 /*
+ * When threads are throttled to a given rate limit, this is the target delay
+ * to reach that rate in usec. 0 is the default and means no throttling.
+ */
+int64		throttle_delay = 0;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -205,6 +211,7 @@ typedef struct
 	int			nvariables;
 	instr_time	txn_begin;		/* used for measuring transaction latencies */
 	instr_time	stmt_begin;		/* used for measuring statement latencies */
+	bool		throttled;		/* whether current transaction was throttled */
 	int			use_file;		/* index in sql_files for this client */
 	bool		prepared[MAX_FILES];
 } CState;
@@ -222,6 +229,10 @@ typedef struct
 	instr_time *exec_elapsed;	/* time spent executing cmds (per Command) */
 	int		   *exec_count;		/* number of cmd executions (per Command) */
 	unsigned short random_state[3];		/* separate randomness for each thread */
+	int64		throttle_trigger;	/* previous/next throttling (us) */
+	int64		throttle_lag;		/* avg lag transaction behind throttling */
+	int64		throttle_lag_max;	/* max lag */
+
 } TState;
 
 #define INVALID_THREAD ((pthread_t) 0)
@@ -230,6 +241,8 @@ typedef struct
 {
 	instr_time	conn_time;
 	int			xacts;
+	int64		throttle_lag;
+	int64		throttle_lag_max;
 } TResult;
 
 /*
@@ -355,6 +368,8 @@ usage(void)
 		   "  -n           do not run VACUUM before tests\n"
 		   "  -N           do not update tables \"pgbench_tellers\" and \"pgbench_branches\"\n"
 		   "  -r           report average latency per command\n"
+		   "  -R SPEC, --rate SPEC\n"
+		   "               target rate in transactions per second\n"
 		   "  -s NUM       report this scale factor in output\n"
 		   "  -S           perform SELECT-only transactions\n"
 		   "  -t NUM       number of transactions each client runs (default: 10)\n"
@@ -902,13 +917,51 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVals *agg)
 top:
 	commands = sql_files[st->use_file];
 
+	/* handle throttling once per transaction by inserting a sleep.
+	 * this is simpler than doing it at the end.
+	 */
+	if (throttle_delay && !st->throttled)
+	{
+		/* compute delay to approximate a Poisson distribution
+		 * 1000000 => 13.8 .. 0 multiplier
+		 *  100000 => 11.5 .. 0
+		 *   10000 =>  9.2 .. 0
+		 *    1000 =>  6.9 .. 0
+		 * if transactions are too slow or a given wait shorter than
+		 * a transaction, the next transaction will start right away.
+		 */
+		int64		wait = (int64)
+			throttle_delay * -log(getrand(thread, 1, 1000) / 1000.0);
+
+		thread->throttle_trigger += wait;
+
+		st->until = thread->throttle_trigger;
+		st->sleeping = 1;
+		st->throttled = true;
+		if (debug)
+			fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
+					st->id, wait);
+	}
+
 	if (st->sleeping)
 	{							/* are we sleeping? */
 		instr_time	now;
+		int64		now_us;
 
 		INSTR_TIME_SET_CURRENT(now);
-		if (st->until <= INSTR_TIME_GET_MICROSEC(now))
+		now_us = INSTR_TIME_GET_MICROSEC(now);
+		if (st->until <= now_us)
+		{
 			st->sleeping = 0;	/* Done sleeping, go ahead with next command */
+			if (throttle_delay && st->state == 0)
+			{
+				/* measure lag of throttled transaction */
+				int64		lag = now_us - st->until;
+
+				thread->throttle_lag += lag;
+				if (lag > thread->throttle_lag_max)
+					thread->throttle_lag_max = lag;
+			}
+		}
 		else
 			return true;		/* Still sleeping, nothing to do here */
 	}
@@ -1095,6 +1148,7 @@ top:
 			st->state = 0;
 			st->use_file = (int) getrand(thread, 0, num_files - 1);
 			commands = sql_files[st->use_file];
+			st->throttled = false;
 		}
 	}
@@ -2017,7 +2071,8 @@ process_builtin(char *tb)
 
 static void
 printResults(int ttype, int normal_xacts, int nclients,
 			 TState *threads, int nthreads,
-			 instr_time total_time, instr_time conn_total_time)
+			 instr_time total_time, instr_time conn_total_time,
+			 int64 throttle_lag, int64 throttle_lag_max)
 {
 	double		time_include,
 				tps_include,
@@ -2055,6 +2110,18 @@ printResults(int ttype, int normal_xacts, int nclients,
 		printf("number of transactions actually processed: %d\n",
 			   normal_xacts);
 	}
+
+	if (throttle_delay)
+	{
+		/* Report average transaction lag under throttling, i.e. the delay
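For readers unfamiliar with the -log() trick in the hunk above: scaling the mean delay by -log(u), with u uniform on (0,1], produces exponentially distributed inter-arrival times, i.e. a Poisson arrival process at the target rate. A stand-alone sketch (the 100 tps figure is an assumption for illustration, not from the patch):

/* build with: cc poisson.c -lm */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    const double target_tps = 100.0;        /* assumed target rate */
    const double mean_delay_us = 1000000.0 / target_tps;
    const int n = 100000;
    double total = 0.0;

    srandom(42);
    for (int i = 0; i < n; i++)
    {
        /* u uniform in (0, 1], discretized to 1000 steps as in the patch */
        double u = (double) (random() % 1000 + 1) / 1000.0;

        total += mean_delay_us * -log(u);
    }
    /* averages to roughly mean_delay_us, so the long-run rate is target_tps */
    printf("average delay: %.0f us (expected about %.0f)\n",
           total / n, mean_delay_us);
    return 0;
}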
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
On 6/11/13 4:11 PM, Fabien COELHO wrote: - ISTM that there thread start time should be initialized at the beginning of threadRun instead of in the loop *before* thread creation, otherwise the thread creation delays are incorporated in the performance measure, but ISTM that the point of pgbench is not to measure thread creation performance... I noticed that, but it seemed a pretty minor issue. Did you look at the giant latency spikes at the end of the test run I submitted the graph for? I wanted to nail down what was causing those before worrying about the startup timing. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Current CLOBBER_CACHE_ALWAYS failures
Christian Ullrich ch...@chrullrich.net writes: The CLOBBER_CACHE_ALWAYS animals (friarbird and jaguarundi) have been failing persistently for about 36 hours now. The error is in a test added by Tom's recent commit a4424c5: Huh ... this is a pre-existing bug in the typcache code. Will fix, thanks for the report! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] request a new feature in fuzzystrmatch
Liming Hu escribió: I have implemented the code according to Joe's suggestion, and put the code at: https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1 Please submit a proper patch so it can be seen on our mailing list archives. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
- ISTM that the thread start time should be initialized at the beginning of threadRun instead of in the loop *before* thread creation, otherwise the thread creation delays are incorporated in the performance measure, but ISTM that the point of pgbench is not to measure thread creation performance... I noticed that, but it seemed a pretty minor issue. Not for me, because the max lag measured in my first version was really the thread creation time, not very interesting. Did you look at the giant latency spikes at the end of the test run I submitted the graph for? I've looked at the graph you sent. I must admit that I did not understand exactly what is measured and where it is measured. Because of its position at the end of the run, I thought of some disconnection-related effects when the pgbench run is interrupted by a timeout signal, so some things are done more slowly. Fine with me, we are stopping anyway, and out of the steady state. I wanted to nail down what was causing those before worrying about the startup timing. Well, the short answer is that I'm not worried by that, for the reason explained above. I would be worried if it was anywhere else but where the transactions are interrupted, the connections are closed and the threads are stopped. I may be wrong:-) -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] request a new feature in fuzzystrmatch
On Tue, Jun 11, 2013 at 1:57 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Liming Hu escribió: I have implemented the code according to Joe's suggestion, and put the code at: https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1 Please submit a proper patch so it can be seen on our mailing list archives. Hi Alvaro, I am kind of new to the PostgreSQL hacker community. Can you please help me submit the patch? Thanks, Liming -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Liming Hu cell: (435)-512-4190 Seattle Washington -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] how to find out whether a view is updatable
Dean Rasheed dean.a.rash...@gmail.com writes: Here's a more complete patch along those lines. It defines the following pair of functions to test for updatability from SQL: FUNCTION pg_catalog.pg_relation_is_updatable(reloid oid, include_triggers boolean) RETURNS integer FUNCTION pg_catalog.pg_column_is_updatable(reloid oid, attnum smallint, include_triggers boolean) RETURNS boolean and the following FDW functions: int IsForeignRelUpdatable (Oid foreigntableid); bool IsForeignColUpdatable (Oid foreigntableid, AttrNumber attnum); I'm looking at this patch now. I do not see the point of pg_column_is_updatable nor IsForeignColUpdatable --- that's loading additional complexity onto every FDW, to support what use-cases exactly? Is it really likely that (a) there are any cases out there where FDWs would support updating only some columns, and (b) anybody would care whether or not information_schema.columns reflects such a restriction accurately? So I'm inclined to drop that part. As an initial implementation of this API in postgres_fdw, I've added a new option updatable (true by default), which can be specified as a server option or as a per-table option, to give the user control over whether individual foreign tables are read-only or updatable. Do we really want that as a server option? Why? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] request a new feature in fuzzystrmatch
On 06/11/2013 02:23 PM, Liming Hu wrote: On Tue, Jun 11, 2013 at 1:57 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Liming Hu escribió: I have implemented the code according to Joe's suggestion, and put the code at: https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1 Please submit a proper patch so it can be seen on our mailing list archives. Hi Alvaro, I am kind of new to the PostgreSQL hacker community. Can you please help me submit the patch? Hi Liming, I might be able to help, but it will be at least a couple of days before I have the time to look at this, Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] request a new feature in fuzzystrmatch
Liming Hu dawnin...@gmail.com wrote: I am kind of new to the Postgresql hacker community, Can you please help me on submit the patch? You might want to read this page: http://wiki.postgresql.org/wiki/Developer_FAQ In particular, the Development Process section has a link to Submitting a Patch that you should read. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: Should array_length() Return NULL)
On 12 June 2013 04:43, Josh Berkus j...@agliodbs.com wrote: What's the status on this patch and current approach to ZDA? Alright, it might be a good idea to have a quick recap. Last time, on Arrays Of Our Lives ... So I proposed and posted a patch aimed at deprecating zero-D arrays, and replacing them with 1-D empty arrays. That kicked off a long discussion, with some folks (Robert, Merlin, Florian et al) vigorously opposing any change in the behaviour of the array_(length|ndims|lower|upper) functions. To those folks, the behaviour would have to be a lot worse than it currently is to justify breaking compatibility with existing applications. The idea of using a GUC to smooth over the compatibility break was suggested, and firmly rejected. The idea of creating an entirely new type with nicer behaviours came up, but that wouldn't really fly because arrays have already hogged all the best syntax. Since compatibility breakage is so contentious, I suggested that we forget about changing the array representation and just add new functions with more sensible behaviours: * cardinality(anyarray) to return the length of the first dimension, zero if empty, and * array_num_items(anyarray) to return the total number of element positions per ArrayGetNItems, zero if empty. There have been attempts to add a cardinality function in the past, as it is required by the SQL spec, but these attempts have stalled when trying to decide how it should handle multidim arrays. Having it return the length of the first dimension is the more spec-compatible way to go, but some folks argued that it should work as ArrayGetNItems, because we don't already have a function for that at the SQL level. Therefore I propose we add cardinality() per the spec, and another function to expose ArrayGetNItems. And that's about where we got to, when the whole discussion was put on a time-out to make room for the beta. I am withdrawing the original zero-D patch in favour of the proposed new functions. If you have an opinion about that, please do chime in. Depending on how that goes I may post a patch implementing my new proposal in the next few days. Cheers, BJ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
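To make the two candidate behaviours concrete, a rough backend-style sketch (function names hypothetical; this is not a submitted patch): the spec-flavoured cardinality returning the length of the first dimension, and a separate function exposing ArrayGetNItems.

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(cardinality_sketch);
PG_FUNCTION_INFO_V1(array_num_items_sketch);

/* spec-style: length of the first dimension, zero if empty */
Datum
cardinality_sketch(PG_FUNCTION_ARGS)
{
    ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);

    if (ARR_NDIM(v) == 0)
        PG_RETURN_INT32(0);
    PG_RETURN_INT32(ARR_DIMS(v)[0]);
}

/* total number of element positions, per ArrayGetNItems, zero if empty */
Datum
array_num_items_sketch(PG_FUNCTION_ARGS)
{
    ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);

    PG_RETURN_INT32(ArrayGetNItems(ARR_NDIM(v), ARR_DIMS(v)));
}

The two differ only for multidim arrays: for ARRAY[[1,2],[3,4]] the first returns 2, the second 4.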
Re: [HACKERS] request a new feature in fuzzystrmatch
On Tue, Jun 11, 2013 at 2:56 PM, Joe Conway m...@joeconway.com wrote: On 06/11/2013 02:23 PM, Liming Hu wrote: On Tue, Jun 11, 2013 at 1:57 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Liming Hu escribió: I have implemented the code according to Joe's suggestion, and put the code at: https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1 Please submit a proper patch so it can be seen on our mailing list archives. Hi Alvaro, I am kind of new to the PostgreSQL hacker community. Can you please help me submit the patch? Hi Liming, I might be able to help, but it will be at least a couple of days before I have the time to look at this, Joe ok, thanks, I will wait. Liming -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support -- Liming Hu cell: (435)-512-4190 Seattle Washington -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
On Tue, Jun 11, 2013 at 02:10:45PM -0400, Andrew Dunstan wrote: On 06/10/2013 11:22 PM, Noah Misch wrote: On Mon, Jun 10, 2013 at 11:20:13AM -0400, Andrew Dunstan wrote: On 06/10/2013 10:18 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: After thinking about this some more I have come to the conclusion that we should only do any de-escaping of \u sequences, whether or not they are for BMP characters, when the server encoding is utf8. For any other encoding, which is already a violation of the JSON standard anyway, and should be avoided if you're dealing with JSON, we should just pass them through even in text output. This will be a simple and very localized fix. Hmm. I'm not sure that users will like this definition --- it will seem pretty arbitrary to them that conversion of \u sequences happens in some databases and not others. Yep. Suppose you have a LATIN1 database. Changing it to a UTF8 database where everyone uses client_encoding = LATIN1 should not change the semantics of successful SQL statements. Some statements that fail with one database encoding will succeed in the other, but a user should not witness a changed non-error result. (Except functions like decode() that explicitly expose byte representations.) Having SELECT '["\u00e4"]'::json ->> 0 emit 'ä' in the UTF8 database and '\u00e4' in the LATIN1 database would move PostgreSQL in the wrong direction relative to that ideal. As a final counterexample, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. I don't see a counterexample there; every database that accepts without error a given Unicode escape produces from it the same text value. The proposal to which I objected was akin to having non-UTF8 databases silently translate E'\u0220' to E'\\u0220'. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Clean switchover
Hi, In streaming replication, when we shutdown the master, walsender tries to send all the outstanding WAL records including the shutdown checkpoint record to the standby, and then to exit. This basically means that all the WAL records are fully synced between the two servers after the clean shutdown of the master. So, after promoting the standby to new master, we can restart the stopped master as new standby without the need for a fresh backup from the new master. But there is one problem: though walsender tries to send all the outstanding WAL records, it doesn't wait for them to be replicated to the standby. IOW, walsender closes the replication connection as soon as it sends the WAL records. Then, before receiving all the WAL records, walreceiver can detect the closure of the connection and exit. We cannot guarantee that there is no missing WAL in the standby after a clean shutdown of the master. In this case, a backup from the new master is required when restarting the stopped master as new standby. I have experienced this case several times, especially when enabling WAL archiving. The attached patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection. You may be concerned about the case where the standby gets stuck and the walsender keeps waiting for the reply from that standby. In this case, wal_sender_timeout detects such an inactive standby and then walsender ends. So even in that case, the shutdown can end. Thought? Regards, -- Fujii Masao switchover_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
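In outline, the proposed behaviour change amounts to something like the following shape in walsender's shutdown path (a paraphrase of the idea, not the patch itself; variable and field names as in the walsender.c of that era):

/* on shutdown request, after pushing out everything we have ... */
if (caughtup && !pq_is_send_pending())
{
    /*
     * Previously: exit as soon as our send buffer is empty.  Instead, keep
     * processing standby replies until the flush position the standby has
     * reported catches up with everything we sent; wal_sender_timeout
     * still bounds the wait if the standby has gone away.
     */
    if (MyWalSnd->flush >= sentPtr)
        proc_exit(0);    /* standby has all our WAL; safe to disconnect */
}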
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 06:26 PM, Noah Misch wrote: As a final counterexample, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. I don't see a counterexample there; every database that accepts without error a given Unicode escape produces from it the same text value. The proposal to which I objected was akin to having non-UTF8 databases silently translate E'\u0220' to E'\\u0220'. What? There will be no silent translation. The only debate here is about how these databases turn string values inside a json datum into PostgreSQL text values via the documented operation of certain functions and operators. If the JSON datum doesn't already contain a unicode escape then nothing of what's been discussed would apply. Nothing whatever that's been proposed would cause a unicode escape sequence to be emitted that wasn't already there in the first place, and no patch that I have submitted has contained any escape sequence generation at all. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallell Optimizer
Please explain what you mean by the word true used here. In other words, eager replication. Do you mean something along these lines : Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. ? No, I'm not talking about conflict resolution. From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf: -- Eager or Lazy Replication? Eager replication: keep all replicas synchronized by updating all replicas in a single transaction Lazy replication: asynchronously propagate replica updates to other nodes after the replicating transaction commits -- Parallel query execution needs to assume that each node is synchronized at commit, otherwise the summary of each query result executed on each node is meaningless. IMO it is possible to do this easily once BDR has reached the state where you can do streaming apply. That is, you replay actions on other hosts as they are logged, not after the transaction commits. Doing it this way you can wait for any action to successfully complete a full circle before committing it at the source. Currently the main missing part in doing this is autonomous transactions. It can in theory be done by opening an extra backend for each incoming transaction, but you will need a really big number of backends and you also have extra overhead from interprocess communications. Thanks for the thought about conflict resolution in BDR. BTW, if we seriously think about implementing parallel query execution, we need to find a way to distribute data among the nodes, which requires a partial copy of a table. I think that would be a big challenge for WAL-based replication. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallell Optimizer
On 06/12/2013 01:01 AM, Tatsuo Ishii wrote: Please explain what you mean by the word true used here. In other words, eager replication. Do you mean something along these lines : Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. ? No, I'm not talking about conflict resolution. From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf: -- Eager or Lazy Replication? Eager replication: keep all replicas synchronized by updating all replicas in a single transaction Ok, so you are talking about distributed transactions ? In our current master-slave replication, how would it be different from current synchronous replication ? Or does it make sense only in case of multimaster replication ? The main problems with keep all replicas synchronized by updating all replicas in a single transaction are performance and reliability. That is, the write performance has to be less than for a single server and failure of a single replica brings down the whole cluster. Lazy replication: asynchronously propagate replica updates to other nodes after the replicating transaction commits -- Parallel query execution needs to assume that each node is synchronized at commit, otherwise the summary of each query result executed on each node is meaningless. IMO it is possible to do this easily once BDR has reached the state where you can do streaming apply. That is, you replay actions on other hosts as they are logged, not after the transaction commits. Doing it this way you can wait for any action to successfully complete a full circle before committing it at the source. Currently the main missing part in doing this is autonomous transactions. It can in theory be done by opening an extra backend for each incoming transaction, but you will need a really big number of backends and you also have extra overhead from interprocess communications. Thanks for the thought about conflict resolution in BDR. BTW, if we seriously think about implementing parallel query execution, we need to find a way to distribute data among the nodes, which requires a partial copy of a table. I think that would be a big challenge for WAL-based replication. Moving partial query results around is a completely different problem from replication. We should not mix these. If we on the other hand think about sharding (that is, having a table partitioned between nodes) then this can be done in BDR. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] how to find out whether a view is updatable
On 11 June 2013 22:53, Tom Lane t...@sss.pgh.pa.us wrote: Dean Rasheed dean.a.rash...@gmail.com writes: Here's a more complete patch along those lines. It defines the following pair of functions to test for updatability from SQL: FUNCTION pg_catalog.pg_relation_is_updatable(reloid oid, include_triggers boolean) RETURNS integer FUNCTION pg_catalog.pg_column_is_updatable(reloid oid, attnum smallint, include_triggers boolean) RETURNS boolean and the following FDW functions: int IsForeignRelUpdatable (Oid foreigntableid); bool IsForeignColUpdatable (Oid foreigntableid, AttrNumber attnum); I'm looking at this patch now. I do not see the point of pg_column_is_updatable nor IsForeignColUpdatable --- that's loading additional complexity onto every FDW, to support what use-cases exactly? Is it really likely that (a) there are any cases out there where FDWs would support updating only some columns, and (b) anybody would care whether or not information_schema.columns reflects such a restriction accurately? So I'm inclined to drop that part. I originally thought of adding pg_column_is_updatable() because I was imagining supporting more of the SQL standard on updatable views, which allows for a subset of the columns to be updatable, but we could always add such a function when/if we implement that feature. As for IsForeignColUpdatable(), I think you're probably right. If its only purpose is to support information_schema.columns, it's probably of very limited interest to anyone. As an initial implementation of this API in postgres_fdw, I've added a new option updatable (true by default), which can be specified as a server option or as a per-table option, to give the user control over whether individual foreign tables are read-only or updatable. Do we really want that as a server option? Why? Not sure. I thought it might be useful if you were setting up a connection to a foreign server and you knew that you only wanted read access to all the tables in it, this would avoid having to specify the option on every table. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
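For illustration, the option handling Dean describes could look roughly like this in postgres_fdw (a sketch of the proposal, not necessarily the committed form), reporting the supported operations as a bitmask of command types:

#include "postgres.h"
#include "commands/defrem.h"
#include "foreign/foreign.h"
#include "nodes/nodes.h"
#include "nodes/pg_list.h"

static int
postgresIsForeignRelUpdatable(Oid foreigntableid)
{
    bool           updatable = true;    /* assumed default: updatable */
    ForeignTable  *table = GetForeignTable(foreigntableid);
    ForeignServer *server = GetForeignServer(table->serverid);
    ListCell      *lc;

    /* server option first, then the per-table option overrides it */
    foreach(lc, server->options)
    {
        DefElem *def = (DefElem *) lfirst(lc);

        if (strcmp(def->defname, "updatable") == 0)
            updatable = defGetBoolean(def);
    }
    foreach(lc, table->options)
    {
        DefElem *def = (DefElem *) lfirst(lc);

        if (strcmp(def->defname, "updatable") == 0)
            updatable = defGetBoolean(def);
    }

    return updatable
        ? (1 << CMD_INSERT) | (1 << CMD_UPDATE) | (1 << CMD_DELETE)
        : 0;
}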
Re: [HACKERS] Parallell Optimizer
No, I'm not talking about conflict resolution. From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf: -- Eager or Lazy Replication? Eager replication: keep all replicas synchronized by updating all replicas in a single transaction Ok, so you are talking about distributed transactions ? In our current master-slave replication, how would it be different from current synchronous replication ? Or does it make sense only in case of multimaster replication ? The main problems with keep all replicas synchronized by updating all replicas in a single transaction are performance and reliability. That is, the write performance has to be less than for a single server That's a limitation specific to log-based replication. It needs to wait for log replay, which is virtually the same as a cluster-wide giant lock. On the other hand, non-log-based replication systems (if my understanding is correct, Postgres-XC is the case) could perform better than a single server. and failure of a single replica brings down the whole cluster. That's the price of eager replication. However, it could be mitigated by using existing HA technologies. Lazy replication: asynchronously propagate replica updates to other nodes after the replicating transaction commits -- Parallel query execution needs to assume that each node is synchronized at commit, otherwise the summary of each query result executed on each node is meaningless. IMO it is possible to do this easily once BDR has reached the state where you can do streaming apply. That is, you replay actions on other hosts as they are logged, not after the transaction commits. Doing it this way you can wait for any action to successfully complete a full circle before committing it at the source. Currently the main missing part in doing this is autonomous transactions. It can in theory be done by opening an extra backend for each incoming transaction, but you will need a really big number of backends and you also have extra overhead from interprocess communications. Thanks for the thought about conflict resolution in BDR. BTW, if we seriously think about implementing parallel query execution, we need to find a way to distribute data among the nodes, which requires a partial copy of a table. I think that would be a big challenge for WAL-based replication. Moving partial query results around is a completely different problem from replication. We should not mix these. I just explained why log-based replication could not be an infrastructure for parallel query execution. One reason is lazy replication, the other is the ability to do a partial copy. If we on the other hand think about sharding (that is, having a table partitioned between nodes) then this can be done in BDR. Ok, I didn't know that BDR could do it. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Adding IEEE 754:2008 decimal floating point and hardware support for it
Hi all Currently DECIMAL is an alias for NUMERIC, Pg's built-in arbitrary precision and scale decimal type. I'd like to explore the possibility of using hardware decimal floating point support in newer processors, compilers and C libraries to enhance DECIMAL / NUMERIC performance. With the advent of _Decimal32, _Decimal64 and _Decimal128 support in IEEE 754:2008, as supported by gcc per TR24732's float.h extensions, we have the opportunity to make use of hardware representations of decimal floating point values and hardware implementations of operations on them, gaining a potentially huge performance boost in exchange for more limited precision and scale. I'd like to gather ideas and suggestions about how we might approach this. The main thing I'm wondering is how/if to handle backward compatibility with the existing NUMERIC and its DECIMAL alias, or whether adding new DECIMAL32, DECIMAL64, and DECIMAL128 types would be more appropriate. I'd love to just use the SQL standard type name DECIMAL if possible, and the standard would allow for it (see below), but backward compat would be a challenge, as would coming up with a sensible transparent promotion scheme from 32->64->128->numeric and ways to stop undesired promotion. What I'm currently thinking of is using the same strategy we use right now for FLOAT(n) where we select between float4 and float8 based on the specified precision. We could do the same for DECIMAL; up to DECIMAL(94,7) would become decimal32; up to DECIMAL(382,16) would become decimal64 and DECIMAL128(34,6142); everything higher would become NUMERIC as currently. NUMERIC would be unaffected. (Ideally we wouldn't have to do the type change in the parser hack but that's not really possible so long as Pg doesn't preserve typmods in calculations and intermediate outputs). According to TR24732 ( http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1312.pdf) the _Decimal family of types offers: _Decimal32: 7 coefficient digits, 10^97 to 10^-94 range. _Decimal64: 16 coefficient digits, 10^385 to 10^-382 range. _Decimal128 34 coefficient digits, 10^6145 to 10^-6142 range. There was a thread about this on -general some time ago: http://www.postgresql.org/message-id/4cb26b16.7080...@postnewspapers.com.au that never went anywhere. Other discussion is mostly of use cases and is more hypothetical, but outlines why they'd be useful: http://www.postgresql.org/message-id/CADLWmXVmne9t5x-hR-XGOxEyOWQX5BfZwc9Qb=xhsj_gkg_...@mail.gmail.com In terms of how they fit in to the standard, the copy of the SQL:2008 draft I have here says: * NUMERIC specifies the data type exact numeric, with the decimal precision and scale specified by the precision and scale. * DECIMAL specifies the data type exact numeric, with the decimal scale specified by the scale and the implementation-defined decimal precision equal to or greater than the value of the specified precision. Additionally: * For the exact numeric types DECIMAL and NUMERIC, the maximum values of precision and of scale are implementation-defined. ... so it seems we'd be free to use the hardware types and could possibly internally promote from smaller to larger decimal types as appropriate. My main concern is that even the largest fixed-size decimal can't store the arbitrary precision and scale supported by Pg's NUMERIC, and people are used to using DECIMAL as an alias for NUMERIC. 
We're in a bit of a BC trap, as the spec would allow us to just use the hardware types, but we've already provided support for a nearly unlimited (but much, much slower) type using the same name. Regarding the breadth of support, it looks like we could use gcc's built-in types if available, and otherwise fall back to one of the portable decimal floating point maths libraries. gcc 4.3 added _Decimal types, see http://gcc.gnu.org/gcc-4.3/changes.html, so it's ancient history for gcc. Hardware support isn't required; in fact the real question would be whether gcc actually uses the hardware operations where they're supported. I'll need to dig into that. For Windows, MSVC doesn't support the types as built-ins. There's the Intel Decimal Floating Point Library (http://software.intel.com/en-us/articles/intel-decimal-floating-point-math-library, also available from http://www.netlib.org/misc/intel/) to provide support as a library. IBM's decnumber library is another possible alternative. LLVM's clang does not support those types according to the manual: clang does not support decimal floating point types (_Decimal32 and friends) or fixed-point types (_Fract and friends); nobody has expressed interest in these features yet, so it's hard to say when they will be implemented. (http://clang.llvm.org/docs/UsersManual.html). The Intel library or IBM decnumber should work, but would need to be checked. See http://www.ac.usc.es/arith19/sites/default/files/3670a225-spec-session-DFP-paper2.pdf for (somewhat old) details on Intel processor support. There's also support in POWER6 and System z10
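As a quick illustration of what gcc's TR24732 support provides today (gcc 4.3+; per the manual excerpt above, clang will not compile this), decimal floating point avoids the classic radix-2 rounding surprise:

/* build with: gcc -std=gnu99 decimal_demo.c */
#include <stdio.h>

int
main(void)
{
    _Decimal64 da = 0.1DD, db = 0.2DD;
    double     ba = 0.1, bb = 0.2;

    /* 0.1, 0.2 and 0.3 are exactly representable in radix 10 ... */
    printf("decimal64: 0.1 + 0.2 == 0.3 ? %s\n",
           da + db == 0.3DD ? "yes" : "no");    /* prints yes */

    /* ... but none of them is exact in radix 2 */
    printf("binary64:  0.1 + 0.2 == 0.3 ? %s\n",
           ba + bb == 0.3 ? "yes" : "no");      /* prints no */
    return 0;
}

Note that stock glibc printf cannot format _Decimal64 directly, which is why the sketch prints the comparison results rather than the values themselves.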
Re: [HACKERS] JSON and unicode surrogate pairs
On Tue, Jun 11, 2013 at 06:58:05PM -0400, Andrew Dunstan wrote: On 06/11/2013 06:26 PM, Noah Misch wrote: As a final counterexample, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. I don't see a counterexample there; every database that accepts without error a given Unicode escape produces from it the same text value. The proposal to which I objected was akin to having non-UTF8 databases silently translate E'\u0220' to E'\\u0220'. What? There will be no silent translation. The only debate here is about how these databases turn string values inside a json datum into PostgreSQL text values via the documented operation of certain functions and operators. If the JSON datum doesn't already contain a unicode escape then nothing of what's been discussed would apply. Nothing whatever that's been proposed would cause a unicode escape sequence to be emitted that wasn't already there in the first place, and no patch that I have submitted has contained any escape sequence generation at all. Under your proposal to which I was referring, this statement would return true in UTF8 databases and false in databases of other encodings: SELECT '["\u0220"]'::json ->> 0 = E'\u0220' Contrast the next statement, which would return false in UTF8 databases and true in databases of other encodings: SELECT '["\u0220"]'::json ->> 0 = E'\\u0220' Defining ->>(json,int) and ->>(json,text) in this way would be *akin to* having SELECT E'\u0220' = E'\\u0220' return true in non-UTF8 databases. I refer to user-visible semantics, not matters of implementation. Does that help to clarify my earlier statement? -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding IEEE 754:2008 decimal floating point and hardware support for it
Craig Ringer cr...@2ndquadrant.com writes: Currently DECIMAL is an alias for NUMERIC, Pg's built-in arbitrary precision and scale decimal type. I'd like to explore the possibility of using hardware decimal floating point support in newer processors, compilers and C libraries to enhance DECIMAL / NUMERIC performance. As near as I can tell, there is no such hardware support. The Intel paper you reference describes a pure-software library, and states A software implementation was deemed sufficient for the foreseeable future. The source code for that library is apparently available under a liberal license. It might be more useful to eyeball what they did and see if we can learn anything towards speeding up the existing variable-precision NUMERIC type. The main thing I'm wondering is how/if to handle backward compatibility with the existing NUMERIC and its DECIMAL alias, or whether adding new DECIMAL32, DECIMAL64, and DECIMAL128 types would be more appropriate. I'd love to just use the SQL standard types name DECIMAL if possible, and the standard would allow for it (see below), but backward compat would be a challenge, as would coming up with a sensible transparent promotion scheme from 32-64-128-numeric and ways to stop undesired promotion. Indeed. I think you're basically between a rock and a hard place there. It would be very very difficult to shoehorn such types into the existing numeric hierarchy if you wanted any sort of transparency of behavior, I fear. On the other hand, I doubt that it's going to work to make the existing numeric type switch to the hardware representation for suitably-constrained columns, because what are you going to do when, say, the result of an addition overflows the hardware width? You can't just throw an error immediately, because you won't know whether the output is supposed to be getting shoved back into a limited-width column or not. And on top of that, you have the very strong likelihood that the hardware implementation(s) won't behave exactly like our existing NUMERIC routines --- for instance, I'd bet a nickel that Intel took more care with last-place roundoff than our code does. So now we would have not just backwards-compatibility worries, but platform-dependent results for a data type that didn't use to have any such issue. I think people who expect NUMERIC to be exact would really get bent out of shape about that idea. On the whole, I think the effort would be a lot more usefully spent on trying to make the existing NUMERIC support go faster. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 08:18 PM, Noah Misch wrote: On Tue, Jun 11, 2013 at 06:58:05PM -0400, Andrew Dunstan wrote: On 06/11/2013 06:26 PM, Noah Misch wrote: As a final counter example, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. I don't see a counterexample there; every database that accepts without error a given Unicode escape produces from it the same text value. The proposal to which I objected was akin to having non-UTF8 databases silently translate E'\u0220' to E'\\u0220'. What? There will be no silent translation. The only debate here is about how these databases turn strings values inside a json datum into PostgreSQL text values via the documented operation of certain functions and operators. If the JSON datum doesn't already contain a unicode escape then nothing of what's been discussed would apply. Nothing whatever that's been proposed would cause a unicode escape sequence to be emitted that wasn't already there in the first place, and no patch that I have submitted has contained any escape sequence generation at all. Under your proposal to which I was referring, this statement would return true in UTF8 databases and false in databases of other encodings: SELECT '[\u0220]'::json - 0 = E'\u0220' Contrast the next statement, which would return false in UTF8 databases and true in databases of other encodings: SELECT '[\u0220]'::json - 0 = E'\\u0220' Defining -(json,int) and -(json,text) in this way would be *akin to* having SELECT E'\u0220' = E'\\u0220' return true in non-UTF8 databases. I refer to user-visible semantics, not matters of implementation. Does that help to clarify my earlier statement? Well, I think that's drawing a bit of a long bow, but never mind. If we work by analogy to Postgres' own handling of Unicode escapes, we'll raise an error on any Unicode escape beyond ASCII (not on input for legacy reasons, but on trying to process such datums). I gather that would meet your objection. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
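Concretely, the rule Andrew describes would amount to a check of roughly this shape at the point where a \uXXXX escape is de-escaped (a sketch, not a submitted patch; the function name is hypothetical):

#include "postgres.h"
#include "mb/pg_wchar.h"

/* sketch: called with the code point decoded from a \uXXXX escape
 * (or surrogate pair) while turning a JSON string into text */
static void
check_unicode_escape(pg_wchar ch)
{
    if (ch > 0x7f && GetDatabaseEncoding() != PG_UTF8)
        ereport(ERROR,
                (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
                 errmsg("unsupported Unicode escape sequence"),
                 errdetail("Unicode escape values cannot be used for code "
                           "point values above 007F when the server "
                           "encoding is not UTF8.")));
}

Plain JSON input and output would be unaffected; only the functions and operators that produce de-escaped text values would hit this check.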
Re: [HACKERS] Configurable location for extension .control files
Hi Josh On 11 June 2013 04:37, Josh Berkus j...@agliodbs.com wrote: I don't personally see a reason for plural locations, but it would be nice if it recursed (that is, looked for .so's in subdirectories). My reason for this is that I work on applications which have in-house extensions as well as public ones, and I'd like to keep the two separated by directory. I gave one example of a use-case for multiple directories upthread - the Postgres.app mac app has contrib, plv8 and postgis bundled under its application folder, but it would be nice to allow users to drop extra extensions under ~/Library/Postgres.app somewhere. If we had directory recursion then you could sort of fake it using symlinks (as long as we follow the symlinks) but it's pretty messy, the wrapper app would have to make a dir under ~/Library the actual extension dir and have a symlink back to its bundled extensions. Not the end of the world though. For any of that to work the dir (or dirs) would need to be settable by config or startup option - compiled in wouldn't cut it, since the absolute dir of the end user's home directory isn't known at compile time. Cheers Tom
Re: [HACKERS] Clean switchover
* Fujii Masao (masao.fu...@gmail.com) wrote: The attached patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection. Seems like a good idea to me.. Rather surprised that we're not doing this already, to be honest. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] JSON and unicode surrogate pairs
On Tue, Jun 11, 2013 at 08:42:26PM -0400, Andrew Dunstan wrote: If we work by analogy to Postgres' own handling of Unicode escapes, we'll raise an error on any Unicode escape beyond ASCII (not on input for legacy reasons, but on trying to process such datums). I gather that would meet your objection. No objection here; that approach is free from the problem that caught my attention in the other proposal. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding IEEE 754:2008 decimal floating point and hardware support for it
On 06/12/2013 08:35 AM, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: Currently DECIMAL is an alias for NUMERIC, Pg's built-in arbitrary precision and scale decimal type. I'd like to explore the possibility of using hardware decimal floating point support in newer processors, compilers and C libraries to enhance DECIMAL / NUMERIC performance. As near as I can tell, there is no such hardware support. The Intel paper you reference describes a pure-software library, and states A software implementation was deemed sufficient for the foreseeable future. Indeed... it looks like hardware IEEE 754:2008 decimal fp is limited to POWER 6 / POWER 7, which is a bit of a niche area for Pg. Interestingly, some general reading suggests that a lot of mainframe hardware has had decimal number support for a long time due to high adoption by the finance industry. BTW, another relevant Intel paper that goes into the background and history more is http://www.intel.com/standards/floatingpoint.pdf . The Implementation strategies for Decimal Floating-Point Arithmetic section (pp9) is interesting, as are the paper's references. The source code for that library is apparently available under a liberal license. It might be more useful to eyeball what they did and see if we can learn anything towards speeding up the existing variable-precision NUMERIC type. It certainly looks like a fair bit can be learned. For one thing, there's the representation. The standard specifies densely packed decimal (http://en.wikipedia.org/wiki/Densely_packed_decimal) and binary integer decimal (http://en.wikipedia.org/wiki/Binary_Integer_Decimal). One nice characteristic of DPD is that it converts very efficiently to/from BCD, and is identical to BCD for small values, which would be nice for us. The Intel paper suggests that BID is generally considered superior for a software implementation, though. Indeed. I think you're basically between a rock and a hard place there. It would be very very difficult to shoehorn such types into the existing numeric hierarchy if you wanted any sort of transparency of behavior, I fear. I was afraid of that - I wasn't seeing any ways to do it nicely, but was hoping someone with more experience with the type system would point out something wonderful. On the other hand, I doubt that it's going to work to make the existing numeric type switch to the hardware representation for suitably-constrained columns, because what are you going to do when, say, the result of an addition overflows the hardware width? You can't just throw an error immediately, because you won't know whether the output is supposed to be getting shoved back into a limited-width column or not. That does sound like a hard problem. Even if we're going to cram it back into a small field, the user may still want higher precision intermediate values to be used. That means these types would probably have to behave more like smallint/integer/bigint/etc, reporting out-of-range errors rather than silently promoting. That would be hard to fit into the SQL spec's use of a single DECIMAL type unless we just redefined DECIMAL as _Decimal128, which would go down about as well as swallowing tacks. The standard doesn't really seem to allow for multiple different-sized decimal subtypes, it just has the idea of one DECIMAL and that's what you get, with the implementation taking care of all the messy details. 
We could take care of those messy details by selecting suitable types (DECIMAL32, DECIMAL64, DECIMAL128, NUMERIC) for different DECIMAL(precision,scale) specifications like we do with FLOAT(precision), but because Pg disregards typmods in intermediate results that'd cause problems with things that currently work, like:

regress=> SELECT DECIMAL(8,2) '123456.78' * DECIMAL(2,0) '10';
  ?column?
------------
 1234567.80
(1 row)

which currently succeeds despite being out of bounds for the type, since the result is silently converted to unqualified 'numeric'. So the result wouldn't be legal as an input but can be produced as an output:

regress=> SELECT DECIMAL(8,2) '1234567.80';
ERROR:  numeric field overflow
DETAIL:  A field with precision 8, scale 2 must round to an absolute value less than 10^6.

That's pretty ugly, but it's well established behaviour. We can't carry typmods through calculations without huge and expensive re-work, from what I've seen raised in prior discussions. I think those were mostly about standards compliance issues with the JDBC driver that our discarding typmods creates. Without that we'd have to use an approach like that used for float(p) ... and then magic up a sane way to deal with the backward compat nightmare. If it weren't for already treating DECIMAL as an alias for NUMERIC I'd be all for just using the FLOAT(p) approach. And on top of that, you have the very strong likelihood that the hardware implementation(s) won't behave exactly like our existing NUMERIC routines --- for
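To make the trade-offs above concrete, here is a short psql sketch of the two existing behaviours being contrasted: fixed-width integer types report out-of-range errors rather than promoting, and FLOAT(p) already maps one SQL name onto differently sized concrete types (outputs shown as comments; exact messages vary by version):

-- Fixed-width types error instead of silently promoting:
SELECT 2147483647::integer + 1;    -- ERROR:  integer out of range
SELECT 2147483647::numeric + 1;    -- 2147483648; numeric just grows

-- The FLOAT(p) precedent: the requested precision picks the
-- concrete type at parse time:
SELECT pg_typeof(FLOAT(10) '1.5') AS f10,   -- real
       pg_typeof(FLOAT(40) '1.5') AS f40;   -- double precision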
Re: [HACKERS] [PATCH] Exorcise zero-dimensional arrays (Was: Re: Should array_length() Return NULL)
On Jun 11, 2013, at 3:09 PM, Brendan Jurd dire...@gmail.com wrote: There have been attempts to add a cardinality function in the past, as it is required by the SQL spec, but these attempts have stalled when trying to decide how it should handle multidim arrays. Having it return the length of the first dimension is the more spec-compatible way to go, but some folks argued that it should work as ArrayGetNItems, because we don't already have a function for that at the SQL level. Therefore I propose we add cardinality() per the spec, and another function to expose ArrayGetNItems. And that's about where we got to, when the whole discussion was put on a time-out to make room for the beta. I am withdrawing the original zero-D patch in favour of the proposed new functions. If you have an opinion about that, please do chime in. Depending on how that goes I may post a patch implementing my new proposal in the next few days. +1 to this proposal. Modulo function names, perhaps. I don't much care what they're called, as long as they work as you describe here. Best, David Array Complainer
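For concreteness, the two candidate semantics can already be spelled out with existing functions on a 2x3 array (the new functions' names are still to be decided):

-- Spec-style cardinality: length of the first dimension only
SELECT array_length('{{1,2,3},{4,5,6}}'::int[], 1);        -- 2
-- ArrayGetNItems-style: total element count across all dimensions
SELECT count(*) FROM unnest('{{1,2,3},{4,5,6}}'::int[]);   -- 6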
Re: [HACKERS] Clean switchover
On 12/06/13 13:15, Stephen Frost wrote: * Fujii Masao (masao.fu...@gmail.com) wrote: The attached patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection. Seems like a good idea to me.. Rather surprised that we're not doing this already, to be honest. Yeah, +1 from here too. This would make clean switchovers for (typically) testing scenarios a lot less complex and resource intensive (rebuilding the old master as a slave, when you know it is ok, is dispiriting on a huge db). On a related note (but not actually to do with this patch), clarifying/expanding the docs about the various methods for standby promotion:
1/ trigger file creation
2/ pg_ctl promote
3/ renaming/removing recovery.conf
and the differences between them would be great. For instance, I only recently realized that method 3/ means the promoted standby does not start a new timeline (incidentally - could this be an option to pg_ctl promote?), which is very useful for (again) controlled/clean switchovers. regards Mark
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/12/2013 08:42 AM, Andrew Dunstan wrote: If we work by analogy to Postgres' own handling of Unicode escapes, we'll raise an error on any Unicode escape beyond ASCII (not on input for legacy reasons, but on trying to process such datums). I gather that would meet your objection. I could live with that if eager validation on input was the default, but could be disabled by setting (say) compat_lazy_json_validation = on. I don't like the idea of leaving us saddled with weak validation just because that's what we've got. It's been an ongoing source of pain as UTF-8 support has improved, and I'd really like a way to avoid semi-valid JSON making it into the DB and causing similar problems. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
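For readers following along, the failure mode being described looks roughly like this (a sketch of 9.3-era behaviour; exact messages vary, and note that compat_lazy_json_validation above is a proposed setting, not an existing one):

-- Input validation is lazy: a lone surrogate escape is accepted...
SELECT '{"x": "\ud83d"}'::json;           -- succeeds, stored verbatim
-- ...and the problem only surfaces when the datum is processed:
SELECT '{"x": "\ud83d"}'::json ->> 'x';   -- raises an error at this point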
Re: [HACKERS] Clean switchover
On Wednesday, June 12, 2013 4:23 AM Fujii Masao wrote: Hi, In streaming replication, when we shut down the master, walsender tries to send all the outstanding WAL records including the shutdown checkpoint record to the standby, and then to exit. This basically means that all the WAL records are fully synced between the two servers after a clean shutdown of the master. So, after promoting the standby to new master, we can restart the stopped master as new standby without the need for a fresh backup from the new master. But there is one problem: though walsender tries to send all the outstanding WAL records, it doesn't wait for them to be replicated to the standby. IOW, walsender closes the replication connection as soon as it sends the WAL records. Then, before receiving all the WAL records, walreceiver can detect the closure of the connection and exit. We cannot guarantee that there is no missing WAL in the standby after a clean shutdown of the master. In this case, a backup from the new master is required when restarting the stopped master as new standby. I have experienced this case several times, especially when enabling WAL archiving. The attached patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection. You may be concerned about the case where the standby gets stuck and the walsender keeps waiting for the reply from that standby. In this case, wal_sender_timeout detects such an inactive standby and then walsender ends. So even in that case, the shutdown can complete. Do you think it can impact the time to complete shutdown? After completing shutdown, the user will promote the standby to master, so a delay in shutdown can cause a delay in switchover. With Regards, Amit Kapila.
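For what it's worth, the catch-up state the patch waits on can be watched from the master; a minimal check using the 9.3-era view columns (renamed to *_lsn in later releases):

-- On the master, compare what has been sent with what the standby
-- has flushed/replayed; the patch makes walsender wait until these
-- match before closing the connection at shutdown:
SELECT application_name, state,
       sent_location, write_location,
       flush_location, replay_location
FROM pg_stat_replication;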