Re: [HACKERS] DO ... RETURNING
On 06/11/2013 06:17 AM, Pavel Stehule wrote:
2013/6/10 Hannu Krosing ha...@2ndquadrant.com:
On 06/10/2013 09:45 PM, Pavel Stehule wrote:
2013/6/10 David Fetter da...@fetter.org:
On Mon, Jun 10, 2013 at 09:23:19PM +0200, Pavel Stehule wrote:
2013/6/10 Hannu Krosing ha...@2ndquadrant.com:

Hello everybody,

As far as I can see, you currently cannot return anything out of a DO (anonymous code) block. Something like:

DO LANGUAGE plpythonu RETURNS TABLE (name text, uid int, gid int) $$
with open('/etc/passwd') as f:
    fields = f.readline().split(':')
    while fields:
        name, uid, gid = fields[0], int(fields[2]), int(fields[3])
        yield name, uid, gid
        fields = f.readline().split(':')
$$;

As I did not pay attention when DO was introduced, I thought it would be faster to ask here than to read all the possibly relevant mails in the archives. So: has there been a discussion on extending the DO construct with the ability to return data out of it, similar to what named functions do? If there was, what were the arguments against doing this? Or was it just not thought important at the time?

I don't like this idea. I know that DO is more or less a function, but it is too restrictive. I hope we will eventually have procedures with the possibility of unbound queries, and then you can do:

DO $$
  SELECT * FROM pg_class;
  SELECT * FROM pg_proc;
  ...
$$ LANGUAGE SQL;

and you don't need to define an output structure - which is much more user friendly.

If I understand the proposal correctly, the idea is only to try to return something when DO is invoked with RETURNING. 1. Did I understand correctly, Hannu? 2. If I did, does this alleviate your concerns, Pavel?

Not much. Two different concepts in one statement is not a good idea.

What two different concepts do you mean?

What about using a cursor as a temporary solution?

BEGIN;
DO $$
BEGIN
  OPEN mycursor AS SELECT * FROM blablabla;
END $$;
FETCH FROM mycursor;
COMMIT;

How would this work in an SQL query?

SELECT * FROM (FETCH FROM mycursor) mc;

We don't support that, but Oracle and DB2 allow SELECT * FROM TABLE(cursorname). Still I don't like this idea, because you would then have to support DO RETURNING in other statements as well - like INSERT INTO ... DO RETURNING???

Yes, I really would like DO to be a full set-returning construct similar to SELECT or INSERT/UPDATE/DELETE ... RETURNING. The syntax should be either RETURNS (as in function definitions) or RETURNING as for I/U/D. I actually like RETURNING better, as it really does return immediately and does not just define a function that returns something.

What about local temporary functions?

CREATE TEMPORARY FUNCTION xx(a int) RETURNS TABLE (xxx) SELECT * FROM xxx;

You mean that we define and use it in the same statement, and after ';' ends the statement it disappears from scope? This would probably still bloat the pg_function table?

It is the same hard issue as TEMPORARY TABLES.

Hannu, what is the motivation for your proposal?

To be able to use pl/xxx languages in (almost) the same way we use SQL now. The current situation is akin to not being able to use queries directly, but always requiring you to create a view first and then do SELECT ... FROM myview. Think of DO ... RETURNING as an inline function definition and usage in one go.

I have two objections:

* it is not too user friendly - you have to specify the returns list every time, which is not comfortable for very short-lived objects.

If you can somehow avoid it for functions, then we could use the same techniques for anonymous in-line functions as well. And it is still more user friendly than creating a temporary function for exactly the same purpose.
* it is one way to introduce lots of non-ANSI-SQL extensions that are not in other databases.

The whole pluggable-language and CREATE FUNCTION machinery is a non-ANSI-SQL extension, so why not make it easier to use?

* it doesn't really carry new functionality.

Yes - as pl/pgsql is (presumably :) ) Turing complete, no other PLs carry any new functionality either. What this does is make existing functionality more usable. Again, if you have some good way to avoid specifying the returns list and deduce it from the contents, I would be very interested. This seems relatively easy to do for SQL and PL/pgSQL functions but much harder to achieve for a PL language in general.

Almost the same functionality and syntax could be achieved if we assume DO always returns SETOF RECORD and require the return definition at the point of use, as we do for functions returning SETOF RECORD. So instead of

hannu=# CREATE OR REPLACE FUNCTION testf(numeric) RETURNS SETOF RECORD AS $$
return (('a','b','c'),('d','e','f'))
$$ LANGUAGE plpythonu;
CREATE FUNCTION
hannu=# SELECT * FROM testf(1) AS f(a text, b text, c text);
 a | b | c
---+---+---
 a | b | c
 d | e | f
(2 rows)

one could write directly

DO LANGUAGE plpythonu $$
return (('a','b','c'),('d','e','f'))
$$ AS f(a text, b text, c text);

and get the same result. The reason ...
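For reference, here is a runnable form of the cursor workaround sketched earlier in the thread (table and cursor names are illustrative). A plpgsql refcursor opened under a fixed name inside DO leaves a portal that plain SQL can FETCH from for the rest of the transaction:

BEGIN;

DO $$
DECLARE
    c refcursor := 'mycursor';   -- fixes the portal name FETCH will use
BEGIN
    OPEN c FOR SELECT relname FROM pg_class;
END
$$;

FETCH 5 FROM mycursor;  -- reads rows produced inside the DO block
COMMIT;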
Re: [HACKERS] Parallel Optimizer
On 11 June 2013 01:45, Tatsuo Ishii is...@postgresql.org wrote:
On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
On 7 June 2013 20:23, Tom Lane t...@sss.pgh.pa.us wrote:

As for other databases, I suspect that ones that have parallel execution are probably doing it with a thread model, not a process model.

Separate processes are more common because they cover the general case where query execution is spread across multiple nodes. Threads don't work across nodes, and parallel queries predate (working) threading models.

Indeed. Parallelism based on processes would be more convenient for master-master types of applications. Even if no master-master feature is implemented directly in core, at least a parallelism infrastructure based on processes could be used for this purpose. As long as true synchronous replication is not implemented in core, I am not sure there is value in parallel execution spreading across multiple nodes, because of the delay of data-update propagation.

Please explain what you mean by the word true used here.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Proposal for CSN based snapshots
Ants, the more I think about this, the more I start to like it.

On 06/07/2013 02:50 PM, Ants Aasma wrote: On Fri, Jun 7, 2013 at 2:59 PM, Markus Wanner mar...@bluegap.ch wrote: Agreed. Postgres-R uses a CommitOrderId, which is very similar in concept, for example. Do you think having this snapshot scheme would be helpful for Postgres-R?

Yeah, it could help to reduce patch size, after a rewrite to use such a CSN.

Or why do you need to tell apart aborted from in-progress transactions by CSN? I need to detect aborted transactions so they can be discarded during the eviction process; otherwise the sparse array will fill up. They could also be filtered out by cross-referencing uncommitted slots with the procarray. Having the abort case do some additional work to make xid assignment cheaper looks like a good tradeoff.

I see.

The sparse buffer needs to be at least big enough to fit CSN slots for the xids of all active transactions and non-overflowed subtransactions. At the current level PGPROC_MAX_CACHED_SUBXIDS=64, the minimum comes out at 16 bytes * (64 + 1) slots * 100 backends = 101.6KB per buffer, or 203KB total in the default configuration.

A CSN is 8 bytes, the XID 4, resulting in 12 bytes per slot. So I guess the given 16 bytes includes alignment to 8 byte boundaries. Sounds good.

8 byte alignment for CSNs is needed for atomic writes, if not for something else.

Oh, right, atomic writes.

I think the size could be cut in half by using a base value for CSNs if we assume that no xid is active for longer than 2B transactions, as is currently the case. I didn't want to include the complication in the first iteration, so I didn't verify whether that would have any gotchas.

In Postgres-R, I effectively used a 32-bit order id which wraps around. In this case, I guess adjusting the base value will get tricky. Wrapping could probably be used as well, instead.

The number of times each cache line can be invalidated is bounded by 8.

Hm.. good point.

Regards Markus Wanner
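A quick sanity check of the sizing arithmetic above (assuming the stated 16-byte aligned slots and the default max_connections of 100; the 203KB total implies two such buffers):

SELECT 16 * (64 + 1) * 100                        AS bytes_per_buffer, -- 104000
       round(16 * (64 + 1) * 100 / 1024.0, 1)     AS kib_per_buffer,   -- 101.6
       round(2 * 16 * (64 + 1) * 100 / 1024.0, 1) AS kib_total;        -- 203.1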
Re: [HACKERS] how to find out whether a view is updatable
On 11 June 2013 01:03, Michael Paquier michael.paqu...@gmail.com wrote:

Sorry for my late reply.

On Sun, Jun 9, 2013 at 6:45 PM, Dean Rasheed dean.a.rash...@gmail.com wrote: I called it updatable rather than writable or read-only because it might perhaps be extended in the future with separate options for insertable and deletable. It could also be extended to give column-level control over updatability, or something like use_remote_updatability could be added, but that all feels like 9.4 material.

Yes, this is definitely material for 9.4. You should add this patch to the 1st commit fest. I'll add myself as a reviewer. Thanks,

Thanks. Arguably though, the API changes are something that should be sorted out in 9.3, but I'm not sure how much of an appetite there is for that, or whether it's too late.

pg_view_is_updatable() and pg_view_is_insertable() are both new to 9.3. They were designed purely to support the information schema views, but are inadequate for most other practical purposes. Once 9.3 is out, we'll be stuck with them - although of course that doesn't stop us adding more functions, I think it would be better to replace them now. Likewise the writable FDW API is new to 9.3, so I think 9.3 should at least decide on the API for an FDW to specify whether a foreign table is updatable.

Regards, Dean
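For context, the per-view answer these functions were built to compute is what the information schema exposes - for example (schema name illustrative):

SELECT table_name, is_updatable, is_insertable_into
FROM   information_schema.views
WHERE  table_schema = 'public';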
Re: [HACKERS] Parallel Optimizer
On 06/10/2013 10:37 PM, FredDaniPandoraAquiles wrote:

Hi, I asked a while ago in this group about the possibility of implementing a parallel planner in a multithreaded way, and the replies were that the proposed approach couldn't be implemented, because postgres is not thread-safe. With the new Background Worker Processes feature, would such an implementation be possible? Well, there are versions of genetic algorithms that use the concept of islands, in which the populations evolve in parallel on the different islands, with interaction between the islands and so on. I'm working on an algorithm based on multiagent systems. At the present moment, I mean in H2, the agents are threads; there are a few locks related to agent solutions, and a few locks for the best current solution in the environment where the agents are 'running'. The agents can exchange messages with a purpose. The environment is shared by all the agents, and they use the environment to get information from other agents (their current solution, for example), try to update the best current solution, and so on.

If you are doing this as an academic exercise, then I'd recommend thinking in messages only. Separate the message delivery entirely from your core design. This makes the whole concept much simpler and more generic. Message delivery can be made almost instantaneous in the case of threads, or take from a few tens of microseconds to several seconds between different physical nodes.

Which speed is fast enough depends entirely on your query - for a query running 5 hours on a single CPU and 5 minutes on a cluster, a message delay of 50 ms is entirely acceptable.

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] DO ... RETURNING
Hello

The current situation is akin to not being able to use queries directly, but always requiring you to create a view first and then do SELECT ... FROM myview.

ok - probably we can allow using DO in a CTE without impact on other SQL statements, and for this purpose we need to know the returned TupleDescriptor early. So I am able to accept it, although I think we are going in a strange direction. We are not able to do simple tasks simply (we cannot simply execute an SQL script on the server side) :(. But that is not a problem of Hannu's design.

Other question - can we find some readable and intuitive syntax for DO parametrization?

Regards Pavel
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 09:30:32AM +0200, Pavel Stehule wrote: Other question - can we find some readable and intuitive syntax for DO parametrization?

The thing I'm thinking of where this does something we don't yet do is in the realm of access control, e.g. allowing people to use DO rather than giving them DDL permission to create temporary functions. Is this what you have in mind? What other things?

Cheers, David.

-- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Batch API for After Triggers
On 10 June 2013 22:50, Robert Haas robertmh...@gmail.com wrote: On Sat, Jun 8, 2013 at 5:00 PM, Simon Riggs si...@2ndquadrant.com wrote:

While fiddling with FK tuning, Noah suggested batching trigger executions together to avoid execution overhead. It turns out there is no easy way to write triggers that can take advantage of the knowledge that they are being executed as a set of trigger executions. Some API is required to allow a trigger to understand that there may be other related trigger executions in the very near future, so it can attempt to amortise call overhead across many invocations (batching). The attached patch adds two fields to the TriggerDesc that trigger functions are handed, allowing them to inspect (if they choose) the additional fields and thus potentially use some form of batching.

I'm unclear how this could be used in practice. As described, you can use it to prepare batches. Are the events in a batch guaranteed to, say, all be related to the same relation?

Good point. I was too focused on the single large statements I was considering. There would need to be some logic to look at relations as well. For individual statements it can work well, since all, or at least the majority, of events are for a particular relation and we can know which one that is. It probably wouldn't work that well for deferred trigger events that covered a spread of different relations. Perhaps that's an argument for a final function after all.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] DO ... RETURNING
2013/6/11 David Fetter da...@fetter.org: The thing I'm thinking of where this does something we don't yet do is in the realm of access control, e.g. allowing people to use DO rather than giving them DDL permission to create temporary functions. Is this what you have in mind? What other things?

no - I don't see any difference between allowing temporary functions and allowing a DO statement. I am missing some simple tool for scripting administration tasks - cleaning schemas, cleaning databases, user management, report calculation. For this, some procedural functionality is necessary. Now I use a combination of bash/psql/SQL and PL/pgSQL functions. It works, it works well, but it is hard to learn and hard to maintain.

I had an idea - a procedural enhancement of psql. While working on a prototype I abandoned this idea. We could probably implement something like C macros in psql, but that is a dead end. So my second idea is plpgsql server-side scripting. PL/pgSQL is readable and strong - we just need to evaluate our procedural code outside a query. For these tasks performance is not paramount, because the hard bottleneck is IO - and the slow bash/psql combination worked without performance problems. Server-side scripting can be consistent - probably all PL languages could be used, and that would be enough for typical administrative tasks.

Regards Pavel
Re: [HACKERS] erroneous restore into pg_catalog schema
Stephen Frost sfr...@snowman.net writes: What happens with the default settings when you try to install two extensions that have overlapping function signatures..? I can't imagine it 'just works'.. And then what? Is there a way that an admin can set up search paths for individual users which provide the 'right' function and work even when the user decides to change their search_path?

That depends entirely on how the extension script is written. Making it possible to have two versions concurrently installed requires a non-trivial amount of effort, but I don't think the extension facility gets in the way at all, currently.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] erroneous restore into pg_catalog schema
On 2013-06-11 10:33:29 +0200, Dimitri Fontaine wrote: That entirely depends on how the extension script is written. Making it possible to have two versions concurrently installed requires a non-trivial amount of effort, but I don't think the extension facility gets in the way at all, currently.

It does. We only allow an extension to be installed once, regardless of schema...

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
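A minimal demonstration of that restriction (extension and schema names illustrative):

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE EXTENSION hstore SCHEMA a;   -- ok
CREATE EXTENSION hstore SCHEMA b;   -- ERROR:  extension "hstore" already exists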
Re: [HACKERS] JSON and unicode surrogate pairs
On 2013-06-10 13:01:29 -0400, Andrew Dunstan wrote:

It's legal, is it not, to just write the equivalent Unicode character in the JSON string and not use the escapes? If so, I would think that that would be the most common usage. If someone's writing an escape, they probably had a reason for doing it that way, and might not appreciate our overriding their decision. We never store the converted values in the JSON object, nor do we return them from functions that return JSON. But many of the functions and operators that process the JSON have variants that return text instead of JSON, and in those cases, when the value returned is a JSON string, we do the following to it:

I have just realized that the problem is actually quite a lot bigger than that. We also use this value for field name comparison. So, let us suppose that we have a LATIN1 database and a piece of JSON with a field name containing the Euro sign (\u20ac), a character that is not in LATIN1. Making that processable so it doesn't blow up would be mighty tricky and error prone. The non-orthogonality I suggested as a solution upthread is, by contrast, very small and easy to manage, and not terribly hard to explain - see attached.

I think this all shows pretty clearly that it was a mistake to allow json data into the database that we cannot entirely display in the database's encoding. All the proposed ugly workarounds are only necessary because we don't throw an error when originally validating the json. Even in a utf-8 database you can get errors due to \u unescaping (at attribute access time, *NOT* at json_in() time) due to invalid surrogate pairs. I think this goes contrary to the normal postgres approach of validating data as strictly as necessary. And I think we are going to regret not fixing this while there are still relatively few users out there.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
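A sketch of the failure mode being described, assuming a LATIN1 server encoding (the error wording is indicative, not exact):

SELECT '{"name":"\u20ac"}'::json;            -- accepted: validation keeps the escape as-is
SELECT '{"name":"\u20ac"}'::json ->> 'name';
-- fails only here, where the escape must become a real character
-- that LATIN1 cannot represent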
Re: [HACKERS] DO ... RETURNING
Hi,

That topic apparently comes up each year and rehashes the same points.

Pavel Stehule pavel.steh...@gmail.com writes: probably we can allow using DO in a CTE without impact on other SQL statements, and for this purpose we need to know the returned TupleDescriptor early.

I still think that, DO being a utility statement, having it take parameters and return data is going to be a wart in a part of the system that already has only too many of them. My thinking revolves around CTE support for functions:

WITH FUNCTION name(param, ...)
  RETURNS type
  LANGUAGE plpgsql
AS ($$ function body here $$)
SELECT name(x, ...) FROM ...;

So I am able to accept it, although I think we are going in a strange direction. We are not able to do simple tasks simply (we cannot simply execute an SQL script on the server side) :(. But that is not a problem of Hannu's design.

With the DO utility command you can already execute an SQL script on the server quite simply. After all your proposals it's still unclear to me where you want to process which data? (I admit this time I didn't pay much attention, sorry about that)

Other question - can we find some readable and intuitive syntax for DO parametrization?

See above.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
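To make the proposal concrete, here is what the suggested syntax might look like in use (hypothetical - this is a proposal, not implemented; the function name and body are made up):

WITH FUNCTION double_it(x int) RETURNS int LANGUAGE plpgsql AS ($$
BEGIN
    RETURN x * 2;
END
$$)
SELECT double_it(f.x) FROM generate_series(1, 3) AS f(x);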
Re: [HACKERS] DO ... RETURNING
2013/6/11 Dimitri Fontaine dimi...@2ndquadrant.fr: With the DO utility command you can already execute an SQL script on the server quite simply. After all your proposals it's still unclear to me where you want to process which data?

There is a significant limit - you cannot simply change databases when you collect statistics over databases, you cannot drop a database ... and you cannot return multiple result sets - showing info about tables, schemas and indexes in one call.

What I would like (pseudocode - CONNECT and multi-recordset output do not exist today):

DO $$
BEGIN
  FOR r IN pg_databases LOOP
    CONNECT r.dbname;
    FOR table_name IN SELECT * FROM pg_class ... WHERE owner = 'GD' AND table_name LIKE 'xx%' LOOP
      IF pg_relsize(table_name) > xxx AND ... THEN
        -- show info about the dropped table
        SELECT xx FROM pg_class, pg_attribute  -- SHOW STRUCTURE OF ANY INTERESTING TABLE
                                               -- multirecordset output
        EXECUTE format('DROP TABLE %I', table_name);
        ...

Regards Pavel
Re: [HACKERS] DO ... RETURNING
Pavel Stehule pavel.steh...@gmail.com writes: FOR r IN pg_databases LOOP CONNECT r.dbname;

Do you mean that you want to run this DO block on the client side?

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] DO ... RETURNING
On 06/11/2013 11:30 AM, Pavel Stehule wrote: 2013/6/11 Dimitri Fontaine dimi...@2ndquadrant.fr: My thinking revolves around CTE support for functions: WITH FUNCTION name(param, ...) RETURNS type LANGUAGE plpgsql AS ($$ function body here $$) SELECT name(x, ...) FROM ...;

Yes, this would be another way to do in-line PL functions. I do not think that DO ... RETURNING and WITH FUNCTION are mutually exclusive. Rather, effort put into implementing one would also be useful for the other.

There is a significant limit - you cannot simply change databases when you collect statistics over databases, you cannot drop a database

I can do this easily in pl/python. Actually this is how statistics are collected in https://github.com/postsql/pgmon_zabbix

You can also do it using pl/proxy, or in plpgsql using dblink. I have seen quite complicated data-analysing utilities - with process control running 1 to N backends depending on load - written in pl/pgsql using dblink.

... you cannot return multiple result sets

The inability to return multiple result sets from a query is something that was introduced at some point after/during the move from PostQuel to SQL. It is still there on the server side at the protocol level, but no client I know of supports it, and as far as I know it is also hard to generate on the server side.

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
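For reference, the dblink approach mentioned above might look roughly like this (a minimal sketch; the connection-string handling and the counted query are illustrative):

CREATE EXTENSION IF NOT EXISTS dblink;

DO $$
DECLARE
    db text;
    n  bigint;
BEGIN
    FOR db IN SELECT datname FROM pg_database WHERE datallowconn LOOP
        SELECT cnt INTO n
        FROM   dblink(format('dbname=%s', db),
                      'SELECT count(*) FROM pg_class') AS t(cnt bigint);
        RAISE NOTICE 'database %: % relations', db, n;
    END LOOP;
END
$$;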
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 10:47 AM, Andres Freund wrote: I think this all shows pretty clearly that it was a mistake to allow json data into the database that we cannot entirely display in the database's encoding. All the proposed ugly workarounds are only necessary because we don't throw an error when originally validating the json. [...] And I think we are going to regret not fixing this while there are still relatively few users out there.

Exactly -

* allow in only valid JSON.
* Validate all utf8 strings for valid unicode.
* Have one canonical way of outputting unicode - utf8 for utf8 databases, escaped for all other encodings.
* If you need to store anything else, use text.

Requiring that the original text be preserved in a json data field is Not Good! I fully expect '{a:1, a:none, a:true, a:b}'::json to come out as '{a:b}'. (I know that currently this is not true, and it will happen only once I read the json value into the client.)

For anything else - don't use json, use any text type. If you really need a simple text-validated-as-valid-json-input datatype then add this. Call it jsontext or somesuch :)

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
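What 9.2/9.3 actually do today, for contrast: the json type stores the input text verbatim, duplicate keys and all (psql output abbreviated):

SELECT '{"a":1, "a":"b"}'::json;
       json
------------------
 {"a":1, "a":"b"}     -- the original text is kept; nothing is canonicalized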
Re: [HACKERS] DO ... RETURNING
2013/6/11 Dimitri Fontaine dimi...@2ndquadrant.fr: Pavel Stehule pavel.steh...@gmail.com writes: FOR r IN pg_databases LOOP CONNECT r.dbname; Do you mean that you want to run this DO block on the client side?

no, really no. I am thinking about some outer server-side process where these scripts would be executed. Maybe another use for background worker processes.

Pavel
[HACKERS] Current CLOBBER_CACHE_ALWAYS failures
The CLOBBER_CACHE_ALWAYS animals (friarbird and jaguarundi) have been failing persistently for about 36 hours now. The error is in a test added by Tom's recent commit a4424c5:

Expected:

-- Check row comparisons with IN
select * from int8_tbl i8 where i8 in (row(123,456)); -- fail, type mismatch
ERROR:  cannot compare dissimilar column types bigint and integer at record column 1

Actual:

select * from int8_tbl i8 where i8 in (row(123,456)); -- fail, type mismatch
connection to server was lost

Backtrace, as good as I can make it:

(gdb) bt
#0  0x00743fcf in cache_record_field_properties (typentry=0x8031c06f8) at typcache.c:658
#1  0x00744053 in record_fields_have_compare (typentry=0x8031c06f8) at typcache.c:625
#2  0x00743983 in lookup_type_cache (type_id=16446, flags=8) at typcache.c:375
#3  0x0074287c in op_mergejoinable (opno=Variable opno is not available.) at lsyscache.c:1188
#4  0x00605b0c in check_mergejoinable (restrictinfo=0x803064e18) at initsplan.c:1876
#5  0x006062d8 in distribute_qual_to_rels (root=0x8031ce4e0, clause=0x8031cea48, is_deduced=Variable is_deduced is not available.) at initsplan.c:1364
#6  0x00606aa4 in deconstruct_recurse (root=0x8031ce4e0, jtnode=0x803063d30, below_outer_join=0 '\0', qualscope=0x7fffbbb0, inner_join_rels=0x7fffbba8) at initsplan.c:645
#7  0x00607429 in deconstruct_jointree (root=Variable root is not available.) at initsplan.c:547
#8  0x00608554 in query_planner (root=0x8031ce4e0, tlist=0x8031ce470, tuple_fraction=0, limit_tuples=-1, qp_callback=0x60c570 <standard_qp_callback>, qp_extra=0x7fffbe50, cheapest_path=0x7fffbe90, sorted_path=0x7fffbe88, num_groups=0x7fffbea0) at planmain.c:176
#9  0x00609edc in grouping_planner (root=0x8031ce4e0, tuple_fraction=0) at planner.c:1203
#10 0x0060c014 in subquery_planner (glob=0x803064f30, parse=0x803054338, parent_root=0x803054450, hasRecursion=Variable hasRecursion is not available.) at planner.c:558
#11 0x0060c318 in standard_planner (parse=0x803054338, cursorOptions=0, boundParams=0x0) at planner.c:209
#12 0x0067ce0d in pg_plan_query (querytree=0x803054338, cursorOptions=0, boundParams=0x0) at postgres.c:753
#13 0x0067ceee in pg_plan_queries (querytrees=Variable querytrees is not available.) at postgres.c:812
#14 0x0067d1ca in exec_simple_query (query_string=0x803053038 "select * from int8_tbl i8 where i8 in (row(123,456));") at postgres.c:977
#15 0x0067e88a in PostgresMain (argc=1, argv=Variable argv is not available.) at postgres.c:3985
#16 0x0063137c in ServerLoop () at postmaster.c:3987
#17 0x00634702 in PostmasterMain (argc=6, argv=0x7fffd728) at postmaster.c:1246
#18 0x005ca98f in main (argc=6, argv=0x7fffd728) at main.c:196

(gdb) l
653                                        TCFLAGS_HAVE_FIELD_COMPARE);
654            for (i = 0; i < tupdesc->natts; i++)
655            {
656                TypeCacheEntry *fieldentry;
657
658                if (tupdesc->attrs[i]->attisdropped)
659                    continue;
660
661                fieldentry = lookup_type_cache(tupdesc->attrs[i]->atttypid,
662                                               TYPECACHE_EQ_OPR |
(gdb) p i
$5 = 1
(gdb) p typentry
$6 = (TypeCacheEntry *) 0x8031c06f8
(gdb) p typentry->tupDesc
$7 = 0x0
(gdb) p *typentry
$8 = {type_id = 16446, typlen = -1, typbyval = 0 '\0', typalign = 100 'd', typstorage = 120 'x', typtype = 99 'c', typrelid = 16429, btree_opf = 2994, btree_opintype = 2249, hash_opf = 0, hash_opintype = 0, eq_opr = 0, lt_opr = 0, gt_opr = 0, cmp_proc = 0, hash_proc = 0,
  eq_opr_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  cmp_proc_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  hash_proc_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  tupDesc = 0x0, rngelemtype = 0x0, rng_collation = 0,
  rng_cmp_proc_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  rng_canonical_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  rng_subdiff_finfo = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_stats = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
  flags = 0, enumData = 0x0}

type_id 16446
Re: [HACKERS] erroneous restore into pg_catalog schema
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote: That entirely depends on how the extension script is written. Making it possible to have two versions concurrently installed requires a non-trivial amount of effort, but I don't think the extension facility gets in the way at all, currently.

How would you recommend writing an extension script which deals with conflicts? Also, as Andres points out, the current extension system doesn't allow installing multiple versions. It'd be kind of nice if it did, but there are problems in that direction. Extension authors can manage that issue by having differently named extensions (where the name includes some number), similar to libraries.

That isn't the only case where name conflicts can and will occur between extensions though, which is the more general issue that I was pointing out. If there's no knowledge between the extension authors of the other extension (which is likely the case..) then chances are that such a conflict will cause either a failure or incorrect behavior.

Thanks, Stephen
Re: [HACKERS] Server side lo-funcs name
On Tue, Jun 11, 2013 at 1:32 AM, Tatsuo Ishii is...@postgresql.org wrote: Recently we got a complaint about the server-side large object function names described in the doc: http://www.postgresql.org/message-id/51b2413f.8010...@gmail.com In the doc: http://www.postgresql.org/docs/9.3/static/lo-funcs.html "There are server-side functions callable from SQL that correspond to each of the client-side functions described above; indeed, for the most part the client-side functions are simply interfaces to the equivalent server-side functions." From that description it is hard for users to find the server-side functions loread and lowrite, because they are looking for lo_read and lo_write. So I think his complaint is fair. The included patches attempt to fix the problem.

I have committed this. Did you see my email with proposed alternative text? You didn't even fix the whitespace error I pointed out.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
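The naming wrinkle being discussed, shown concretely (the OID 16385 is illustrative - use whatever lo_creat actually returns; 131072 is the INV_WRITE flag):

BEGIN;
SELECT lo_creat(-1);                  -- returns the new large object's OID
-- the server-side read/write functions are spelled loread/lowrite,
-- not lo_read/lo_write as in the client library:
SELECT lowrite(lo_open(16385, 131072), '\x48690a'::bytea);
COMMIT;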
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 06:53 AM, Hannu Krosing wrote: Exactly - * allow in only valid JSON. * Validate all utf8 strings for valid unicode. * Have one canonical way of outputting unicode - utf8 for utf8 databases, escaped for all other encodings. * If you need to store anything else, use text. [...] If you really need a simple text-validated-as-valid-json-input datatype then add this. Call it jsontext or somesuch :)

Most of this is irrelevant to solving the current problem. We don't have a greenfield development of JSON - we have an existing type with which we have to work. Note too that your "utf8 for utf8 databases, escaped for all other encodings" is exactly the opposite of what Noah and Andres and Tom are arguing, namely that the database encoding should not affect the behaviour, and is very close to what I proposed yesterday.

It's a pity that we don't have a non-error-producing conversion function (or if we do, I haven't found it). Then we might adopt a rule for processing unicode escapes that said "convert unicode escapes to the database encoding if possible, and if not then emit them unchanged", which might be a reasonable compromise.

cheers andrew
Re: [HACKERS] JSON and unicode surrogate pairs
On 2013-06-11 12:53 CEST, Hannu Krosing wrote: Requiring that the original text be preserved in a json data field is Not Good! I fully expect '{a:1, a:none, a:true, a:b}'::json to come out as '{a:b}'

ahem, do you mean instead to give (none -> null, and the missing '' quoting inserted in the answer):

=# SELECT '{"a":1, "a":null, "a":true, "a":"b"}'::json;
   json
-----------
 {"a":"b"}

or only when stored in the database and subsequently retrieved? The original text in this case was perfectly valid JSON text.

(I know that currently this is not true and will happen only once I read the json value into the client)

Isn't this a good situation, and doesn't this also depend on the storage representation in the client? What about this:

=# SELECT '{"measure":"seconds", "measure":42}'::json;
      json
----------------
 {"measure":42}

I presume people used to storing metadata in preceding json object members with duplicate names would want to decide in the client requesting the data what to do with the metadata information and at what point to drop it, wouldn't they :-?)

All the best, Stefan.
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 02:41 PM, Andrew Dunstan wrote: Most of this is irrelevant to solving the current problem. We don't have a greenfield development of JSON - we have an existing type with which we have to work.

We do have the current (postgresql v9.2) json data type, which is loosely defined as "a string which can be fed to a JSON deserializer without errors". But then we also have a standard. Sadly the standard is really not about a type, but about serialising a subset of JavaScript structures to a utf8 string. This kind of implies an underlying type - a structure consisting of a few basic types plus lists and dictionaries/classes built from them - but does not really specify any type.

Note too that your "utf8 for utf8 databases, escaped for all other encodings" is exactly the opposite of what Noah and Andres and Tom are arguing, namely that the database encoding should not affect the behaviour,

It does not affect behaviour. It makes it possible to encode valid JSON (utf8) in a client encoding which may not have the full set of unicode characters. We can do it here, as we know that by definition JSON _is_ unicode, so clients have to be able to process any unicode character. We cannot do this for plain text, which has no defined charset outside what client_encoding postulates.

and is very close to what I proposed yesterday. It's a pity that we don't have a non-error-producing conversion function (or if we do, I haven't found it). Then we might adopt a rule for processing unicode escapes that said "convert unicode escapes to the database encoding"

Only when extracting JSON keys or values to text does it make sense to unescape to the database encoding. Strings inside JSON itself are by definition utf8.

"if possible, and if not then emit them unchanged", which might be a reasonable compromise.

I'd opt for "... and if not then emit them quoted". The default should be not losing any data.

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 03:08 PM, Stefan Drees wrote: ahem, do you mean instead to give (none -> null, and the missing '' quoting inserted in the answer):

yeah, mixed up none and null

=# SELECT '{"a":1, "a":null, "a":true, "a":"b"}'::json;
   json
-----------
 {"a":"b"}

or only when stored in the database and subsequently retrieved? The original text in this case was perfectly valid JSON text. Isn't this a good situation, and doesn't this also depend on the storage representation in the client? What about this:

=# SELECT '{"measure":"seconds", "measure":42}'::json;
      json
----------------
 {"measure":42}

I presume people used to storing metadata in preceding json object members with duplicate names would want to decide in the client requesting the data what to do with the metadata information and at what point to drop it, wouldn't they :-?)

Seems like blatant misuse of the JSON format :) I assume that as JSON is a _serialisation_ format, it should represent a data structure, not processing instructions. I can see no possible JavaScript structure which could produce a duplicate key when serialised. And I don't think that any standard JSON reader supports this either. If you want to store JavaScript snippets in the database, use text. Or perhaps pl/v8 :)

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 09:16 AM, Hannu Krosing wrote: It's a pity that we don't have a non-error-producing conversion function [...] Only when extracting JSON keys or values to text does it make sense to unescape to the database encoding.

That's exactly the scenario we are talking about. When emitting JSON the functions have always emitted unicode escapes as they are in the text, and will continue to do so.

Strings inside JSON itself are by definition utf8.

We have deliberately extended that to allow JSON strings to be in any database server encoding. That was argued back in the 9.2 timeframe and I am not interested in re-litigating it. The only issue at hand is how to handle unicode escapes (which in their string form are pure ASCII) when emitting text strings.

"if possible, and if not then emit them unchanged", which might be a reasonable compromise. I'd opt for "... and if not then emit them quoted". The default should be not losing any data.

I don't know what this means at all. Quoted how? Let's say I have a Latin1 database and have the following JSON string: "\u20AC2.00". In a UTF8 database the text representation of this is "€2.00" - what are you saying it should be in the Latin1 database?

cheers andrew
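For the UTF8 half of that question, this is the behaviour under discussion (works in a UTF8 database; the array wrapper is just a convenient way to reach a scalar string with a text-returning operator):

SELECT '["\u20AC2.00"]'::json ->> 0;
 ?column?
----------
 €2.00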
Re: [HACKERS] how to find out whether a view is updatable
Dean Rasheed dean.a.rash...@gmail.com writes: On 11 June 2013 01:03, Michael Paquier michael.paqu...@gmail.com wrote: Yes, this is definitely material for 9.4. You should add this patch to the 1st commit fest.

Thanks. Arguably though, the API changes are something that should be sorted out in 9.3,

I agree --- I'm planning to look at this in the next few days.

regards, tom lane
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 09:23 AM, Hannu Krosing wrote: I can see no possible JavaScript structure which could produce a duplicate key when serialised. And I don't think that any standard JSON reader supports this either.

You are quite wrong. This was discussed quite recently on -hackers, too. V8 will happily accept a JSON string with duplicate keys via eval() and resolve it in favor of the lexically latest value. I gather most other JSON processors do likewise.

Can we PLEASE PLEASE stop sending this discussion off track and concentrate on the actual problem we have at hand? It's BETA and there is not much time. I get that you don't like how we have implemented JSON. But we're not going back over that ground now. It's done and in use, and LOTS of people are finding it very useful.

cheers andrew
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 03:42 PM, Andrew Dunstan wrote: We have deliberately extended that to allow JSON strings to be in any database server encoding.

Ugh! Does that imply that we do not just allow it, but rather require it? Why are we then arguing about unicode surrogate pairs as a JSON thing? Should it not be a client-to-server encoding conversion thing instead?

That was argued back in the 9.2 timeframe and I am not interested in re-litigating it. The only issue at hand is how to handle unicode escapes (which in their string form are pure ASCII) when emitting text strings.

Unicode escapes in non-unicode strings seem to be something that is ill-defined by nature ;) That is, you can't come up with a good general answer for this.

I don't know what this means at all. Quoted how? Let's say I have a Latin1 database and have the following JSON string: "\u20AC2.00". In a UTF8 database the text representation of this is "€2.00" - what are you saying it should be in the Latin1 database?

utf8-quote the '€' - "\u20AC2.00". That is, convert unicode -> Latin1 for what has a correspondence, and utf8-quote anything that does not. If we allow unicode escapes in non-unicode strings anyway, then this seems the most logical thing to do.

-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] JSON and unicode surrogate pairs
On 2013-06-11 15:23 CEST, Hannu Krosing wrote: On 06/11/2013 03:08 PM, Stefan Drees wrote: ... What about this: =# SELECT '{"measure":"seconds", "measure":42}'::json; json -- {"measure":42} I presume people used to storing metadata in preceding json object members with duplicate names would want to decide in the client requesting the data what to do with the metadata information and at what point to drop it, wouldn't they :-?) Seems like blatant misuse of JSON format :) I assume that as JSON is a _serialisation_ format, it should represent a data structure, not processing instructions. I can see no possible JavaScript structure which could produce a duplicate key when serialised. ahem, JSON is a notation that allows an object or an array at top level. If it is an object, this consists of pairs called (name, value). Here value can be any object, array, number, string or the literals null, false or true. The name must be a string. That's it :-) no key **and** also no ordering on these names ;-) and as the RFC does not care where the data came from or how it was represented before it became JSON text (the top-level element of a JSON document), how should the parser know ... but delta notation, commenting, or streaming needs have created many applications that deliver multibags and rely on some ordering conventions in their data-exchanging relations. And I don't think that any standard JSON reader supports this either. Oh yes. The convention is merely: keep all (streaming) or the last (whatever the last may mean must be carefully ensured in the interchange relation). All would like these two scenarios, but the RFC as is does not prevent an early-out (like INSERT OR IGNORE) :-)) If you want to store any JavaScript snippets in the database, use text. JSON is language agnostic. I use JSON more from python and php than from js, but others do so differently ... Or perhaps pl/v8 :) Do you mean the V8 Engine Javascript Procedural Language add-on for PostgreSQL (http://code.google.com/p/plv8js/)? I guess so. I did not want to hijack the thread, as this centered more around escaping where and what in which context (DB vs. client encoding). As the freshly created IETF json working group revamps the JSON RFC on its way to the standards track, there are currently also discussions on what to do with unicode surrogate pairs. See eg. this thread http://www.ietf.org/mail-archive/web/json/current/msg00675.html starting a summarizing effort. Just in case it helps making the fresh JSON feature of PostgreSQL bright, shining and future proof :-) Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Server side lo-funcs name
On Tue, Jun 11, 2013 at 1:32 AM, Tatsuo Ishii is...@postgresql.org wrote: Recently we got a complaint about the server-side large object function names described in the doc: http://www.postgresql.org/message-id/51b2413f.8010...@gmail.com In the doc: http://www.postgresql.org/docs/9.3/static/lo-funcs.html There are server-side functions callable from SQL that correspond to each of the client-side functions described above; indeed, for the most part the client-side functions are simply interfaces to the equivalent server-side functions From the description it is hard for users to find out the server-side functions loread and lowrite because they are looking for lo_read and lo_write. So I think his complaint is fair. The included patches attempt to fix the problem. I have committed this. Did you see my email with proposed alternative text? You didn't even fix the whitespace error I pointed out. I don't know why but I haven't received your email. I just found the email in the web archive and it looks better than what I proposed. Do you want to commit it yourself? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
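The naming mismatch is easy to see from the catalog (a sketch; note the underscore-less loread/lowrite alongside the lo_* names users tend to look for):

    =# SELECT proname FROM pg_proc WHERE proname IN ('lo_creat', 'loread', 'lowrite');
     proname
    ----------
     lo_creat
     loread
     lowrite
    (3 rows)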
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 03:54 PM, Andrew Dunstan wrote: On 06/11/2013 09:23 AM, Hannu Krosing wrote: I can see no possible JavaScript structure which could produce a duplicate key when serialised. And I don't think that any standard JSON reader supports this either. You are quite wrong. This was discussed quite recently on -hackers, too. V8 will happily accept a JSON string with duplicate keys via eval() and resolve it in favor of the lexically latest value. This is what I mean. It is a valid _input_ value, but no existing JavaScript structure serializes to it. In other words - I want us to have in mind some underlying structure, not the text: hannu=# select '1e0'::float; float8 1 (1 row) We are not preserving '1e0' in floats, so why should we preserve it in json? We do imply an internal structured format in several functions operating on json even though we store it as text. I gather most other JSON processors do likewise. Can we PLEASE PLEASE stop sending this discussion off track and concentrate on the actual problem we have at hand? It's BETA and there is not much time. I get that you don't like how we have implemented JSON. The current implementation is a reasonably good compromise, so I can't say I don't like it :) I am here going from the premise that at some point we might implement a json-like binary structured type. If it is left separate from json, I am ok with all kinds of quirks coming from the current vague definition of what a json type is. OTOH, if the idea is to move the json storage format to this binary structured type, we should resolve possible incompatibilities as early as possible and start thinking of the current as-text storage in terms of something that is a real structured type, just stored in text format. This could also mean converting to a canonical format on input. But we're not going back over that ground now. It's done and in use and LOTS of people are finding it very useful. I am one of these people who finds it very useful. I just want to avoid painting ourselves into a corner too early. I would like us to have *one* json type in the future, not a separate json input string and a json-compatible binary structure or some such. cheers andrew -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 04:04 PM, Stefan Drees wrote: On 2013-06-11 15:23 CEST, Hannu Krosing wrote: On 06/11/2013 03:08 PM, Stefan Drees wrote: ... What about this: =# SELECT '{"measure":"seconds", "measure":42}'::json; json -- {"measure":42} I presume people used to storing metadata in preceding json object members with duplicate names would want to decide in the client requesting the data what to do with the metadata information and at what point to drop it, wouldn't they :-?) Seems like blatant misuse of JSON format :) I assume that as JSON is a _serialisation_ format, it should represent a data structure, not processing instructions. I can see no possible JavaScript structure which could produce a duplicate key when serialised. ahem, JSON is a notation that allows an object or an array at top level. If it is an object, this consists of pairs called (name, value). Here value can be any object, array, number, string or the literals null, false or true. The name must be a string. That's it :-) no key **and** also no ordering on these names ;-) and as the RFC does not care where the data came from or how it was represented before it became JSON text (the top-level element of a JSON document), how should the parser know ... but delta notation, commenting, or streaming needs have created many applications that deliver multibags and rely on some ordering conventions in their data-exchanging relations. And I don't think that any standard JSON reader supports this either. Oh yes. The convention is merely: keep all (streaming) or the last (whatever the last may mean must be carefully ensured in the interchange relation). All would like these two scenarios, but the RFC as is does not prevent an early-out (like INSERT OR IGNORE) :-)) I was kind of assuming that JSON is a (JavaScript) Object Serialization Notation, that is, there is a unique implied JavaScript Object which can be serialized into any given JSON string. IOW, that if you serialise an object then this is what JSON should be. The fact that most JSON-to-Object readers support multiple keys is just an implementation artifact and not something that is required by the RFC. If you want to store any JavaScript snippets in the database, use text. JSON is language agnostic. I use JSON more from python and php than from js, but others do so differently ... Agreed. Even the fact that you can define any operations on a JSON string - like extracting the value for a key - is actually non-standard :) Perhaps I should stop thinking of the json type as something that implies any underlying structure ... Or perhaps pl/v8 :) Do you mean the V8 Engine Javascript Procedural Language add-on for PostgreSQL (http://code.google.com/p/plv8js/)? I guess so. I did not want to hijack the thread, as this centered more around escaping where and what in which context (DB vs. client encoding). As the freshly created IETF json working group revamps the JSON RFC on its way to the standards track, there are currently also discussions on what to do with unicode surrogate pairs. See eg. this thread http://www.ietf.org/mail-archive/web/json/current/msg00675.html starting a summarizing effort. Wow. The rabbit hole is much deeper than I thought :) -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 6:07 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2013/6/11 Dimitri Fontaine dimi...@2ndquadrant.fr: Pavel Stehule pavel.steh...@gmail.com writes: FOR r IN pg_databases LOOP CONNECT r.dbname; Do you mean that you want to run this DO block on the client side? no, really no. I am thinking about some outer server-side process where these scripts will be executed. Maybe another usage for a background worker process. +1. I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
* Merlin Moncure (mmonc...@gmail.com) wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Parallell Optimizer
On 11 June 2013 01:45, Tatsuo Ishii is...@postgresql.org wrote: On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs si...@2ndquadrant.com wrote: On 7 June 2013 20:23, Tom Lane t...@sss.pgh.pa.us wrote: As for other databases, I suspect that ones that have parallel execution are probably doing it with a thread model not a process model. Separate processes are more common because it covers the general case where query execution is spread across multiple nodes. Threads don't work across nodes and parallel queries predate (working) threading models. Indeed. Parallelism based on processes would be more convenient for master-master type of applications. Even if no master-master feature is implemented directly in core, at least a parallelism infrastructure based on processes could be used for this purpose. As long as true synchronous replication is not implemented in core, I am not sure there's value in parallel execution spreading across multiple nodes because of the delay of data update propagation. Please explain what you mean by the word true used here. In other words, eager replication. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallell Optimizer
On Tue, Jun 11, 2013 at 9:45 AM, Tatsuo Ishii is...@postgresql.org wrote: On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs si...@2ndquadrant.com wrote: On 7 June 2013 20:23, Tom Lane t...@sss.pgh.pa.us wrote: As for other databases, I suspect that ones that have parallel execution are probably doing it with a thread model not a process model. Separate processes are more common because it covers the general case where query execution is spread across multiple nodes. Threads don't work across nodes and parallel queries predate (working) threading models. Indeed. Parallelism based on processes would be more convenient for master-master type of applications. Even if no master-master feature is implemented directly in core, at least a parallelism infrastructure based on processes could be used for this purpose. As long as true synchronous replication is not implemented in core, I am not sure there's value in parallel execution spreading across multiple nodes because of the delay of data update propagation. True, but we cannot drop the possibility of having such features in the future either, so a process-based model is safer regarding the possible range of features and applications we could gain. I wonder why neither true synchronous replication nor eager replication is in the developer TODO list. If we want them in the future, they should be on it. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
2013/6/11 Stephen Frost sfr...@snowman.net: * Merlin Moncure (mmonc...@gmail.com) wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think it is related - we are talking about the future form of the DO statement - or about the future form of server-side scripting. But it is not important at this moment Pavel Thanks, Stephen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
2013/6/11 Pavel Stehule pavel.steh...@gmail.com: 2013/6/11 Stephen Frost sfr...@snowman.net: * Merlin Moncure (mmonc...@gmail.com) wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think it is related - we are talking about the future form of the DO statement - or about the future form of server-side scripting. But it is not important at this moment As I wrote, I can live with Hannu's proposal. Regards Pavel Thanks, Stephen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 9:45 AM, Stephen Frost sfr...@snowman.net wrote: * Merlin Moncure (mmonc...@gmail.com) wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. It's a complete feature but completely relevant to the discussion -- the behaviors have a lot of overlap and CALL is in the standard whereas the ad hoc feature DO isn't. This comes up in other feature requests too, like psql bash-like scripting features. That said, it would be pretty cool if you could inline DO into a CTE or more generally into a query (is that possible?) -- then you'd have something distinct. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
There hasn't been much activity here recently. I'm curious, then, if there are questions that I can answer. It may be useful to summarize some things here: - the purpose of the patch is to use posix_fallocate when creating new WAL files, because it's (usually) much quicker - using posix_fallocate is *one* system call versus 2048 calls to write(2) (a 16 MB WAL segment written in 8 kB blocks) - additionally, using posix_fallocate /guarantees/ that the filesystem has space for the WAL file (by spec) - reportedly (difficult to test or prove), using posix_fallocate *may* reduce file fragmentation - the (limited) testing I've done bears this out: the more new WAL file creation there is, the greater the improvement. Once the number of WAL files reaches a constant point, there does not appear to be either a positive or a negative performance impact. This is as expected. - a test program (C) was also written and used which creates, allocates, and then writes to files as fast as possible. This test program also shows the expected performance benefits. - the performance benefits range from a few percent up to about 15 percent Concerns: - some were concerned that the spec makes no claims about posix_fallocate being able to guarantee that the space allocated has zeroes in it. This was discussed here and on the Linux Kernel mailing list, wherein the expected behavior is that it does provide zeroes - most systems don't allocate a great many new WAL files, so the performance benefit is small - your concern here Benefits: - new WAL file allocation is much quicker, more efficient (fewer system calls) - the patch is (reportedly - I'm not a good judge here!) quite small -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
On 6/10/13 6:02 PM, Fabien COELHO wrote: - the tps is global, with a mutex to share the global stochastic process - there is an adaptation for the fork emulation - I do not know whether this works with Win32 pthread stuff. Instead of this complexity, can we just split the TPS input per client? That's all I was thinking of here, not adding a new set of threading issues. If 10,000 TPS is requested and there's 10 clients, just set the delay so that each of them targets 1000 TPS. I'm guessing it's more accurate to have them all communicate as you've done here, but it seems like a whole class of new bugs and potential bottlenecks could come out of that. Whenever someone touches the threading model for pgbench it usually gives a stack of build farm headaches. Better to avoid those unless there's really a compelling reason to go through that. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
* Pavel Stehule (pavel.steh...@gmail.com) wrote: 2013/6/11 Stephen Frost sfr...@snowman.net: And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think it is related - we are talking about the future form of the DO statement - or about the future form of server-side scripting. I don't believe there's any intent to ever have DO used for stored procedures. Not only are stored procedures deserving of their own top-level command (eg: CALL, as has been discussed before..), but I believe they would necessarily have different enough semantics that shoe-horning them into DO would end up breaking backwards compatibility. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On 2013-11.06 17:28, Jon Nelson wrote: There hasn't been much activity here recently. I'm curious, then, if there are questions that I can answer. It may be useful to summarize some things here: - the purpose of the patch is to use posix_fallocate when creating new WAL files, because it's (usually) much quicker - using posix_fallocate is *one* system call versus 2048 calls to write(2) - additionally, using posix_fallocate /guarantees/ that the filesystem has space for the WAL file (by spec) - reportedly (difficult to test or prove), using posix_fallocate *may* reduce file fragmentation - the (limited) testing I've done bears this out: the more new WAL file creation there is, the more the improvement. Once the number of WAL files reaches a constant point, there does not appear to be either a positive or a negative performance impact. This is as expected. - a test program (C) was also written and used which creates, allocates, and then writes to files as fast as possible. This test program also shows the expected performance benefits. - the performance benefits range from a few percent up to about 15 percent tried the test program of the patch at least a bit. Retrieved it from: http://www.postgresql.org/message-id/attachment/29088/test_fallocate.c on an oldish linux box (Kernel 2.6.32, x86_64) following $ gcc -o test_fallocate test_fallocate.c a short $ ./test_fallocate foo 1 1 yields: without posix_fallocate: 1 open/close iterations, 1 rewrite in 26.1993s with posix_fallocate: 1 open/close iterations, 1 rewrite in 13.3299s on another box (Kernel 3.2.0, x86_64) same procedure yields: without posix_fallocate: 1 open/close iterations, 1 rewrite in 19.1972s with posix_fallocate: 1 open/close iterations, 1 rewrite in 9.9280s Note, when trying gcc -O2 test_fallocate.c fails to compile with: In file included from /usr/include/fcntl.h:252:0, from test_fallocate.c:3: In function ‘open’, inlined from ‘main’ at test_fallocate.c:68:16: /usr/include/x86_64-linux-gnu/bits/fcntl2.h:51:24: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT in second argument needs 3 arguments Concerns: - some were concerned that the spec makes no claims about posix_fallocate being able to guarantee that the space allocated has zeroes in it. This was discussed here and on the Linux Kernel mailing list, wherein the expected behavior is that it does provide zeroes - most systems don't allocate a great many new WAL files, so the performance benefit is small - your concern here Benefits: - new WAL file allocate is much quicker, more efficient (fewer system calls) - the patch is (reportedly - I'm not a good judge here!) quite small HTH, Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Tue, Jun 11, 2013 at 11:08 AM, Stefan Drees ste...@drees.name wrote: On 2013-11.06 17:28, Jon Nelson wrote: There hasn't been much activity here recently. I'm curious, then, if there are questions that I can answer. It may be useful to summarize some things here: - the purpose of the patch is to use posix_fallocate when creating new WAL files, because it's (usually) much quicker - using posix_fallocate is *one* system call versus 2048 calls to write(2) - additionally, using posix_fallocate /guarantees/ that the filesystem has space for the WAL file (by spec) - reportedly (difficult to test or prove), using posix_fallocate *may* reduce file fragmentation - the (limited) testing I've done bears this out: the more new WAL file creation there is, the more the improvement. Once the number of WAL files reaches a constant point, there does not appear to be either a positive or a negative performance impact. This is as expected. - a test program (C) was also written and used which creates, allocates, and then writes to files as fast as possible. This test program also shows the expected performance benefits. - the performance benefits range from a few percent up to about 15 percent tried the test program of the patch at least a bit. Retrieved it from: http://www.postgresql.org/message-id/attachment/29088/test_fallocate.c on an oldish linux box (Kernel 2.6.32, x86_64) following $ gcc -o test_fallocate test_fallocate.c a short $ ./test_fallocate foo 1 1 yields: without posix_fallocate: 1 open/close iterations, 1 rewrite in 26.1993s with posix_fallocate: 1 open/close iterations, 1 rewrite in 13.3299s on another box (Kernel 3.2.0, x86_64) same procedure yields: without posix_fallocate: 1 open/close iterations, 1 rewrite in 19.1972s with posix_fallocate: 1 open/close iterations, 1 rewrite in 9.9280s Note, when trying gcc -O2 test_fallocate.c fails to compile with: In file included from /usr/include/fcntl.h:252:0, from test_fallocate.c:3: In function ‘open’, inlined from ‘main’ at test_fallocate.c:68:16: /usr/include/x86_64-linux-gnu/bits/fcntl2.h:51:24: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT in second argument needs 3 arguments It's understood that posix_fallocate is faster at this -- the question on the table is 'does this matter in context of postgres?'. Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
2013/6/11 Stephen Frost sfr...@snowman.net: * Pavel Stehule (pavel.steh...@gmail.com) wrote: 2013/6/11 Stephen Frost sfr...@snowman.net: And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think it is related - we are talking about the future form of the DO statement - or about the future form of server-side scripting. I don't believe there's any intent to ever have DO used for stored procedures. Not only are stored procedures deserving of their own top-level command (eg: CALL, as has been discussed before..), but I believe they would necessarily have different enough semantics that shoe-horning them into DO would end up breaking backwards compatibility. At this moment, DO doesn't support any feature that is in conflict with stored procedure functionality, because it is based on functions, and it therefore has limited functionality. The syntax of procedures and functions is relatively well defined: CREATE FUNCTION foo(..) - a SELECT expression contains the foo call; CREATE PROCEDURE foo(..) - CALL foo(). Now the anonymous code block is based on functions, but it can be changed to respect context or usage without loss of compatibility: DO $$ ... $$ -- procedural behavior, just execute server-side scripts; in a CTE, DO RETURNING $$ ... $$ -- functional behavior, functional limits. Pavel Thanks, Stephen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
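A minimal sketch of the invocation distinction Pavel is drawing, with CALL as the SQL-standard form (hypothetical here, since PostgreSQL has no CREATE PROCEDURE or CALL):

    -- a function is called from within an expression or query:
    SELECT foo(42);
    SELECT * FROM t WHERE t.x = foo(42);

    -- a procedure would be invoked as a top-level statement:
    CALL foo(42);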
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 11:00 AM, Stephen Frost sfr...@snowman.net wrote: * Pavel Stehule (pavel.steh...@gmail.com) wrote: 2013/6/11 Stephen Frost sfr...@snowman.net: And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think so it is related - we talk about future form of DO statement - or about future form of server side scripting. I don't believe there's any intent to ever have DO used for stored procedures. Not only are stored procedures deserving of their own top-level command (eg: CALL, as has been discussed before..), but I believe they would necessairly have different enough semantics that shoe-horning them into DO would end up breaking backwards compatibility. I was not arguing to shoe-horn them into DO, but rather that the proposal is shoe-horning into DO what should be in CALL (but I'm having second thoughts about that -- CALL AFAIK can't do in-line code blocks). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 11:26 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2013/6/11 Stephen Frost sfr...@snowman.net: * Pavel Stehule (pavel.steh...@gmail.com) wrote: 2013/6/11 Stephen Frost sfr...@snowman.net: And this still has next-to-nothing to do with the specific proposal that was put forward. I'd like actual procedures too, but it's a completely different and distinct thing from making DO blocks able to return something. I think so it is related - we talk about future form of DO statement - or about future form of server side scripting. I don't believe there's any intent to ever have DO used for stored procedures. Not only are stored procedures deserving of their own top-level command (eg: CALL, as has been discussed before..), but I believe they would necessairly have different enough semantics that shoe-horning them into DO would end up breaking backwards compatibility. In this moment, DO doesn't support any feature that is in conflict with stored procedure functionality, because it is based on functions, and then it have to have limited functionality Syntax of procedures and functions is relatively well defined CREATE FUNCTION foo(..) SELECT expression contains foo call CREATE PROCEDURE foo(..) --- CALL foo() Now anonymous code block is based on functions, but it can be changed to respect context or usage without lost of compatibility. DO $$ ... $$ -- procedural behave -- just execute server side scripts CTE DO RETURNING $$ ... $$ -- functional behave, functional limits. why does it have to be CTE? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
On Tuesday, June 11, 2013, Pavel Stehule wrote: I don't believe there's any intent to ever have DO used for stored procedures. Not only are stored procedures deserving of their own top-level command (eg: CALL, as has been discussed before..), but I believe they would necessairly have different enough semantics that shoe-horning them into DO would end up breaking backwards compatibility. In this moment, DO doesn't support any feature that is in conflict with stored procedure functionality, because it is based on functions, and then it have to have limited functionality Let me re-state: I don't see DO, which is entirely function oriented today, ever being redefined as suddenly being stored procedures instead. Claiming that it wouldn't impact existing users of DO is a level of hand-waving that I just can't follow. If nothing else, it would certainly impact external language support. If we're going to continue to hand-wave at this then I would argue that we'd be able to magically make DO .. RETURNING also happily work in stored procedure mode when the time comes without impacting users. I'm done with this thread. For my part- we can and should extend DO to support RETURNING. DO RETURNING support is a big enough and useful enough addition that it can go in by itself. Attempting to feature-creep it to also cover stored procedures will simply end up killing it unfairly and will not move us forward at all. We should also have stored procedures. Until there's an actual patch for stored procedures which has some real conflict with DO RETURNING, I don't see it as being relevant. I look forward to patches for both. Thanks, Stephen
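To make the DO ... RETURNING idea concrete, usage might look something like this (entirely hypothetical syntax - none of this exists today):

    SELECT name, uid
      FROM (DO LANGUAGE plpgsql RETURNING (name text, uid int) $$
            BEGIN
                RETURN QUERY SELECT usename::text, usesysid::int FROM pg_user;
            END;
            $$) AS u
     WHERE uid < 100;

Here the RETURNING column list would play the role of a named function's RETURNS TABLE clause.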
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
* Merlin Moncure (mmonc...@gmail.com) wrote: It's understood that posix_fallocate is faster at this -- the question on the table is 'does this matter in context of postgres?'. Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. I've not had a chance to look at this patch, but I tend to agree with Merlin. My main question is really- would this be useful for extending *relations*? Apologies if it's already been discussed; I do plan to go back and read the threads about this more fully, but I wanted to voice my support for using posix_fallocate, when available, in general. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] DO ... RETURNING
On 06/11/2013 05:27 PM, Merlin Moncure wrote: On Tue, Jun 11, 2013 at 9:45 AM, Stephen Frost sfr...@snowman.net wrote: * Merlin Moncure (mmonc...@gmail.com) wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. It's a complete feature but completely relevant to the discussion -- the behaviors have a lot of overlap and CALL is in the standard whereas the ad hoc feature DO isn't. Could you point to the ISO/ANSI SQL CALL definition? This comes up in other feature requests too, like psql bash-like scripting features. Client-side scripting is distinct from kind-of-server-side multi-database scripting. As I said before, it is doable now using pl/pgsql and dblink / plproxy and is trivial in pl/python. That said, it would be pretty cool if you could inline DO into a CTE or more generally into a query (is that possible?) Currently not possible, especially because DO does not return anything at this moment. Again, you can of course run a DO query from any pl (except maybe SQL and pl/pgsql): hannu=# DO language plpythonu $$ plpy.execute("DO LANGUAGE plpgsql $x$ BEGIN RAISE NOTICE 'hi!'; END;$x$") $$; NOTICE: hi! CONTEXT: SQL statement "DO LANGUAGE plpgsql $x$ BEGIN RAISE NOTICE 'hi!'; END;$x$" PL/Python anonymous code block DO -- then you'd have something distinct. I like the CTE approach better as a general inline-in-query approach, and DO ... RETURNING as a way for ad-hoc procs or direct remoting from the client. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
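The dblink route mentioned above works today for the run-a-command-elsewhere case; a minimal sketch (assumes the dblink extension is installed and a database named otherdb exists):

    CREATE EXTENSION IF NOT EXISTS dblink;
    -- executes in its own connection, i.e. outside the calling transaction:
    SELECT dblink_exec('dbname=otherdb', 'CREATE TABLE audit_log (msg text)');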
Re: [HACKERS] DO ... RETURNING
On Tue, Jun 11, 2013 at 12:01 PM, Hannu Krosing ha...@2ndquadrant.com wrote: Could you point to the ISO/ANSI SQL CALL definition ? I can't: no one can because the SQL standard is not available online. But you can look at various proxies, for example here: http://farrago.sourceforge.net/design/UserDefinedTypesAndRoutines.html or here: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r897.htm As I said before, it is doable now using pl/pgsql and dblink / plproxy and is trivial in pl/python. That said, it would be pretty cool if you could inline DO into a CTE or more generally into a query (is that possible?) Currently not possible, especially because DO does not return anything at this moment. right, well, I meant hypothetically; I was following along with your thinking and extending inline code blocks to be able to be used in any place where a function is allowed. Specifically I was wondering if there are technical limitations that keep them limited to CTE usage (as is the case with data modifying WITH). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] how to find out whether a view is updatable
On Tue, Jun 11, 2013 at 4:07 PM, Dean Rasheed dean.a.rash...@gmail.comwrote: Thanks. Arguably though, the API changes are something that should be sorted out in 9.3, but I'm not sure how much of an appetite there is for that, or whether it's too late. I see, OK for the API changes on the functions, but I am not sure it is time to add new options in postgres_fdw as you do in your second patch. Unfortunately I will not be able to have a look in detail at your patch, I am sick... -- Michael
Re: [HACKERS] DO ... RETURNING
On 2013-06-11 19:01 CEST, Hannu Krosing wrote: On 06/11/2013 05:27 PM, Merlin Moncure wrote: On Tue, Jun 11, 2013 at 9:45 AM, Stephen Frost ... wrote: * Merlin Moncure ... wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. It's a complete feature but completely relevant to the discussion -- the behaviors have a lot of overlap and CALL is in the standard whereas the ad hoc feature DO isn't. Could you point to the ISO/ANSI SQL CALL definition? On the publicly available standards page of ISO: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html searching in it for SQL shows the link for ISO/IEC 9075-1:2008: http://standards.iso.org/ittf/PubliclyAvailableStandards/c045498_ISO_IEC_9075-1_2008.zip when you follow and go through: http://standards.iso.org/ittf/licence.html i.e. accept the granted license, you may receive a personal single-copy non-distributable PDF version of ISO/IEC 9075-1:2008, 3rd Edition, Information technology -- Database languages -- SQL -- Part 1: Framework (SQL/Framework), of Committee JTC1/SC32. There, at least in section 5.3.4, you find the pointer that, among others, the terms CALL and RETURN are specified in ISO 9075-2. So that's the pointer, though one would have to follow it to the end ... ;-) but at least it is clear from this source that CALL seems to be a statement in SQL to invoke a procedure, or whatever name juggling suits. ... HTH, Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On 6/11/13 12:22 PM, Merlin Moncure wrote: Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. That's nice, but we have this process for validating whether features go in or not that relies on review instead of opinions. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On 2013-06-11 19:45 CEST, Greg Smith wrote: On 6/11/13 12:22 PM, Merlin Moncure wrote: Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. That's nice, but we have this process for validating whether features go in or not that relies on review instead of opinions. ;-) that's why I played with the test_fallocate.c, as it was easy to do and I understood the author (of the patch) wanted to trigger some reviews ... I do not (yet) know anything about the core code, so I leave this to the hackers. My review result was that with newer gcc's you should specify an open mode flag as the third argument of the open call; other than that, playing with the test tool found nothing important. Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Tue, Jun 11, 2013 at 12:49 PM, Stefan Drees ste...@drees.name wrote: On 2013-06-11 19:45 CEST, Greg Smith wrote: On 6/11/13 12:22 PM, Merlin Moncure wrote: Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. That's nice, but we have this process for validating whether features go in or not that relies on review instead of opinions. ;-) that's why I played with the test_fallocate.c, as it was easy to do and I understood, the author (of the patch) wanted to trigger some reviews ... I do not (yet) know anything about the core codes, so I leave this to the hackers. My review result was, that with newer gcc's you should specify an open mode flag as third argument of the fopen call, as only with the test tool nothing important found. If you change line 68 to read: fd = open(filename, O_CREAT | O_EXCL | O_WRONLY, 0600); then it should work just fine (assuming you have not already done so). -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SPGist triple parity concept doesn't work
Anyway I now think that we might be better off with the other idea of abandoning an insertion and retrying if we get a lock conflict. That would at least not create any performance penalty for non-concurrent scenarios; and even in concurrent cases, I suspect you'd have to be rather unlucky to get penalties as bad as the heavyweight-lock solution is showing. Agreed, it would be a better workaround for now. I will be able to do this this Friday. I considered the idea of forbidding placement of a child on the same page as its parent, but this implementation a) could significantly increase the size of the index, b) doesn't solve Greg's point. We definitely need a new idea for the locking protocol and I'll return to this problem in autumn (sorry, I haven't time this summer to do this research). -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/10/2013 11:22 PM, Noah Misch wrote: On Mon, Jun 10, 2013 at 11:20:13AM -0400, Andrew Dunstan wrote: On 06/10/2013 10:18 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: After thinking about this some more I have come to the conclusion that we should only do any de-escaping of \u sequences, whether or not they are for BMP characters, when the server encoding is utf8. For any other encoding, which is already a violation of the JSON standard anyway, and should be avoided if you're dealing with JSON, we should just pass them through even in text output. This will be a simple and very localized fix. Hmm. I'm not sure that users will like this definition --- it will seem pretty arbitrary to them that conversion of \u sequences happens in some databases and not others. Yep. Suppose you have a LATIN1 database. Changing it to a UTF8 database where everyone uses client_encoding = LATIN1 should not change the semantics of successful SQL statements. Some statements that fail with one database encoding will succeed in the other, but a user should not witness a changed non-error result. (Except functions like decode() that explicitly expose byte representations.) Having SELECT '["\u00e4"]'::json ->> 0 emit 'ä' in the UTF8 database and '\u00e4' in the LATIN1 database would move PostgreSQL in the wrong direction relative to that ideal. Then what should we do when there is no matching codepoint in the database encoding? First we'll have to delay the evaluation so it's not done over-eagerly, and then we'll have to try the conversion and throw an error if it doesn't work. The second part is what's happening now, but the delayed evaluation is not. +1 for doing it that way. As a final counter-example, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
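For comparison, PostgreSQL's own lexer behaves as Andrew describes for U& escapes (a sketch; exact error wording may vary by version):

    -- in a UTF8 database:
    utf8db=# SELECT U&'\00E4';
     ?column?
    ----------
     ä
    (1 row)

    -- in a LATIN1 database, escapes above U+007F are rejected:
    latin1db=# SELECT U&'\00E4';
    ERROR:  Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8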
[HACKERS] 9.4 CF1 Starts Saturday: need patches, reviewers, asst. CFM
Hackers, As a reminder, the first 9.4 commitfest starts this Saturday. If you plan to submit patches for CF1, please make sure that they are mailed in and registered with the Commitfest app by Friday, June 14th. Patches added after midnight will be bounced to CF2. We already have 72 patches pending. If you are available to do CF reviews, and want to be assigned a patch instead of choosing one for yourself, then please join the pgsql-rrreviewers mailing list: http://www.postgresql.org/community/lists/subscribe/ (yes, that is 3 r's). Also, I'm still looking for an assistant Commitfest Manager to help with managing patches, authors and reviewers. Thanks! -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 CF1 Starts Saturday: need patches, reviewers, asst. CFM
Josh, * Josh Berkus (j...@agliodbs.com) wrote: We already have 72 patches pending. Fun! If you are available to do CF reviews, and want to be assigned a patch instead of choosing one for yourself, then please join the pgsql-rrreviewers mailing list: http://www.postgresql.org/community/lists/subscribe/ (yes, that is 3 r's). Already on it and am planning to support with reviews and commits. :) Have we picked a date when we're going to branch..? As I recall, it was around this time last year, and it clearly needs to be before we start committing things from 9.4 CF1. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] 9.4 CF1 Starts Saturday: need patches, reviewers, asst. CFM
On Tue, Jun 11, 2013 at 11:46 PM, Stephen Frost sfr...@snowman.net wrote: Josh, * Josh Berkus (j...@agliodbs.com) wrote: We already have 72 patches pending. Fun! If you are available to do CF reviews, and want to be assigned a patch instead of choosing one for yourself, then please join the pgsql-rrreviewers mailing list: http://www.postgresql.org/community/lists/subscribe/ (yes, that is 3 r's). Already on it and am planning to support with reviews and commits. :) Have we picked a date when we're going to branch..? As I recall, it was around this time last year, and clearly needs to be before we start commiting things from 9.4 CF1. Thanks, Stephen Yay! -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Tue, Jun 11, 2013 at 12:45 PM, Greg Smith g...@2ndquadrant.com wrote: On 6/11/13 12:22 PM, Merlin Moncure wrote: Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. That's nice, but we have this process for validating whether features go in or not that relies on review instead of opinions. Sure: I phrased that badly by 'go in' I meant 'go through the review process', not commit out of hand. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
Andrew Dunstan and...@dunslane.net writes: As a final counter example, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. Good point. What if we adopt that same definition for JSON, and get rid of the need to do explicit encoding conversion at all in the JSON code? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
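Under that rule, JSON escapes would mirror the lexer's U& behavior: ASCII escapes de-escape under any server encoding, while anything above U+007F de-escapes only in UTF8 databases (a sketch of the proposed, not current, behavior):

    -- would succeed under any server encoding:
    SELECT '["\u0041BC"]'::json ->> 0;   -- ABC
    -- would de-escape only when the server encoding is UTF8, and error otherwise:
    SELECT '["\u00e4"]'::json ->> 0;     -- ä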
[HACKERS] Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: Should array_length() Return NULL)
On 04/08/2013 07:16 PM, Brendan Jurd wrote: On 9 April 2013 09:24, Josh Berkus j...@agliodbs.com wrote: As much as I have a keen interest in this feature, it isn't (AFAIK) being considered for 9.3. Given that it's generated a fair amount of controversy, could we table it until 9.3 beta? There's still plenty of unresolved 9.3 patches in the queue. No problem. I certainly wasn't expecting it to run for 90 messages when I started out. I'll pipe down for now and resume after the beta. What's the status on this patch and current approach to ZDA? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] request a new feature in fuzzystrmatch
On Sat, May 18, 2013 at 6:38 PM, David Fetter da...@fetter.org wrote: On Fri, May 17, 2013 at 05:50:38PM +0200, Cédric Villemain wrote: Hello Liming, Sounds interesting. How can we build this over our current implementation, or do we need to build it from scratch? I know how to write the code, but just need approval of accepting into the new version. Well, acceptance depends largely on the implementation and actual benefit statistics. I would suggest implementing a basic version and then demonstrating its potential benefits here. It will lead to clearer ideas for us and lead to improvements in the implementation. You can have a look at this page: http://wiki.postgresql.org/wiki/Submitting_a_Patch Also very important to look at for patch implementation and communication: http://wiki.postgresql.org/wiki/Working_with_Git Cheers, David (wondering whether we should provide specific instructions for github, bitbucket, etc.). I have implemented the code according to Joe's suggestion, and put the code at: https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1 Thanks, Liming -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Liming Hu cell: (435)-512-4190 Seattle Washington -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallell Optimizer
On 6/7/13 2:23 PM, Tom Lane wrote: As for other databases, I suspect that ones that have parallel execution are probably doing it with a thread model not a process model. Oracle 9i was multi-process, not multi-threaded. IIRC it actually had dedicated IO processes too; backends didn't do their own IO. We certainly need to protect the use case of queries that run in milliseconds, and clearly parallelism won't help there at all. But we can't ignore the other end of the spectrum; you'd need a LOT of communication overhead to swamp the benefits of parallel execution on a multi-minute, CPU-bound query (or in many cases even IO bound). -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
- the tps is global, with a mutex to share the global stochastic process - there is an adaptation for the fork emulation - I do not know whether this works with Win32 pthread stuff. Instead of this complexity, Well, the mutex impact is very localized in the code. The complexity is more linked to the three thread implementations intermixed there. can we just split the TPS input per client? Obviously it is possible. Note that it is more logical to do that per thread. I did one shared stochastic process because it makes more sense to have just one. That's all I was thinking of here, not adding a new set of threading issues. If 10,000 TPS is requested and there's 10 clients, just set the delay so that each of them targets 1000 TPS. Ok, so I understand that a mutex is too much! I'm guessing it's more accurate to have them all communicate as you've done here, but it seems like a whole class of new bugs and potential bottlenecks could come out of that. I do not think that there is a performance or locking contention issue: it is about getting a mutex for a section which performs one integer add and two integer assignments, that is about 3 instructions, to be compared with the task of performing database operations over the network. There are several orders of magnitude between those tasks. It would need a more than terrible mutex implementation to have any significant impact. Whenever someone touches the threading model for pgbench it usually gives a stack of build farm headaches. Better to avoid those unless there's really a compelling reason to go through that. I agree that the threading model in pgbench is a mess, mostly because of the 3 concurrent implementations intermixed in the code. Getting rid of the fork emulation and win32 special handling and keeping only the pthread implementation, which seems to be available even on windows, would be a relief. I'm not sure if there is still a rationale to have these 3 implementations, but it ensures a maintenance mess :-( I'll submit a version without the mutex, but ISTM that this one is conceptually cleaner, although I'm not sure about what happens on windows. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallell Optimizer
On 11/06/13 19:24, Hannu Krosing wrote: On 06/10/2013 10:37 PM, FredDaniPandoraAquiles wrote: Hi, I asked a while ago in this group about the possibility to implement a parallel planner in a multithread way, and the replies were that the proposed approach couldn't be implemented, because postgres is not thread-safe. With the new feature Background Worker Processes, would such an implementation be possible? Well, there are versions of genetic algorithms that use the concept of islands, in which the populations evolve in parallel on the different islands, with interaction between the islands and so on. I'm working on an algorithm based on multiagent systems. At the present moment, I mean in H2, the agents are threads, there are a few locks related to agent solutions, and a few locks for the best current solution in the environment where the agents are 'running'. The agents can exchange messages with a purpose. The environment is shared by all the agents and they use the environment to get information from other agents (their current solution, for example), try to update the best current solution and so on. If you do this as an academic exercise, then I'd recommend thinking in messages only. Separate out the message delivery entirely from your core design. This makes the whole concept much simpler and more generic. Message delivery can be made almost instantaneous in the case of threads, or can take from a few tens of microseconds to several seconds between different physical nodes. Which speed is fast enough depends entirely on your query - for a query running 5 hours on a single CPU and 5 minutes on a cluster, a message delay of 50 ms is entirely acceptable -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ I suspect (from my position of almost total ignorance of this area!) that once a generic method works independently of how closely coupled the different parallel parts are, then a later optimisation could be added dependent on how the parts were related. So running on a multi-core chip could have a different communication system to that running across multiple computers geographically dispersed. Though in practice, I suspect that the most common use case would involve many processor chips in the same 'box' (even if said box was distributed across a large room!). Anyhow, I think that separating out how to effectively parallelise Postgres from how the parts communicate is a Good Thing (TM). Though knowing Grim Reality, it is bound to be more complicated in Reality! :-( As the useful size of work of the parallel units obviously does relate to the communication overhead. Possibly the biggest challenge will be in devising a planning methodology that can efficiently decide on an appropriate parallel strategy. Maybe a key word to tell the planner that you know this is a very big query and you don't mind it taking a long time to come up with a decent plan? The planner would need to know details of the processing unit topology, communication overheads, and possibly other details - to make a really effective plan in the distributed case. My mind boggles, just thinking of the number of different variables that might be required to create an 'optimal' plan for parallel processing in a distributed system! Cheers, Gavin
Re: [HACKERS] Parallell Optimizer
On 06/11/2013 04:53 PM, Tatsuo Ishii wrote: On 11 June 2013 01:45, Tatsuo Ishii is...@postgresql.org wrote: On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs si...@2ndquadrant.com wrote: On 7 June 2013 20:23, Tom Lane t...@sss.pgh.pa.us wrote: As for other databases, I suspect that ones that have parallel execution are probably doing it with a thread model not a process model. Separate processes are more common because it covers the general case where query execution is spread across multiple nodes. Threads don't work across nodes and parallel queries predate (working) threading models. Indeed. Parallelism based on processes would be more convenient for master-master type of applications. Even if no master-master feature is implemented directly in core, at least a parallelism infrastructure based on processes could be used for this purpose. As long as true synchronous replication is not implemented in core, I am not sure there's value in parallel execution spreading across multiple nodes because of the delay of data update propagation. Please explain what you mean by the word true used here. In other words, eager replication. Do you mean something along these lines : Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. ? IMO it is possible to do this easily once BDR has reached the state where you can do streaming apply. That is, you replay actions on other hosts as they are logged, not after the transaction commits. Doing it this way you can wait for any action to successfully complete a full circle before committing it at the source. Currently the main missing part in doing this is autonomous transactions. It can in theory be done by opening an extra backend for each incoming transaction, but you will need a really big number of backends and you also have extra overhead from interprocess communications. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Submission 10:
- per thread throttling instead of a global process with a mutex. This avoids a mutex, and the stochastic process is shared between the clients of a given thread.
- ISTM that the thread start time should be initialized at the beginning of threadRun instead of in the loop *before* thread creation, otherwise the thread creation delays are incorporated in the performance measure, but ISTM that the point of pgbench is not to measure thread creation performance... I've added a comment suggesting where it should be put instead, first thing in threadRun.
-- Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 8c202bf..8d7979b 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -137,6 +137,12 @@ int	unlogged_tables = 0;
 double		sample_rate = 0.0;
 
 /*
+ * When threads are throttled to a given rate limit, this is the target delay
+ * to reach that rate in usec. 0 is the default and means no throttling.
+ */
+int64		throttle_delay = 0;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -205,6 +211,7 @@ typedef struct
 	int			nvariables;
 	instr_time	txn_begin;		/* used for measuring transaction latencies */
 	instr_time	stmt_begin;		/* used for measuring statement latencies */
+	bool		throttled;		/* whether current transaction was throttled */
 	int			use_file;		/* index in sql_files for this client */
 	bool		prepared[MAX_FILES];
 } CState;
@@ -222,6 +229,10 @@ typedef struct
 	instr_time *exec_elapsed;	/* time spent executing cmds (per Command) */
 	int		   *exec_count;		/* number of cmd executions (per Command) */
 	unsigned short random_state[3];		/* separate randomness for each thread */
+	int64		throttle_trigger;	/* previous/next throttling (us) */
+	int64		throttle_lag;		/* avg lag transaction behind throttling */
+	int64		throttle_lag_max;	/* max lag */
+
 } TState;
 
 #define INVALID_THREAD ((pthread_t) 0)
@@ -230,6 +241,8 @@ typedef struct
 {
 	instr_time	conn_time;
 	int			xacts;
+	int64		throttle_lag;
+	int64		throttle_lag_max;
 } TResult;
 
 /*
@@ -355,6 +368,8 @@ usage(void)
 		   "  -n           do not run VACUUM before tests\n"
 		   "  -N           do not update tables \"pgbench_tellers\" and \"pgbench_branches\"\n"
 		   "  -r           report average latency per command\n"
+		   "  -R SPEC, --rate SPEC\n"
+		   "               target rate in transactions per second\n"
 		   "  -s NUM       report this scale factor in output\n"
 		   "  -S           perform SELECT-only transactions\n"
 		   "  -t NUM       number of transactions each client runs (default: 10)\n"
@@ -902,13 +917,51 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVals *agg)
 top:
 	commands = sql_files[st->use_file];
 
+	/* handle throttling once per transaction by inserting a sleep.
+	 * this is simpler than doing it at the end.
+	 */
+	if (throttle_delay && !st->throttled)
+	{
+		/* compute delay to approximate a Poisson distribution
+		 * 1000000 => 13.8 .. 0 multiplier
+		 *  100000 => 11.5 .. 0
+		 *   10000 =>  9.2 .. 0
+		 *    1000 =>  6.9 .. 0
+		 * if transactions are too slow or a given wait shorter than
+		 * a transaction, the next transaction will start right away.
+		 */
+		int64		wait = (int64)
+			throttle_delay * -log(getrand(thread, 1, 1000) / 1000.0);
+
+		thread->throttle_trigger += wait;
+
+		st->until = thread->throttle_trigger;
+		st->sleeping = 1;
+		st->throttled = true;
+		if (debug)
+			fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
+					st->id, wait);
+	}
+
 	if (st->sleeping)
 	{							/* are we sleeping? */
 		instr_time	now;
+		int64		now_us;
 
 		INSTR_TIME_SET_CURRENT(now);
-		if (st->until <= INSTR_TIME_GET_MICROSEC(now))
+		now_us = INSTR_TIME_GET_MICROSEC(now);
+		if (st->until <= now_us)
+		{
 			st->sleeping = 0;	/* Done sleeping, go ahead with next command */
+			if (throttle_delay && st->state == 0)
+			{
+				/* measure lag of throttled transaction */
+				int64		lag = now_us - st->until;
+
+				thread->throttle_lag += lag;
+				if (lag > thread->throttle_lag_max)
+					thread->throttle_lag_max = lag;
+			}
+		}
 		else
 			return true;		/* Still sleeping, nothing to do here */
 	}
@@ -1095,6 +1148,7 @@ top:
 			st->state = 0;
 			st->use_file = (int) getrand(thread, 0, num_files - 1);
 			commands = sql_files[st->use_file];
+			st->throttled = false;
 		}
 	}
@@ -2017,7 +2071,8 @@ process_builtin(char *tb)
 
 static void
 printResults(int ttype, int normal_xacts, int nclients,
 			 TState *threads, int nthreads,
-			 instr_time total_time, instr_time conn_total_time)
+			 instr_time total_time, instr_time conn_total_time,
+			 int64 throttle_lag, int64 throttle_lag_max)
 {
 	double		time_include,
 				tps_include,
@@ -2055,6 +2110,18 @@ printResults(int ttype, int normal_xacts, int nclients,
 		printf("number of transactions actually processed: %d\n",
 			   normal_xacts);
 	}
+
+	if (throttle_delay)
+	{
+		/* Report average transaction lag under throttling, i.e. the delay
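For readers unfamiliar with the -log() trick in the hunk above: scaling the mean delay by -log(u), with u uniform on (0,1], produces exponentially distributed inter-arrival times, i.e. a Poisson arrival process at the target rate. A stand-alone sketch (the 100 tps figure is an assumption for illustration, not from the patch):

/* build with: cc poisson.c -lm */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    const double target_tps = 100.0;        /* assumed target rate */
    const double mean_delay_us = 1000000.0 / target_tps;
    const int n = 100000;
    double total = 0.0;

    srandom(42);
    for (int i = 0; i < n; i++)
    {
        /* u uniform in (0, 1], discretized to 1000 steps as in the patch */
        double u = (double) (random() % 1000 + 1) / 1000.0;

        total += mean_delay_us * -log(u);
    }
    /* averages to roughly mean_delay_us, so the long-run rate is target_tps */
    printf("average delay: %.0f us (expected about %.0f)\n",
           total / n, mean_delay_us);
    return 0;
}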
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
On 6/11/13 4:11 PM, Fabien COELHO wrote: - ISTM that there thread start time should be initialized at the beginning of threadRun instead of in the loop *before* thread creation, otherwise the thread creation delays are incorporated in the performance measure, but ISTM that the point of pgbench is not to measure thread creation performance... I noticed that, but it seemed a pretty minor issue. Did you look at the giant latency spikes at the end of the test run I submitted the graph for? I wanted to nail down what was causing those before worrying about the startup timing. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Current CLOBBER_CACHE_ALWAYS failures
Christian Ullrich ch...@chrullrich.net writes: The CLOBBER_CACHE_ALWAYS animals (friarbird and jaguarundi) have been failing persistently for about 36 hours now. The error is in a test added by Tom's recent commit a4424c5: Huh ... this is a pre-existing bug in the typcache code. Will fix, thanks for the report! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] request a new feature in fuzzystrmatch
Liming Hu escribió: I have implemented the code according to Joe's suggestion, and put the code at: https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1 Please submit a proper patch so it can be seen on our mailing list archives. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
- ISTM that the thread start time should be initialized at the beginning of threadRun instead of in the loop *before* thread creation, otherwise the thread creation delays are incorporated in the performance measure, but ISTM that the point of pgbench is not to measure thread creation performance... I noticed that, but it seemed a pretty minor issue. Not for me, because the max lag measured in my first version was really the thread creation time, not very interesting. Did you look at the giant latency spikes at the end of the test run I submitted the graph for? I've looked at the graph you sent. I must admit that I did not understand exactly what is measured and where it is measured. Because of its position at the end of the run, I thought of some disconnection-related effects when the pgbench run is interrupted by a timeout signal, so some things are done more slowly. Fine with me, we are stopping anyway, and out of the steady state. I wanted to nail down what was causing those before worrying about the startup timing. Well, the short answer is that I'm not worried by that, for the reason explained above. I would be worried if it was anywhere else but where the transactions are interrupted, the connections are closed and the threads are stopped. I may be wrong:-) -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] request a new feature in fuzzystrmatch
On Tue, Jun 11, 2013 at 1:57 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Liming Hu escribió: I have implemented the code according to Joe's suggestion, and put the code at: https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1 Please submit a proper patch so it can be seen on our mailing list archives. Hi Alvaro, I am kind of new to the PostgreSQL hacker community. Can you please help me submit the patch? Thanks, Liming -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Liming Hu cell: (435)-512-4190 Seattle Washington -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] how to find out whether a view is updatable
Dean Rasheed dean.a.rash...@gmail.com writes: Here's a more complete patch along those lines. It defines the following pair of functions to test for updatability from SQL: FUNCTION pg_catalog.pg_relation_is_updatable(reloid oid, include_triggers boolean) RETURNS integer FUNCTION pg_catalog.pg_column_is_updatable(reloid oid, attnum smallint, include_triggers boolean) RETURNS boolean and the following FDW functions: int IsForeignRelUpdatable (Oid foreigntableid); bool IsForeignColUpdatable (Oid foreigntableid, AttrNumber attnum); I'm looking at this patch now. I do not see the point of pg_column_is_updatable nor IsForeignColUpdatable --- that's loading additional complexity onto every FDW, to support what use-cases exactly? Is it really likely that (a) there are any cases out there where FDWs would support updating only some columns, and (b) anybody would care whether or not information_schema.columns reflects such a restriction accurately? So I'm inclined to drop that part. As an initial implementation of this API in postgres_fdw, I've added a new option updatable (true by default), which can be specified as a server option or as a per-table option, to give the user control over whether individual foreign tables are read-only or updatable. Do we really want that as a server option? Why? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] request a new feature in fuzzystrmatch
On 06/11/2013 02:23 PM, Liming Hu wrote: On Tue, Jun 11, 2013 at 1:57 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Liming Hu escribió: I have implemented the code according to Joe's suggestion, and put the code at: https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1 Please submit a proper patch so it can be seen on our mailing list archives. Hi Alvaro, I am kind of new to the PostgreSQL hacker community. Can you please help me submit the patch? Hi Liming, I might be able to help, but it will be at least a couple of days before I have the time to look at this, Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] request a new feature in fuzzystrmatch
Liming Hu dawnin...@gmail.com wrote: I am kind of new to the Postgresql hacker community, Can you please help me on submit the patch? You might want to read this page: http://wiki.postgresql.org/wiki/Developer_FAQ In particular, the Development Process section has a link to Submitting a Patch that you should read. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: Should array_length() Return NULL)
On 12 June 2013 04:43, Josh Berkus j...@agliodbs.com wrote: What's the status on this patch and current approach to ZDA? Alright, it might be a good idea to have a quick recap. Last time, on Arrays Of Our Lives ... So I proposed and posted a patch aimed at deprecating zero-D arrays, and replacing them with 1-D empty arrays. That kicked off a long discussion, with some folks (Robert, Merlin, Florian et al) vigorously opposing any change in the behaviour of the array_(length|ndims|lower|upper) functions. To those folks, the behaviour would have to be a lot worse than it currently is to justify breaking compatibility with existing applications. The idea of using a GUC to smooth over the compatibility break was suggested, and firmly rejected. The idea of creating an entirely new type with nicer behaviours came up, but that wouldn't really fly because arrays have already hogged all the best syntax. Since compatibility breakage is so contentious, I suggested that we forget about changing the array representation and just add new functions with more sensible behaviours: * cardinality(anyarray) to return the length of the first dimension, zero if empty, and * array_num_items(anyarray) to return the total number of element positions per ArrayGetNItems, zero if empty. There have been attempts to add a cardinality function in the past, as it is required by the SQL spec, but these attempts have stalled when trying to decide how it should handle multidim arrays. Having it return the length of the first dimension is the more spec-compatible way to go, but some folks argued that it should work as ArrayGetNItems, because we don't already have a function for that at the SQL level. Therefore I propose we add cardinality() per the spec, and another function to expose ArrayGetNItems. And that's about where we got to, when the whole discussion was put on a time-out to make room for the beta. I am withdrawing the original zero-D patch in favour of the proposed new functions. If you have an opinion about that, please do chime in. Depending on how that goes I may post a patch implementing my new proposal in the next few days. Cheers, BJ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
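To make the two candidate behaviours concrete, a rough backend-style sketch (function names hypothetical; this is not a submitted patch): the spec-flavoured cardinality returning the length of the first dimension, and a separate function exposing ArrayGetNItems.

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(cardinality_sketch);
PG_FUNCTION_INFO_V1(array_num_items_sketch);

/* spec-style: length of the first dimension, zero if empty */
Datum
cardinality_sketch(PG_FUNCTION_ARGS)
{
    ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);

    if (ARR_NDIM(v) == 0)
        PG_RETURN_INT32(0);
    PG_RETURN_INT32(ARR_DIMS(v)[0]);
}

/* total number of element positions, per ArrayGetNItems, zero if empty */
Datum
array_num_items_sketch(PG_FUNCTION_ARGS)
{
    ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);

    PG_RETURN_INT32(ArrayGetNItems(ARR_NDIM(v), ARR_DIMS(v)));
}

The two differ only for multidim arrays: for ARRAY[[1,2],[3,4]] the first returns 2, the second 4.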
Re: [HACKERS] request a new feature in fuzzystrmatch
On Tue, Jun 11, 2013 at 2:56 PM, Joe Conway m...@joeconway.com wrote: On 06/11/2013 02:23 PM, Liming Hu wrote: On Tue, Jun 11, 2013 at 1:57 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Liming Hu escribió: I have implemented the code according to Joe's suggestion, and put the code at: https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1 Please submit a proper patch so it can be seen on our mailing list archives. Hi Alvaro, I am kind of new to the PostgreSQL hacker community. Can you please help me submit the patch? Hi Liming, I might be able to help, but it will be at least a couple of days before I have the time to look at this, Joe ok, thanks, I will wait. Liming -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support -- Liming Hu cell: (435)-512-4190 Seattle Washington -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
On Tue, Jun 11, 2013 at 02:10:45PM -0400, Andrew Dunstan wrote: On 06/10/2013 11:22 PM, Noah Misch wrote: On Mon, Jun 10, 2013 at 11:20:13AM -0400, Andrew Dunstan wrote: On 06/10/2013 10:18 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: After thinking about this some more I have come to the conclusion that we should only do any de-escaping of \u sequences, whether or not they are for BMP characters, when the server encoding is utf8. For any other encoding, which is already a violation of the JSON standard anyway, and should be avoided if you're dealing with JSON, we should just pass them through even in text output. This will be a simple and very localized fix. Hmm. I'm not sure that users will like this definition --- it will seem pretty arbitrary to them that conversion of \u sequences happens in some databases and not others. Yep. Suppose you have a LATIN1 database. Changing it to a UTF8 database where everyone uses client_encoding = LATIN1 should not change the semantics of successful SQL statements. Some statements that fail with one database encoding will succeed in the other, but a user should not witness a changed non-error result. (Except functions like decode() that explicitly expose byte representations.) Having SELECT '["\u00e4"]'::json ->> 0 emit 'ä' in the UTF8 database and '\u00e4' in the LATIN1 database would move PostgreSQL in the wrong direction relative to that ideal. As a final counterexample, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. I don't see a counterexample there; every database that accepts without error a given Unicode escape produces from it the same text value. The proposal to which I objected was akin to having non-UTF8 databases silently translate E'\u0220' to E'\\u0220'. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Clean switchover
Hi, In streaming replication, when we shutdown the master, walsender tries to send all the outstanding WAL records including the shutdown checkpoint record to the standby, and then to exit. This basically means that all the WAL records are fully synced between the two servers after the clean shutdown of the master. So, after promoting the standby to new master, we can restart the stopped master as new standby without the need for a fresh backup from the new master. But there is one problem: though walsender tries to send all the outstanding WAL records, it doesn't wait for them to be replicated to the standby. IOW, walsender closes the replication connection as soon as it sends the WAL records. Then, before receiving all the WAL records, walreceiver can detect the closure of the connection and exit. We cannot guarantee that there is no missing WAL in the standby after a clean shutdown of the master. In this case, a backup from the new master is required when restarting the stopped master as new standby. I have experienced this case several times, especially when enabling WAL archiving. The attached patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection. You may be concerned about the case where the standby gets stuck and the walsender keeps waiting for the reply from that standby. In this case, wal_sender_timeout detects such an inactive standby and then walsender ends. So even in that case, the shutdown can end. Thought? Regards, -- Fujii Masao switchover_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
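In outline, the proposed behaviour change amounts to something like the following shape in walsender's shutdown path (a paraphrase of the idea, not the patch itself; variable and field names as in the walsender.c of that era):

/* on shutdown request, after pushing out everything we have ... */
if (caughtup && !pq_is_send_pending())
{
    /*
     * Previously: exit as soon as our send buffer is empty.  Instead, keep
     * processing standby replies until the flush position the standby has
     * reported catches up with everything we sent; wal_sender_timeout
     * still bounds the wait if the standby has gone away.
     */
    if (MyWalSnd->flush >= sentPtr)
        proc_exit(0);    /* standby has all our WAL; safe to disconnect */
}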
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 06:26 PM, Noah Misch wrote: As a final counterexample, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. I don't see a counterexample there; every database that accepts without error a given Unicode escape produces from it the same text value. The proposal to which I objected was akin to having non-UTF8 databases silently translate E'\u0220' to E'\\u0220'. What? There will be no silent translation. The only debate here is about how these databases turn string values inside a json datum into PostgreSQL text values via the documented operation of certain functions and operators. If the JSON datum doesn't already contain a unicode escape then nothing of what's been discussed would apply. Nothing whatever that's been proposed would cause a unicode escape sequence to be emitted that wasn't already there in the first place, and no patch that I have submitted has contained any escape sequence generation at all. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallell Optimizer
Please explain what you mean by the word true used here. In other words, eager replication. Do you mean something along these lines : Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. ? No, I'm not talking about conflict resolution. From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf: -- Eager or Lazy Replication? Eager replication: keep all replicas synchronized by updating all replicas in a single transaction Lazy replication: asynchronously propagate replica updates to other nodes after the replicating transaction commits -- Parallel query execution needs to assume that each node is synchronized at commit, otherwise the summary of each query result executed on each node is meaningless. IMO it is possible to do this easily once BDR has reached the state where you can do streaming apply. That is, you replay actions on other hosts as they are logged, not after the transaction commits. Doing it this way you can wait for any action to successfully complete a full circle before committing it at the source. Currently the main missing part in doing this is autonomous transactions. It can in theory be done by opening an extra backend for each incoming transaction, but you will need a really big number of backends and you also have extra overhead from interprocess communications. Thanks for the thought about conflict resolution in BDR. BTW, if we seriously think about implementing parallel query execution, we need to find a way to distribute data among the nodes, which requires a partial copy of a table. I think that would be a big challenge for WAL-based replication. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallell Optimizer
On 06/12/2013 01:01 AM, Tatsuo Ishii wrote: Please explain what you mean by the word true used here. In other words, eager replication. Do you mean something along these lines : Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. ? No, I'm not talking about conflict resolution. From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf: -- Eager or Lazy Replication? Eager replication: keep all replicas synchronized by updating all replicas in a single transaction Ok, so you are talking about distributed transactions ? In our current master-slave replication, how would it be different from current synchronous replication ? Or does it make sense only in case of multimaster replication ? The main problems with keep all replicas synchronized by updating all replicas in a single transaction are performance and reliability. That is, the write performance has to be less than for a single server and failure of a single replica brings down the whole cluster. Lazy replication: asynchronously propagate replica updates to other nodes after the replicating transaction commits -- Parallel query execution needs to assume that each node is synchronized at commit, otherwise the summary of each query result executed on each node is meaningless. IMO it is possible to do this easily once BDR has reached the state where you can do streaming apply. That is, you replay actions on other hosts as they are logged, not after the transaction commits. Doing it this way you can wait for any action to successfully complete a full circle before committing it at the source. Currently the main missing part in doing this is autonomous transactions. It can in theory be done by opening an extra backend for each incoming transaction, but you will need a really big number of backends and you also have extra overhead from interprocess communications. Thanks for the thought about conflict resolution in BDR. BTW, if we seriously think about implementing parallel query execution, we need to find a way to distribute data among the nodes, which requires a partial copy of a table. I think that would be a big challenge for WAL-based replication. Moving partial query results around is a completely different problem from replication. We should not mix these. If we on the other hand think about sharding (that is, having a table partitioned between nodes) then this can be done in BDR. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] how to find out whether a view is updatable
On 11 June 2013 22:53, Tom Lane t...@sss.pgh.pa.us wrote: Dean Rasheed dean.a.rash...@gmail.com writes: Here's a more complete patch along those lines. It defines the following pair of functions to test for updatability from SQL: FUNCTION pg_catalog.pg_relation_is_updatable(reloid oid, include_triggers boolean) RETURNS integer FUNCTION pg_catalog.pg_column_is_updatable(reloid oid, attnum smallint, include_triggers boolean) RETURNS boolean and the following FDW functions: int IsForeignRelUpdatable (Oid foreigntableid); bool IsForeignColUpdatable (Oid foreigntableid, AttrNumber attnum); I'm looking at this patch now. I do not see the point of pg_column_is_updatable nor IsForeignColUpdatable --- that's loading additional complexity onto every FDW, to support what use-cases exactly? Is it really likely that (a) there are any cases out there where FDWs would support updating only some columns, and (b) anybody would care whether or not information_schema.columns reflects such a restriction accurately? So I'm inclined to drop that part. I originally thought of adding pg_column_is_updatable() because I was imagining supporting more of the SQL standard on updatable views, which allows for a subset of the columns to be updatable, but we could always add such a function when/if we implement that feature. As for IsForeignColUpdatable(), I think you're probably right. If its only purpose is to support information_schema.columns, it's probably of very limited interest to anyone. As an initial implementation of this API in postgres_fdw, I've added a new option updatable (true by default), which can be specified as a server option or as a per-table option, to give the user control over whether individual foreign tables are read-only or updatable. Do we really want that as a server option? Why? Not sure. I thought it might be useful if you were setting up a connection to a foreign server and you knew that you only wanted read access to all the tables in it, this would avoid having to specify the option on every table. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
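For illustration, the option handling Dean describes could look roughly like this in postgres_fdw (a sketch of the proposal, not necessarily the committed form), reporting the supported operations as a bitmask of command types:

#include "postgres.h"
#include "commands/defrem.h"
#include "foreign/foreign.h"
#include "nodes/nodes.h"
#include "nodes/pg_list.h"

static int
postgresIsForeignRelUpdatable(Oid foreigntableid)
{
    bool           updatable = true;    /* assumed default: updatable */
    ForeignTable  *table = GetForeignTable(foreigntableid);
    ForeignServer *server = GetForeignServer(table->serverid);
    ListCell      *lc;

    /* server option first, then the per-table option overrides it */
    foreach(lc, server->options)
    {
        DefElem *def = (DefElem *) lfirst(lc);

        if (strcmp(def->defname, "updatable") == 0)
            updatable = defGetBoolean(def);
    }
    foreach(lc, table->options)
    {
        DefElem *def = (DefElem *) lfirst(lc);

        if (strcmp(def->defname, "updatable") == 0)
            updatable = defGetBoolean(def);
    }

    return updatable
        ? (1 << CMD_INSERT) | (1 << CMD_UPDATE) | (1 << CMD_DELETE)
        : 0;
}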
Re: [HACKERS] Parallell Optimizer
No, I'm not talking about conflict resolution. From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf: -- Eager or Lazy Replication? Eager replication: keep all replicas synchronized by updating all replicas in a single transaction Ok, so you are talking about distributed transactions ? In our current master-slave replication, how would it be different from current synchronous replication ? Or does it make sense only in case of multimaster replication ? The main problems with keep all replicas synchronized by updating all replicas in a single transaction are performance and reliability. That is, the write performance has to be less than for a single server That's a limitation specific to log-based replication. It needs to wait for log replay, which is virtually the same as a cluster-wide giant lock. On the other hand, non-log-based replication systems (if my understanding is correct, Postgres-XC is the case) could perform better than a single server. and failure of a single replica brings down the whole cluster. That's the price of eager replication. However, it could be mitigated by using existing HA technologies. Lazy replication: asynchronously propagate replica updates to other nodes after the replicating transaction commits -- Parallel query execution needs to assume that each node is synchronized at commit, otherwise the summary of each query result executed on each node is meaningless. IMO it is possible to do this easily once BDR has reached the state where you can do streaming apply. That is, you replay actions on other hosts as they are logged, not after the transaction commits. Doing it this way you can wait for any action to successfully complete a full circle before committing it at the source. Currently the main missing part in doing this is autonomous transactions. It can in theory be done by opening an extra backend for each incoming transaction, but you will need a really big number of backends and you also have extra overhead from interprocess communications. Thanks for the thought about conflict resolution in BDR. BTW, if we seriously think about implementing parallel query execution, we need to find a way to distribute data among the nodes, which requires a partial copy of a table. I think that would be a big challenge for WAL-based replication. Moving partial query results around is a completely different problem from replication. We should not mix these. I just explained why log-based replication could not be an infrastructure for parallel query execution. One reason is lazy replication, the other is the ability to do a partial copy. If we on the other hand think about sharding (that is, having a table partitioned between nodes) then this can be done in BDR. Ok, I didn't know that BDR could do it. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Adding IEEE 754:2008 decimal floating point and hardware support for it
Hi all Currently DECIMAL is an alias for NUMERIC, Pg's built-in arbitrary precision and scale decimal type. I'd like to explore the possibility of using hardware decimal floating point support in newer processors, compilers and C libraries to enhance DECIMAL / NUMERIC performance. With the advent of _Decimal32, _Decimal64 and _Decimal128 support in IEEE 754:2008, as supported by gcc per TR24732's float.h extensions, we have the opportunity to make use of hardware representations of decimal floating point values and hardware implementations of operations on them, gaining a potentially huge performance boost in exchange for more limited precision and scale. I'd like to gather ideas and suggestions about how we might approach this. The main thing I'm wondering is how/if to handle backward compatibility with the existing NUMERIC and its DECIMAL alias, or whether adding new DECIMAL32, DECIMAL64, and DECIMAL128 types would be more appropriate. I'd love to just use the SQL standard type name DECIMAL if possible, and the standard would allow for it (see below), but backward compat would be a challenge, as would coming up with a sensible transparent promotion scheme from 32->64->128->numeric and ways to stop undesired promotion. What I'm currently thinking of is using the same strategy we use right now for FLOAT(n) where we select between float4 and float8 based on the specified precision. We could do the same for DECIMAL; up to DECIMAL(94,7) would become decimal32; up to DECIMAL(382,16) would become decimal64 and DECIMAL128(34,6142); everything higher would become NUMERIC as currently. NUMERIC would be unaffected. (Ideally we wouldn't have to do the type change in the parser hack but that's not really possible so long as Pg doesn't preserve typmods in calculations and intermediate outputs). According to TR24732 ( http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1312.pdf) the _Decimal family of types offers: _Decimal32: 7 coefficient digits, 10^97 to 10^-94 range. _Decimal64: 16 coefficient digits, 10^385 to 10^-382 range. _Decimal128 34 coefficient digits, 10^6145 to 10^-6142 range. There was a thread about this on -general some time ago: http://www.postgresql.org/message-id/4cb26b16.7080...@postnewspapers.com.au that never went anywhere. Other discussion is mostly of use cases and is more hypothetical, but outlines why they'd be useful: http://www.postgresql.org/message-id/CADLWmXVmne9t5x-hR-XGOxEyOWQX5BfZwc9Qb=xhsj_gkg_...@mail.gmail.com In terms of how they fit in to the standard, the copy of the SQL:2008 draft I have here says: * NUMERIC specifies the data type exact numeric, with the decimal precision and scale specified by the precision and scale. * DECIMAL specifies the data type exact numeric, with the decimal scale specified by the scale and the implementation-defined decimal precision equal to or greater than the value of the specified precision. Additionally: * For the exact numeric types DECIMAL and NUMERIC, the maximum values of precision and of scale are implementation-defined. ... so it seems we'd be free to use the hardware types and could possibly internally promote from smaller to larger decimal types as appropriate. My main concern is that even the largest fixed-size decimal can't store the arbitrary precision and scale supported by Pg's NUMERIC, and people are used to using DECIMAL as an alias for NUMERIC. 
We're in a bit of a BC trap, as the spec would allow us to just use the hardware types, but we've already provided support for a nearly unlimited (but much, much slower) type using the same name. Regarding the breadth of support, it looks like we could use gcc's built-in types if available, and otherwise fall back to one of the portable decimal floating point maths libraries. gcc 4.3 added _Decimal types, see http://gcc.gnu.org/gcc-4.3/changes.html, so it's ancient history for gcc. Hardware support isn't required; in fact the real question would be whether gcc actually uses the hardware operations where they're supported. I'll need to dig into that. For Windows, MSVC doesn't support the types as built-ins. There's the Intel Decimal Floating Point Library (http://software.intel.com/en-us/articles/intel-decimal-floating-point-math-library, also available from http://www.netlib.org/misc/intel/) to provide support as a library. IBM's decnumber library is another possible alternative. LLVM's clang does not support those types according to the manual: clang does not support decimal floating point types (_Decimal32 and friends) or fixed-point types (_Fract and friends); nobody has expressed interest in these features yet, so it's hard to say when they will be implemented. (http://clang.llvm.org/docs/UsersManual.html). The Intel library or IBM decnumber should work, but would need to be checked. See http://www.ac.usc.es/arith19/sites/default/files/3670a225-spec-session-DFP-paper2.pdf for (somewhat old) details on Intel processor support. There's also support in POWER6 and System z10
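As a quick illustration of what gcc's TR24732 support provides today (gcc 4.3+; per the manual excerpt above, clang will not compile this), decimal floating point avoids the classic radix-2 rounding surprise:

/* build with: gcc -std=gnu99 decimal_demo.c */
#include <stdio.h>

int
main(void)
{
    _Decimal64 da = 0.1DD, db = 0.2DD;
    double     ba = 0.1, bb = 0.2;

    /* 0.1, 0.2 and 0.3 are exactly representable in radix 10 ... */
    printf("decimal64: 0.1 + 0.2 == 0.3 ? %s\n",
           da + db == 0.3DD ? "yes" : "no");    /* prints yes */

    /* ... but none of them is exact in radix 2 */
    printf("binary64:  0.1 + 0.2 == 0.3 ? %s\n",
           ba + bb == 0.3 ? "yes" : "no");      /* prints no */
    return 0;
}

Note that stock glibc printf cannot format _Decimal64 directly, which is why the sketch prints the comparison results rather than the values themselves.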
Re: [HACKERS] JSON and unicode surrogate pairs
On Tue, Jun 11, 2013 at 06:58:05PM -0400, Andrew Dunstan wrote: On 06/11/2013 06:26 PM, Noah Misch wrote: As a final counterexample, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. I don't see a counterexample there; every database that accepts without error a given Unicode escape produces from it the same text value. The proposal to which I objected was akin to having non-UTF8 databases silently translate E'\u0220' to E'\\u0220'. What? There will be no silent translation. The only debate here is about how these databases turn string values inside a json datum into PostgreSQL text values via the documented operation of certain functions and operators. If the JSON datum doesn't already contain a unicode escape then nothing of what's been discussed would apply. Nothing whatever that's been proposed would cause a unicode escape sequence to be emitted that wasn't already there in the first place, and no patch that I have submitted has contained any escape sequence generation at all. Under your proposal to which I was referring, this statement would return true in UTF8 databases and false in databases of other encodings: SELECT '["\u0220"]'::json ->> 0 = E'\u0220' Contrast the next statement, which would return false in UTF8 databases and true in databases of other encodings: SELECT '["\u0220"]'::json ->> 0 = E'\\u0220' Defining ->>(json,int) and ->>(json,text) in this way would be *akin to* having SELECT E'\u0220' = E'\\u0220' return true in non-UTF8 databases. I refer to user-visible semantics, not matters of implementation. Does that help to clarify my earlier statement? -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding IEEE 754:2008 decimal floating point and hardware support for it
Craig Ringer cr...@2ndquadrant.com writes: Currently DECIMAL is an alias for NUMERIC, Pg's built-in arbitrary precision and scale decimal type. I'd like to explore the possibility of using hardware decimal floating point support in newer processors, compilers and C libraries to enhance DECIMAL / NUMERIC performance. As near as I can tell, there is no such hardware support. The Intel paper you reference describes a pure-software library, and states A software implementation was deemed sufficient for the foreseeable future. The source code for that library is apparently available under a liberal license. It might be more useful to eyeball what they did and see if we can learn anything towards speeding up the existing variable-precision NUMERIC type. The main thing I'm wondering is how/if to handle backward compatibility with the existing NUMERIC and its DECIMAL alias, or whether adding new DECIMAL32, DECIMAL64, and DECIMAL128 types would be more appropriate. I'd love to just use the SQL standard types name DECIMAL if possible, and the standard would allow for it (see below), but backward compat would be a challenge, as would coming up with a sensible transparent promotion scheme from 32-64-128-numeric and ways to stop undesired promotion. Indeed. I think you're basically between a rock and a hard place there. It would be very very difficult to shoehorn such types into the existing numeric hierarchy if you wanted any sort of transparency of behavior, I fear. On the other hand, I doubt that it's going to work to make the existing numeric type switch to the hardware representation for suitably-constrained columns, because what are you going to do when, say, the result of an addition overflows the hardware width? You can't just throw an error immediately, because you won't know whether the output is supposed to be getting shoved back into a limited-width column or not. And on top of that, you have the very strong likelihood that the hardware implementation(s) won't behave exactly like our existing NUMERIC routines --- for instance, I'd bet a nickel that Intel took more care with last-place roundoff than our code does. So now we would have not just backwards-compatibility worries, but platform-dependent results for a data type that didn't use to have any such issue. I think people who expect NUMERIC to be exact would really get bent out of shape about that idea. On the whole, I think the effort would be a lot more usefully spent on trying to make the existing NUMERIC support go faster. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/11/2013 08:18 PM, Noah Misch wrote: On Tue, Jun 11, 2013 at 06:58:05PM -0400, Andrew Dunstan wrote: On 06/11/2013 06:26 PM, Noah Misch wrote: As a final counter example, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. I don't see a counterexample there; every database that accepts without error a given Unicode escape produces from it the same text value. The proposal to which I objected was akin to having non-UTF8 databases silently translate E'\u0220' to E'\\u0220'. What? There will be no silent translation. The only debate here is about how these databases turn strings values inside a json datum into PostgreSQL text values via the documented operation of certain functions and operators. If the JSON datum doesn't already contain a unicode escape then nothing of what's been discussed would apply. Nothing whatever that's been proposed would cause a unicode escape sequence to be emitted that wasn't already there in the first place, and no patch that I have submitted has contained any escape sequence generation at all. Under your proposal to which I was referring, this statement would return true in UTF8 databases and false in databases of other encodings: SELECT '[\u0220]'::json - 0 = E'\u0220' Contrast the next statement, which would return false in UTF8 databases and true in databases of other encodings: SELECT '[\u0220]'::json - 0 = E'\\u0220' Defining -(json,int) and -(json,text) in this way would be *akin to* having SELECT E'\u0220' = E'\\u0220' return true in non-UTF8 databases. I refer to user-visible semantics, not matters of implementation. Does that help to clarify my earlier statement? Well, I think that's drawing a bit of a long bow, but never mind. If we work by analogy to Postgres' own handling of Unicode escapes, we'll raise an error on any Unicode escape beyond ASCII (not on input for legacy reasons, but on trying to process such datums). I gather that would meet your objection. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
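Concretely, the rule Andrew describes would amount to a check of roughly this shape at the point where a \uXXXX escape is de-escaped (a sketch, not a submitted patch; the function name is hypothetical):

#include "postgres.h"
#include "mb/pg_wchar.h"

/* sketch: called with the code point decoded from a \uXXXX escape
 * (or surrogate pair) while turning a JSON string into text */
static void
check_unicode_escape(pg_wchar ch)
{
    if (ch > 0x7f && GetDatabaseEncoding() != PG_UTF8)
        ereport(ERROR,
                (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
                 errmsg("unsupported Unicode escape sequence"),
                 errdetail("Unicode escape values cannot be used for code "
                           "point values above 007F when the server "
                           "encoding is not UTF8.")));
}

Plain JSON input and output would be unaffected; only the functions and operators that produce de-escaped text values would hit this check.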
Re: [HACKERS] Configurable location for extension .control files
Hi Josh On 11 June 2013 04:37, Josh Berkus j...@agliodbs.com wrote: I don't personally see a reason for plural locations, but it would be nice if it recursed (that is, looked for .so's in subdirectories). My reason for this is that I work on applications which have in-house extensions as well as public ones, and I'd like to keep the two separated by directory. I gave one example of a use-case for multiple directories upthread - the Postgres.app mac app has contrib, plv8 and postgis bundled under its application folder, but it would be nice to allow users to drop extra extensions under ~/Library/Postgres.app somewhere. If we had directory recursion then you could sort of fake it using symlinks (as long as we follow the symlinks) but it's pretty messy, the wrapper app would have to make a dir under ~/Library the actual extension dir and have a symlink back to its bundled extensions. Not the end of the world though. For any of that to work the dir (or dirs) would need to be settable by config or startup option - compiled in wouldn't cut it, since the absolute dir of the end user's home directory isn't known at compile time. Cheers Tom
Re: [HACKERS] Clean switchover
* Fujii Masao (masao.fu...@gmail.com) wrote: The attached patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection. Seems like a good idea to me.. Rather surprised that we're not doing this already, to be honest. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] JSON and unicode surrogate pairs
On Tue, Jun 11, 2013 at 08:42:26PM -0400, Andrew Dunstan wrote: If we work by analogy to Postgres' own handling of Unicode escapes, we'll raise an error on any Unicode escape beyond ASCII (not on input for legacy reasons, but on trying to process such datums). I gather that would meet your objection. No objection here; that approach is free from the problem that caught my attention in the other proposal. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding IEEE 754:2008 decimal floating point and hardware support for it
On 06/12/2013 08:35 AM, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: Currently DECIMAL is an alias for NUMERIC, Pg's built-in arbitrary precision and scale decimal type. I'd like to explore the possibility of using hardware decimal floating point support in newer processors, compilers and C libraries to enhance DECIMAL / NUMERIC performance. As near as I can tell, there is no such hardware support. The Intel paper you reference describes a pure-software library, and states A software implementation was deemed sufficient for the foreseeable future. Indeed... it looks like hardware IEEE 754:2008 decimal fp is limited to POWER 6 / POWER 7, which is a bit of a niche area for Pg. Interestingly, some general reading suggests that a lot of mainframe hardware has had decimal number support for a long time due to high adoption by the finance industry. BTW, another relevant Intel paper that goes into the background and history more is http://www.intel.com/standards/floatingpoint.pdf . The Implementation strategies for Decimal Floating-Point Arithmetic section (pp9) is interesting, as are the paper's references. The source code for that library is apparently available under a liberal license. It might be more useful to eyeball what they did and see if we can learn anything towards speeding up the existing variable-precision NUMERIC type. It certainly looks like a fair bit can be learned. For one thing, there's the representation. The standard specifies densely packed decimal (http://en.wikipedia.org/wiki/Densely_packed_decimal) and binary integer decimal (http://en.wikipedia.org/wiki/Binary_Integer_Decimal). One nice characteristic of DPD is that it converts very efficiently to/from BCD, and is identical to BCD for small values, which would be nice for us. The Intel paper suggests that BID is generally considered superior for a software implementation, though. Indeed. I think you're basically between a rock and a hard place there. It would be very very difficult to shoehorn such types into the existing numeric hierarchy if you wanted any sort of transparency of behavior, I fear. I was afraid of that - I wasn't seeing any ways to do it nicely, but was hoping someone with more experience with the type system would point out something wonderful. On the other hand, I doubt that it's going to work to make the existing numeric type switch to the hardware representation for suitably-constrained columns, because what are you going to do when, say, the result of an addition overflows the hardware width? You can't just throw an error immediately, because you won't know whether the output is supposed to be getting shoved back into a limited-width column or not. That does sound like a hard problem. Even if we're going to cram it back into a small field, the user may still want higher precision intermediate values to be used. That means these types would probably have to behave more like smallint/integer/bigint/etc, reporting out-of-range errors rather than silently promoting. That would be hard to fit into the SQL spec's use of a single DECIMAL type unless we just redefined DECIMAL as _Decimal128, which would go down about as well as swallowing tacks. The standard doesn't really seem to allow for multiple different-sized decimal subtypes, it just has the idea of one DECIMAL and that's what you get, with the implementation taking care of all the messy details. 
We could take care of those messy details by selecting suitable types (DECIMAL32, DECIMAL64, DECIMAL128, NUMERIC) for different DECIMAL(precision,scale) specifications like we do with FLOAT(precision), but because Pg disregards typmods in intermediate results that'd cause problems with things that currently work, like:

regress=> SELECT DECIMAL(8,2) '123456.78' * DECIMAL(2,0) '10';
  ?column?
------------
 1234567.80
(1 row)

which currently succeeds despite being out of bounds for the type, since the result is silently converted to unqualified 'numeric'. So the result wouldn't be legal as an input but can be produced as an output:

regress=> SELECT DECIMAL(8,2) '1234567.80';
ERROR:  numeric field overflow
DETAIL:  A field with precision 8, scale 2 must round to an absolute value less than 10^6.

That's pretty ugly, but it's well established behaviour. We can't carry typmods through calculations without huge and expensive re-work, from what I've seen raised in prior discussions. I think those were mostly about standards compliance issues with the JDBC driver that our discarding typmods creates. Without that we'd have to use an approach like that used for float(p) ... and then magic up a sane way to deal with the backward compat nightmare. If it weren't for already treating DECIMAL as an alias for NUMERIC I'd be all for just using the FLOAT(p) approach. And on top of that, you have the very strong likelihood that the hardware implementation(s) won't behave exactly like our existing NUMERIC routines --- for
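To make the trade-offs above concrete, here is a short psql sketch of the two existing behaviours being contrasted: fixed-width integer types report out-of-range errors rather than promoting, and FLOAT(p) already maps one SQL name onto differently sized concrete types (outputs shown as comments; exact messages vary by version):

-- Fixed-width types error instead of silently promoting:
SELECT 2147483647::integer + 1;    -- ERROR:  integer out of range
SELECT 2147483647::numeric + 1;    -- 2147483648; numeric just grows

-- The FLOAT(p) precedent: the requested precision picks the
-- concrete type at parse time:
SELECT pg_typeof(FLOAT(10) '1.5') AS f10,   -- real
       pg_typeof(FLOAT(40) '1.5') AS f40;   -- double precision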
Re: [HACKERS] [PATCH] Exorcise zero-dimensional arrays (Was: Re: Should array_length() Return NULL)
On Jun 11, 2013, at 3:09 PM, Brendan Jurd dire...@gmail.com wrote: There have been attempts to add a cardinality function in the past, as it is required by the SQL spec, but these attempts have stalled when trying to decide how it should handle multidim arrays. Having it return the length of the first dimension is the more spec-compatible way to go, but some folks argued that it should work as ArrayGetNItems, because we don't already have a function for that at the SQL level. Therefore I propose we add cardinality() per the spec, and another function to expose ArrayGetNItems. And that's about where we got to, when the whole discussion was put on a time-out to make room for the beta. I am withdrawing the original zero-D patch in favour of the proposed new functions. If you have an opinion about that, please do chime in. Depending on how that goes I may post a patch implementing my new proposal in the next few days. +1 to this proposal. Modulo function names, perhaps. I don't much care what they're called, as long as they work as you describe here. Best, David Array Complainer
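For concreteness, the two candidate semantics can already be spelled out with existing functions on a 2x3 array (the new functions' names are still to be decided):

-- Spec-style cardinality: length of the first dimension only
SELECT array_length('{{1,2,3},{4,5,6}}'::int[], 1);        -- 2
-- ArrayGetNItems-style: total element count across all dimensions
SELECT count(*) FROM unnest('{{1,2,3},{4,5,6}}'::int[]);   -- 6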
Re: [HACKERS] Clean switchover
On 12/06/13 13:15, Stephen Frost wrote: * Fujii Masao (masao.fu...@gmail.com) wrote: The attached patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection. Seems like a good idea to me.. Rather surprised that we're not doing this already, to be honest. Yeah, +1 from here too. This would make clean switchovers for (typically) testing scenarios a lot less complex and resource intensive (rebuilding the old master as a slave, when you know it is ok, is dispiriting on a huge db). On a related note (but not actually to do with this patch), clarifying/expanding the docs about the various methods for standby promotion:
1/ trigger file creation
2/ pg_ctl promote
3/ renaming/removing recovery.conf
and the differences between them would be great. For instance, I only recently realized that method 3/ means the promoted standby does not start a new timeline (incidentally - could this be an option to pg_ctl promote?), which is very useful for (again) controlled/clean switchovers. regards Mark
Re: [HACKERS] JSON and unicode surrogate pairs
On 06/12/2013 08:42 AM, Andrew Dunstan wrote: If we work by analogy to Postgres' own handling of Unicode escapes, we'll raise an error on any Unicode escape beyond ASCII (not on input for legacy reasons, but on trying to process such datums). I gather that would meet your objection. I could live with that if eager validation on input was the default, but could be disabled by setting (say) compat_lazy_json_validation = on. I don't like the idea of leaving us saddled with weak validation just because that's what we've got. It's been an ongoing source of pain as UTF-8 support has improved, and I'd really like a way to avoid semi-valid JSON making it into the DB and causing similar problems. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
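For readers following along, the failure mode being described looks roughly like this (a sketch of 9.3-era behaviour; exact messages vary, and note that compat_lazy_json_validation above is a proposed setting, not an existing one):

-- Input validation is lazy: a lone surrogate escape is accepted...
SELECT '{"x": "\ud83d"}'::json;           -- succeeds, stored verbatim
-- ...and the problem only surfaces when the datum is processed:
SELECT '{"x": "\ud83d"}'::json ->> 'x';   -- raises an error at this point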
Re: [HACKERS] Clean switchover
On Wednesday, June 12, 2013 4:23 AM Fujii Masao wrote: Hi, In streaming replication, when we shut down the master, walsender tries to send all the outstanding WAL records including the shutdown checkpoint record to the standby, and then to exit. This basically means that all the WAL records are fully synced between the two servers after a clean shutdown of the master. So, after promoting the standby to new master, we can restart the stopped master as new standby without the need for a fresh backup from the new master. But there is one problem: though walsender tries to send all the outstanding WAL records, it doesn't wait for them to be replicated to the standby. IOW, walsender closes the replication connection as soon as it sends the WAL records. Then, before receiving all the WAL records, walreceiver can detect the closure of the connection and exit. We cannot guarantee that there is no missing WAL in the standby after a clean shutdown of the master. In this case, a backup from the new master is required when restarting the stopped master as new standby. I have experienced this case several times, especially when enabling WAL archiving. The attached patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection. You may be concerned about the case where the standby gets stuck and the walsender keeps waiting for the reply from that standby. In this case, wal_sender_timeout detects such an inactive standby and then walsender ends. So even in that case, the shutdown can complete. Do you think it can impact the time to complete shutdown? After completing shutdown, the user will promote the standby to master, so a delay in shutdown can cause a delay in switchover. With Regards, Amit Kapila.
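For what it's worth, the catch-up state the patch waits on can be watched from the master; a minimal check using the 9.3-era view columns (renamed to *_lsn in later releases):

-- On the master, compare what has been sent with what the standby
-- has flushed/replayed; the patch makes walsender wait until these
-- match before closing the connection at shutdown:
SELECT application_name, state,
       sent_location, write_location,
       flush_location, replay_location
FROM pg_stat_replication;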