[GENERAL] Rules, Windows and ORDER BY

2012-08-23 Thread Jason Dusek
Hello List,

I have a simple table of keys and values which periodically
receives updated values. It's desirable to keep older values
but, most of the time, we query only for the latest value of a
particular key.

  CREATE TABLE kv
  ( k     bytea NOT NULL,
    at    timestamptz NOT NULL,
    realm bytea NOT NULL,
    v     bytea NOT NULL );
  CREATE INDEX ON kv USING hash(k);
  CREATE INDEX ON kv (at);
  CREATE INDEX ON kv USING hash(realm);

  SELECT * FROM kv WHERE k = $1 AND realm = $2 ORDER BY at DESC LIMIT 1;

It would be nice to encapsulate this common query with a VIEW;
for example:

  CREATE VIEW kv_new AS
    SELECT * FROM kv WHERE at =
      ( SELECT at FROM kv AS _
         WHERE _.k = kv.k AND _.realm = kv.realm
         ORDER BY at DESC LIMIT 1 );

I tried window functions at first, but they were very slow. This
view is pretty sprightly, but it has a more complicated plan than
the original query (which is just a sort over an index scan) and
is consequently not as fast. Please find the plans below my
signature.

Ideally, I'd be able to create a rule where the ORDER BY and
LIMIT were simply appended to whatever SELECT was given; but I
am at a loss as to how to do that. Creating a VIEW with the
order and limit just gives me a table with one row in it (of
course).

Is there something better than a sub-select here? I tried using
one with max(at) but it's not noticeably faster. I would be
interested to see how others have approached this kind of
log-structured storage in Postgres. The window functions make,
alas, no use of the indexes.
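
One alternative I am aware of is DISTINCT ON, which phrases
"latest row per (k, realm)" directly; a sketch, untested against
this data:

  CREATE VIEW kv_new AS
    SELECT DISTINCT ON (k, realm) *
      FROM kv
     ORDER BY k, realm, at DESC;

With a composite index on (k, realm, at) this can sometimes be
answered by a single index scan, though I have not measured it
here.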

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B




  EXPLAIN (COSTS FALSE, FORMAT YAML)
    SELECT * FROM kv WHERE k = ... AND realm = ... ORDER BY at LIMIT 1;
  -[ RECORD 1 ]-
  QUERY PLAN | - Plan:
             |     Node Type: Limit
             |     Plans:
             |       - Node Type: Sort
             |         Parent Relationship: Outer
             |         Sort Key:
             |           - at
             |         Plans:
             |           - Node Type: Index Scan
             |             Parent Relationship: Outer
             |             Scan Direction: NoMovement
             |             Index Name: kv_k_idx
             |             Relation Name: kv
             |             Alias: kv
             |             Index Cond: (k = ...)
             |             Filter: (realm = ...)


  EXPLAIN (COSTS FALSE, FORMAT YAML)
    SELECT * FROM kv_new WHERE k = ... AND realm = ...;
  -[ RECORD 1 ]-
  QUERY PLAN | - Plan:
             |     Node Type: Index Scan
             |     Scan Direction: NoMovement
             |     Index Name: kv_k_idx
             |     Relation Name: kv
             |     Alias: kv
             |     Index Cond: (k = ...)
             |     Filter: ((realm = ...) AND (at = (SubPlan 1)))
             |     Plans:
             |       - Node Type: Limit
             |         Parent Relationship: SubPlan
             |         Subplan Name: SubPlan 1
             |         Plans:
             |           - Node Type: Sort
             |             Parent Relationship: Outer
             |             Sort Key:
             |               - _.at
             |             Plans:
             |               - Node Type: Index Scan
             |                 Parent Relationship: Outer
             |                 Scan Direction: NoMovement
             |                 Index Name: kv_k_idx
             |                 Relation Name: kv
             |                 Alias: _
             |                 Index Cond: (k = kv.k)
             |                 Filter: (realm = kv.realm)




Re: [GENERAL] Rules, Windows and ORDER BY

2012-08-24 Thread Jason Dusek
2012/8/23 Tom Lane t...@sss.pgh.pa.us:
> Jason Dusek jason.du...@gmail.com writes:
>> I have a simple table of keys and values which periodically
>> receives updated values. It's desirable to keep older values
>> but, most of the time, we query only for the latest value of a
>> particular key.
>>
>>   CREATE TABLE kv
>>   ( k     bytea NOT NULL,
>>     at    timestamptz NOT NULL,
>>     realm bytea NOT NULL,
>>     v     bytea NOT NULL );
>>   CREATE INDEX ON kv USING hash(k);
>>   CREATE INDEX ON kv (at);
>>   CREATE INDEX ON kv USING hash(realm);
>>
>>   SELECT * FROM kv WHERE k = $1 AND realm = $2 ORDER BY at DESC LIMIT 1;
>
> If you want to make that fast, an index on (k,realm,at) would
> help.  Those indexes that you did create are next to useless
> for this, and furthermore hash indexes are quite unsafe for
> production.

Thanks for pointing out the unsafety of hash indexes. I think I
got in the habit of using them for a project with large,
temporary data sets.

Why are the individual indices not useful? The tests that the
query does -- equality on key and realm and ordering on at --
are each supported by indices. Does it have to do with the cost
of loading the three indices?
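
For reference, I take the suggested index to be something like:

  CREATE INDEX ON kv (k, realm, at);

so that equality on k and realm pins down a contiguous, already
at-ordered range of the index, and the query can be answered by a
single (backward) index scan with no sort.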

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B




Re: [GENERAL] logger table

2012-12-25 Thread Jason Dusek
2012/12/24 Philipp Kraus philipp.kr...@flashpixx.de:
> I need some ideas for creating a PG based logger. I have got a
> job, which can run more than one time. So the PK is at the
> moment jobid and cycle number. The inserts in this table are in
> parallel with the same username from different hosts
> (clustering). The user calls the executable myprint and
> the message will be inserted into this table, but at the moment I
> don't know a good structure for the table. Each print call can
> be a different length, so I think a text field is a good choice,
> but I don't know how I can create a good PK value. IMHO a
> sequence can create problems when I'm logged in with the
> same user on multiple hosts; a hash key value like SHA1 based
> on the content is not a good choice, because content is not
> unique, so I can get key collisions. I would like to create
> on each print call its own record in the table, but how can I
> create a good key value and get no problems with parallel
> access? I think there can be more than 1000 inserts each
> second.
>
> Can anybody post a good idea?

Why is it necessary to have a primary key? What is the cycle
number?

For what it is worth, I put all my syslog in PG and have so far
been fine without primary keys. (I keep only an hour there at a
time, though, and it's only a few hundred megs.)

In the past, I have had trouble maintaining a high TPS while
having lots (hundreds) of connected clients; maybe you'll want
to use a connection pool.
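
If you do decide you want a key after all, a plain sequence is
safe under concurrent inserts from many hosts -- a sequence never
hands out the same value twice, it only leaves gaps. A sketch
(column names are guesses at your schema):

  CREATE TABLE job_log (
    id      bigserial PRIMARY KEY,  -- concurrency-safe; gaps are harmless
    jobid   bigint      NOT NULL,
    cycle   integer     NOT NULL,
    at      timestamptz NOT NULL DEFAULT now(),
    message text        NOT NULL
  );
  CREATE INDEX ON job_log (jobid, cycle);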

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B




Re: [GENERAL] Syncing an application cache with xmin

2013-02-03 Thread Jason Dusek
2013/2/3 Tom Lane t...@sss.pgh.pa.us:
> Jason Dusek jason.du...@gmail.com writes:
>> The idea would be, to store information about the last XID in
>> the last sync and search for XIDs committed since then upon
>> reconnecting for sync. Perhaps `txid_current_snapshot()'
>> preserves enough information. Is this a plausible technique?
>
> Perfectly plausible, and often done in one guise or another.
> You can't expect row XIDs to survive forever --- they'll be
> replaced by FrozenXID after awhile to avoid problems due to
> transaction counter wraparound.  But for delays of a few
> minutes, in a database with an unremarkable transaction rate,
> that's not an issue.

What is the relationship of the epoch-extended XID returned by
`txid_current()' to the XIDs in rows? Do all rows from a
previous epoch always have the FrozenXID?
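
For concreteness, the scheme I have in mind -- a sketch that
ignores the epoch question above, and so only holds between
wraparounds (the table is hypothetical):

  -- At sync time, remember the snapshot:
  SELECT txid_current_snapshot() AS last_sync_snap;

  -- On reconnect, take the rows whose inserting transaction was
  -- not yet visible in that snapshot. The xid -> bigint comparison
  -- is the crude part, since xmin carries no epoch:
  SELECT *
    FROM some_table
   WHERE NOT txid_visible_in_snapshot(xmin::text::bigint,
                                      '1234:1240:1236'::txid_snapshot);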

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B




Re: [GENERAL] date_trunc to aggregate values?

2013-02-04 Thread Jason Dusek
2013/2/4 Kirk Wythers wythe...@umn.edu:
> I am looking for suggestions on aggregation techniques using a timestamp
> column. In my case I have tried:
>
> date_trunc('hour', derived_tsoil_fifteen_min_stacked.time2),
>
> but date_trunc only seems to truncate the timestamp. I thought I could use
>
> AVG(derived_tsoil_fifteen_min_stacked.value)
>
> in combination with date_trunc, but I still get 15 minute values, not the
> hourly average from the four 15 minute records.
>
> rowid  date_trunc  time2  site  canopy  plot  variable  name  value  avg
> 2010-07-07_00:00:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:00:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  21.06  21.054659424
> 2010-07-07_00:15:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:15:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.96  20.950844727
> 2010-07-07_00:30:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:30:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.88  20.871607666
> 2010-07-07_00:45:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:45:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.8   20.792370605
> 2010-07-07_01:00:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:00:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.72  20.713133545
> 2010-07-07_01:15:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:15:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.64  20.633896484
> 2010-07-07_01:30:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:30:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.55  20.542370605
> 2010-07-07_01:45:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:45:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.47  20.463133545
>
> I was trying to get two records out of this set, with the 'avg' column
> representing the mean of the first four and the last four of the 15
> minute records.
>
> Suggestions?

Are you using an explicit GROUP BY?
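
Something like this is what I would expect to collapse the four
15-minute rows into one (untested against your schema):

  SELECT date_trunc('hour', time2) AS hour,
         avg(value)
    FROM derived_tsoil_fifteen_min_stacked
   GROUP BY date_trunc('hour', time2)
   ORDER BY hour;

Without an explicit GROUP BY, date_trunc is just a per-row
expression, which would explain still seeing 15-minute rows.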

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B




[GENERAL] Analyzing the types of prepared statements

2013-03-16 Thread Jason Dusek
Hello List,

My colleagues and I have some interest in creating an automated
PG-Haskell bridge that will read the PG catalog for a
particular namespace or namespaces and generate a Haskell module
with functions of appropriate types, creating functions for each
stored procedure and simple tuple types for each table and view.

It would also be nice to scan all the source code for queries
and, ECPG-like, generate code for each query and replace the
query with a reference to it. As long as we can ask Postgres to
inspect a SQL statement for its argument types and return type
(if it has one), it's straightforward to generate the code and
replace the inline SQL statements (which are marked with
quasi-quotations, and thus distinguishable syntactically from
plain strings).

However, it is not clear to me at this juncture how to get the
return type for a statement, given its text. Preparing and
looking it up in pg_prepared_statements will retrieve the
argument types but not the return type. Wrapping the query in a
SQL stored procedure which returns record doesn't cause the
server to infer and store a more specific type.

It is possible to find the return type of a select by replacing
the parameters with NULLs, wrapping the statement in CREATE TEMP
VIEW and examining the type of the view. For statements
involving INSERT RETURNING or DELETE RETURNING, though, this
technique is not viable (no data modifying statements allowed in
WITH clauses for views).
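
Concretely, the view trick -- wrapped in a transaction so the
database is left unchanged (table t and the statement are
hypothetical stand-ins):

  BEGIN;
  CREATE TEMP VIEW probe AS
    SELECT * FROM t WHERE k = NULL;  -- each '?' replaced with NULL
  SELECT attname, atttypid::regtype  -- column names and types of the view
    FROM pg_attribute
   WHERE attrelid = 'probe'::regclass
     AND attnum > 0 AND NOT attisdropped;
  ROLLBACK;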

To be able to examine inserts and deletes too, one could drop
all constraints on all tables and then create a temp table that
selects from a CTE with the to-be-analyzed statement in it,
where all '?' have been replaced by NULL. But I'm sure I'm not
the first person who's had this problem and I wonder if there's
a better way. I'd prefer to be able to run the analysis step on
a DB without changing it.

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B




Re: [GENERAL] Analyzing the types of prepared statements

2013-03-19 Thread Jason Dusek
2013/3/19 John DeSoi de...@pgedit.com:
> On Mar 16, 2013, at 8:30 AM, Jason Dusek jason.du...@gmail.com wrote:
>> However, it is not clear to me at this juncture how to get the
>> return type for a statement, given its text. Preparing and
>> looking it up in pg_prepared_statements will retrieve the
>> argument types but not the return type. Wrapping the query in a
>> SQL stored procedure which returns record doesn't cause the
>> server to infer and store a more specific type.
>
> You might want to take a look at the extended query protocol here:
>
> http://www.postgresql.org/docs/current/interactive/protocol-flow.html
>
> If you send a parse (P) message followed by describe (D), you can
> get a row description for the return type without executing anything.

Thanks for the tip. I will give that a shot.

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B




Re: [GENERAL] regclass and format('%I')

2015-03-14 Thread Jason Dusek
On 14 March 2015 at 09:17, David G. Johnston david.g.johns...@gmail.com wrote:
> On Saturday, March 14, 2015, Jason Dusek jason.du...@gmail.com wrote:
>> It honestly seems far more reasonable to me that %s and %I should do
>> the exact same thing with regclass. My reasoning is as follows:
>>
>> ‘%I’ formats something such that it is a valid identifier,
>>
>> regclass is already a valid identifier,
>>
>> therefore, do nothing.
>>
>> Another line of reasoning:
>>
>> If you format with ‘%s’ you are saying: I don’t care whether it’s a
>> valid identifier or literal or whatever, just put the string there,
>>
>> but when we sub a regclass into a string, we want it to be a valid
>> identifier,
>>
>> therefore we should write ‘%I’ when subbing it, so as not to confuse
>> our readers,
>>
>> therefore ‘%I’ should do nothing.
>
> I agree with the theory but adding type specific logic to format is going to
> be difficult.  The first thing the system does is convert all of the inputs
> to text.  Inside format() everything is text and so it has no way to know
> that the type was regclass and should not be quoted again.

Could it work to add type-specific logic for the cast from `regclass`
to `name`? It would be nice to have something formulaic: always format
identifiers with `%I`, always cast to `name` before formatting.
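
In the meantime, the rule of thumb from this thread, as I
understand it (a sketch, assuming a relation named select exists,
as in my earlier examples):

  SELECT format('%s', 'select'::regclass);  -- "select"      (quoted once, by regclass)
  SELECT format('%I', 'select'::regclass);  -- """select"""  (quoted again, by %I)

i.e. use `%s` with regclass, and reserve `%I` for raw, unquoted
names.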

Kind Regards,
  Jason Dusek




Re: [GENERAL] regclass and format('%I')

2015-03-19 Thread Jason Dusek
On 15 March 2015 at 08:44, Tom Lane t...@sss.pgh.pa.us wrote:
> David G. Johnston david.g.johns...@gmail.com writes:
>> IOW, as long as the output string matches: ^(?:{2})*$ I do not see how
>> it is possible for format to lay in a value at %I that is any more
>> insecure than the current behavior.  If the input string already matches
>> that pattern then it could be output as-is without any additional risk and
>> with the positive benefit of making this case work as expected.  The broken
>> case then exists when someone actually intends to name their identifier
>> something which then correctly becomes something on output.
>
> But that's exactly the problem: you just broke a case that used to work.
> format('%I') is not supposed to guess at what the user intends; it is
> supposed to produce a string that, after being passed through identifier
> parsing (dequoting or downcasing), will match the input.  It is not
> format's business to break that contract just because the input has
> already got some double quotes in it.
>
> An example of where this might be important is if you're trying to
> construct a query with arbitrary column headers in the output.  You
> can do
>         format('... AS %I ...', ..., column_label, ...)
> and be confident that the label will be exactly what you've got in
> column_label.  This proposed change would break that for labels that
> happen to already have double-quotes --- but who are we to say that
> that can't have been what you wanted?


I agree with Tom that we shouldn't key off of the contents of the string
to determine whether or not to quote. Introducing the behavior I describe
in an intuitive way would require some kind of type-specific handling in
format(). I'm not sure what the cost of this is to the project, but David
makes the very reasonable point that imposing the burden of choosing
between `%s` and `%I` opens up the possibility of confusing vulnerabilities.

Kind Regards,
  Jason Dusek


[GENERAL] regclass and format('%I')

2015-03-13 Thread Jason Dusek
Hi All,

The difference in how format handles `regclass` and `name` seems like an
inconsistency:

WITH conversions(casts, format, result) AS (
    VALUES (ARRAY['name']::regtype[],             '%I', format('%I', name('select'))),
           (ARRAY['name']::regtype[],             '%L', format('%L', name('select'))),
           (ARRAY['name']::regtype[],             '%s', format('%s', name('select'))),
           (ARRAY['regclass']::regtype[],         '%I', format('%I', regclass('select'))),
           (ARRAY['regclass']::regtype[],         '%L', format('%L', regclass('select'))),
           (ARRAY['regclass']::regtype[],         '%s', format('%s', regclass('select'))),
           (ARRAY['regclass', 'name']::regtype[], '%I', format('%I', name(regclass('select')))),
           (ARRAY['regclass', 'name']::regtype[], '%L', format('%L', name(regclass('select')))),
           (ARRAY['regclass', 'name']::regtype[], '%s', format('%s', name(regclass('select'))))
) SELECT * FROM conversions;
      casts      | format |    result
-----------------+--------+--------------
 {name}          | %I     | "select"
 {name}          | %L     | 'select'
 {name}          | %s     | select
 {regclass}      | %I     | """select"""
 {regclass}      | %L     | '"select"'
 {regclass}      | %s     | "select"
 {regclass,name} | %I     | """select"""
 {regclass,name} | %L     | '"select"'
 {regclass,name} | %s     | "select"

My assumption is that they both represent valid SQL identifiers, so it stands
to reason that `%I` should result in a valid identifier for both of them (or
neither one).

Kind Regards,
  Jason Dusek




Re: [GENERAL] regclass and format('%I')

2015-03-14 Thread Jason Dusek
It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass. My reasoning is as follows:

‘%I’ formats something such that it is a valid identifier,

regclass is already a valid identifier,

therefore, do nothing.

Another line of reasoning:

If you format with ‘%s’ you are saying: I don’t care whether it’s a
valid identifier or literal or whatever, just put the string there,

but when we sub a regclass into a string, we want it to be a valid identifier,

therefore we should write ‘%I’ when subbing it, so as not to confuse
our readers,

therefore ‘%I’ should do nothing.

On 13 March 2015 at 12:42, David G. Johnston david.g.johns...@gmail.com wrote:
> On Fri, Mar 13, 2015 at 12:18 PM, Jason Dusek jason.du...@gmail.com wrote:
>>
>> Hi All,
>>
>> The difference in how format handles `regclass` and `name` seems like an
>> inconsistency:
>>
>> WITH conversions(casts, format, result) AS (
>>     VALUES (ARRAY['name']::regtype[],             '%I', format('%I', name('select'))),
>>            (ARRAY['name']::regtype[],             '%L', format('%L', name('select'))),
>>            (ARRAY['name']::regtype[],             '%s', format('%s', name('select'))),
>>            (ARRAY['regclass']::regtype[],         '%I', format('%I', regclass('select'))),
>>            (ARRAY['regclass']::regtype[],         '%L', format('%L', regclass('select'))),
>>            (ARRAY['regclass']::regtype[],         '%s', format('%s', regclass('select'))),
>>            (ARRAY['regclass', 'name']::regtype[], '%I', format('%I', name(regclass('select')))),
>>            (ARRAY['regclass', 'name']::regtype[], '%L', format('%L', name(regclass('select')))),
>>            (ARRAY['regclass', 'name']::regtype[], '%s', format('%s', name(regclass('select'))))
>> ) SELECT * FROM conversions;
>>       casts      | format |    result
>> -----------------+--------+--------------
>>  {name}          | %I     | "select"
>>  {name}          | %L     | 'select'
>>  {name}          | %s     | select
>>  {regclass}      | %I     | """select"""
>>  {regclass}      | %L     | '"select"'
>>  {regclass}      | %s     | "select"
>>  {regclass,name} | %I     | """select"""
>>  {regclass,name} | %L     | '"select"'
>>  {regclass,name} | %s     | "select"
>>
>> My assumption is that they both represent valid SQL identifiers, so it
>> stands to reason that `%I` should result in a valid identifier for both
>> of them (or neither one).
>
> All three of the %I results are valid identifiers.
>
> regclass performs the same conversion that %I performs.  But since the
> output of the regclass conversion is a valid identifier, with double-quotes,
> the %I adds another pair of double-quotes and doubles-up the existing pair
> thus leaving you with 6.
>
> select is a reserved word and thus can only be used as an identifier if it
> is surrounded in double-quotes.  name() doesn't care (not that it is
> user-documented that I can find) about making its value usable as an
> identifier so when its output goes through %I you get the expected value.
>
> If you are going to use regclass you want to use %s to insert the result
> into your string; not %I.
>
> David J.





Re: [GENERAL] Joins between foreign tables

2015-06-08 Thread Jason Dusek
Hi Geoff,

Thanks for your swift reply. Breaking the query up like this would seem to
address the problem:

CREATE UNLOGGED TABLE tmp AS
  SELECT * FROM table_on_server2
   WHERE created_at >= date_trunc('day', now());
SELECT email FROM table_on_server1 WHERE id IN (SELECT user_id FROM tmp);

It would stand to reason that the IN list could be pushed down to the
foreign server; but this query’s performance is no better than the direct
join between table_on_server1 and table_on_server2.
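
What does help is doing by hand what I hoped the planner would do:
fetch the IDs first, then send them as a literal so the remote
filter is a constant the FDW can ship (the IDs and their type below
are made up):

SELECT email
  FROM table_on_server1
 WHERE id = ANY ('{101,102,103}'::bigint[]);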
Best Regards,

  Jason Dusek


[GENERAL] Joins between foreign tables

2015-06-08 Thread Jason Dusek
Dear Postgres,

Consider a query like:

SELECT table_on_server1.email
  FROM table_on_server1 JOIN table_on_server2
    ON (table_on_server1.id = table_on_server2.user_id)
 WHERE table_on_server2.created_at >= date_trunc('day', now())

One could imagine the plan being something like:

1. Find all the user_ids that result from SELECT user_id FROM
   table_on_server2 WHERE table_on_server2.created_at >= date_trunc('day',
   now()).

2. Pass these IDs to a query executed on server1, pushing them down in a
   WHERE clause.

However, the query actually doesn’t perform at all like that. Which is to
say, if one executes the query in (1) from the console and copy pastes the
resulting IDs into an IN clause in a second query against table_on_server1,
the query returns quickly (milliseconds); whereas if one runs the naive
query at the beginning of this email, it does not return for seconds.

What are things we can do to get good performance for queries like this?
We’ve tried moving the search for IDs into a temp table and a CTE; it
doesn’t seem to make any difference. (Which suggests that joins between one
local and one remote table won’t perform well in general, either.)

Best Regards,

Jason Dusek


[GENERAL] There can be only one

2015-10-11 Thread Jason Dusek
Consider a table of providers, for which one is the default. For example,
payment providers:

CREATE TABLE payment_via (
  iduuid PRIMARY KEY,
  provider  text NOT NULL,
  keys  hstore NOT NULL DEFAULT ''
);

Here we store together the name of the provider — medici, paypal — with
access tokens needed to use a certain payment account. How shall we store
which one is the default? Ideally, we’d be able to ensure there is *but one*
default.

CREATE TABLE payment_via (
  iduuid PRIMARY KEY,
  provider  text NOT NULL,
  keys  hstore NOT NULL DEFAULT '',
  is_defaultboolean NOT NULL DEFAULT FALSE
);

How shall we state the constraint? The obvious thing would seem to be:

CREATE TABLE payment_via (
  iduuid PRIMARY KEY,
  provider  text NOT NULL,
  keys  hstore NOT NULL DEFAULT '',
  is_defaultboolean NOT NULL DEFAULT FALSE,
  EXCLUDE (is_default WITH AND)
);

However, this is a syntax error. There is always:

CREATE TABLE payment_via (
  iduuid PRIMARY KEY,
  provider  text NOT NULL,
  keys  hstore NOT NULL DEFAULT '',
  is_defaultboolean NOT NULL DEFAULT FALSE,
  EXCLUDE (is_default WITH =) WHERE (is_default)
);

but this seems awkward and I was hoping there was some way to use AND as an
operator.
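
For completeness, the same "at most one default" can be had with a
partial unique index over a constant expression:

CREATE UNIQUE INDEX payment_via_one_default
    ON payment_via ((TRUE))
 WHERE is_default;

which is less exotic than EXCLUDE, if no less roundabout.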


[GENERAL] client_min_messages and INFO

2016-06-23 Thread Jason Dusek
Hi List,

I notice that INFO is not included in the list of settable levels for
client_min_messages:


https://www.postgresql.org/docs/9.5/static/runtime-config-logging.html#GUC-CLIENT-MIN-MESSAGES

This seems to be true several versions back so I wonder: what is the
rationale? Is it like the reverse of LOG?

Kind Regards,
  Jason Dusek


[GENERAL] HeapTuple to JSON -- composite_to_json()?

2016-01-21 Thread Jason Dusek
Hi All,

I am working on a logical decoder and would like to convert a heap tuple to
JSON (a HeapTuple obtained from ReorderBufferChange). It is the moral
equivalent of tuple_to_stringinfo in test_decoding.c:

http://doxygen.postgresql.org/test__decoding_8c.html#a3986a57a0308de0150ebd45f7734d464

It seems like the easiest and best thing to do is to rely on Postgres’s
native JSON serialization. A HeapTuple is not a Datum; but reading
heaptuple.c they seem to have a close relationship. Do arrays of Datum
become HeapTuple? Is a HeapTuple basically Datum sans type information?
What is the best way to convert a HeapTuple to a Datum?

Best Regards,

  Jason Dusek


Re: [GENERAL] HeapTuple to JSON -- composite_to_json()?

2016-01-21 Thread Jason Dusek
heap_copy_tuple_as_datum looks promising…

http://doxygen.postgresql.org/heaptuple_8c.html#abfa9096cd7909cb17a6acfdc7b31b7ad


Re: [GENERAL] Uber migrated from Postgres to MySQL

2016-07-27 Thread Jason Dusek
With regards to write amplification, it makes me think about about OIDs.
Used to be, every row had an OID and that OID persisted across row versions.


https://www.postgresql.org/docs/9.5/static/runtime-config-compatible.html#GUC-DEFAULT-WITH-OIDS

Would reintroducing such a feature address some of Uber's concerns about
multiple indexes? It could, and would do so without the implicit
requirement of a foreign key; but it would also require a fast OID to CTID
mapping.

On Tue, 26 Jul 2016 at 10:40 Guyren Howe  wrote:

> Honestly, I've never heard of anyone doing that. But it sounds like they
> had good reasons.
>
> https://eng.uber.com/mysql-migration/
>
> Thoughts?
>


Re: [GENERAL] Uber migrated from Postgres to MySQL

2016-07-30 Thread Jason Dusek
On Thu, 28 Jul 2016 at 01:18 Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:

> On 28/07/16 17:52, Jason Dusek wrote:
> > With regards to write amplification, it makes me think about about
> > OIDs. Used to be, every row had an OID and that OID persisted across
> > row versions.
> >
> >
> https://www.postgresql.org/docs/9.5/static/runtime-config-compatible.html#GUC-DEFAULT-WITH-OIDS
> >
> > Would reintroducing such a feature address some of Uber's concerns
> > about multiple indexes? It could, and would do so without the implicit
> > requirement of a foreign key; but it would also require a fast OID to
> > CTID mapping.
>
> Would it be best to increase OIDs to 64 bits?
>
> Possibly a choice of 32/64 to be decided when the DB is created???
>

Moving to 64 bit OIDs would make the OID->CTID mapping take more memory and
could, consequently, make it slower; but OIDs would seem to be required to
support temporal tables so maybe they should make a comeback?

Kind Regards,
  Jason Dusek


[GENERAL] Large empty table, balanced INSERTs and DELETEs, not being vacuumed

2016-10-21 Thread Jason Dusek
Hi All,

I recently came across an interesting problem relating vacuuming, triggers
and table partitioning.

We have a “virtual table” with a BEFORE trigger that redirects writes to
its child tables. This was all well and good until we wanted to use
RETURNING to get the id back — the trigger returns NULL to prevent the row
from being inserted.

We modified the trigger to RETURN NEW and then added an AFTER trigger that
deletes it (WHERE ctid = NEW.ctid). This was working great until a few days
later. The plan for queries against the partitioned table naturally
includes it (there is no constraint preventing writes to it) and this table
was many gigabytes in size with indexes to match! But SELECT count(*) ...
returned 0 rows. The queries became very slow, searching an index full of
deleted rows. Running TRUNCATE ONLY ... fixed the problem.

This is really only a temporary fix, though. We can have a cron job running
in the background running TRUNCATE ONLY ... but this seems like the kind of
thing that auto-vacuuming should have handled for us, before the problem
got “too large”. Are there auto-vacuum settings that we can set, globally
or on the table, to address this situation?
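
Something like this, per-table, is what I am imagining -- the numbers
are placeholders, not a recommendation, and "the_virtual_table" stands
in for our parent table:

ALTER TABLE the_virtual_table SET (
  autovacuum_vacuum_scale_factor = 0.0,
  autovacuum_vacuum_threshold    = 1000
);

Though as I understand it, even an aggressive autovacuum only marks the
dead rows reusable; unlike TRUNCATE it will not give back the gigabytes
already allocated to the table and its indexes.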

Kind Regards,

  Jason Dusek


[GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-11 Thread Jason Dusek
Hi All,

I notice the following oddity:

 =# CREATE TABLE with_pk (i integer PRIMARY KEY);
CREATE TABLE

 =# BEGIN;
BEGIN
 =# INSERT INTO with_pk VALUES (1) ON CONFLICT DO NOTHING;
INSERT 0 1
 =# INSERT INTO with_pk VALUES (1) ON CONFLICT DO NOTHING;
INSERT 0 0
 =# END;
COMMIT

 =# BEGIN;
BEGIN
 =# INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
ERROR:  could not serialize access due to concurrent update
 =# END;
ROLLBACK

How are these two transactions different?

Kind Regards,

  Jason Dusek


Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-11 Thread Jason Dusek
SELECT version(),
       (SELECT setting FROM pg_settings
         WHERE name = 'default_transaction_deferrable') AS default_transaction_deferrable,
       (SELECT setting FROM pg_settings
         WHERE name = 'default_transaction_isolation') AS default_transaction_isolation;
─[ RECORD 1 ]──────────────────┬──────────────────────────────────────────
version                        │ PostgreSQL 9.5.4 on x86_64-apple-darwin15.6.0, compiled by Apple LLVM version 8.0.0 (clang-800.0.38), 64-bit
default_transaction_deferrable │ on
default_transaction_isolation  │ serializable

On Tue, 11 Oct 2016 at 13:00 Kevin Grittner <kgri...@gmail.com> wrote:

> On Tue, Oct 11, 2016 at 2:29 PM, Jason Dusek <jason.du...@gmail.com>
> wrote:
>
> > I notice the following oddity:
>
> >  =# CREATE TABLE with_pk (i integer PRIMARY KEY);
> > CREATE TABLE
>
> >  =# BEGIN;
> > BEGIN
> >  =# INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
> > ERROR:  could not serialize access due to concurrent update
> >  =# END;
> > ROLLBACK
>
> I don't see that on development HEAD.  What version are you
> running?  What is your setting for default_transaction_isolation?
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [GENERAL] Imperative Query Languages

2017-07-11 Thread Jason Dusek
They said it couldn't be done...
dandl wrote on Tue, 11 Jul 2017 at 06:58:

> From: pgsql-general-ow...@postgresql.org [mailto:
> pgsql-general-ow...@postgresql.org] On Behalf Of Merlin Moncure
>
> > It's probably of broader interest to consider some sort of "more
> relational"
> > language that would, in effect, be "more declarative" as opposed to
> > "more imperative" than SQL.  (I'd not be keen on heading back to
> > CODASYL!!!)
> >
> > The notable example of such would be the "Tutorial D" language
> > attributable to Darwen and Date's "Third Manifesto"
> >
> > https://en.wikipedia.org/wiki/D_(data_language_specification)
> > http://wiki.c2.com/?TutorialDee
> >
> > Unfortunately, the attempts to construct implementations of D have all
> > pretty much remained at the "toy" point, experiments that few beyond
> > the implementors seem to treat as realistic SQL successors.
> >
> > Another option, in principle, would be to consider QUEL, which was
> > what Stonebraker used initially as the query languages for Ingres and
> > Postgres.
> >
> > https://en.wikipedia.org/wiki/QUEL_query_languages
> >
> > None of these options seem to be dominantly better than SQL, and for
> > something to supplant SQL, it would need to be a fair bit better.
>
> I'd like to see a SQL variant (maybe preprocessed) with an algebraic
> syntax.  My biggest gripes with SQL are all the keywords (there are other
> spoken languages than English??) and the unnecessarily irregular syntax.
>
> If you want a comprehensive list of what's wrong with SQL, it's easy
> enough to find. The list is long, but near the top are the failure to
> adhere to the relational model, NULLs, and language design (irregular
> syntax, etc). But SQL is deeply embedded and currently there are no
> competitors in its space. In the academic arena Datalog is preferred, and
> there are solid commercial implementations.
>
> It's easy enough to pre-process your own syntax, and Andl effectively does
> that by generating SQL on Postgres and SQLite. But that doesn't provide
> enough benefits on its own, and displacing SQL from any of the places it's
> currently used is not going to happen any time soon.
>
> Regards
> David M Bennett FACS
>
> Andl - A New Database Language - andl.org
>


Re: [GENERAL] Imperative Query Languages

2017-07-05 Thread Jason Dusek
On Tue, 4 Jul 2017 at 23:01 Tom Lane  wrote:

> I'm pretty sure that that is the model that relational databases (and the
> SQL language in particular) replaced, back in the 70s or so.  Look up
> "network" databases (eg CODASYL) and "hierarchical" DBs (eg IMS) for some
> ancient history here.  Yeah, you can do it like that, but it's seriously
> painful to develop and maintain.  People were more excited about spending
> human effort to save machine cycles forty years ago than they are today.


Network database programming is, indeed, imperative; but as I understand it
there was not much of a planning layer -- the program was the plan. In C#,
one has LINQ; and in Scala and Haskell, monadic comprehensions; and even in
Python one can overload iteration to allow a translation of imperative
syntax to declarative syntax. The goal with these features, is generally to
present a familiar interface to an unfamiliar semantics.

If we imagine network databases have one layer:

  Imperative Plan

And SQL databases have two:

  Declarative Query -> Imperative Plan

It seems reasonable to say LINQ-style systems have three:

  Imperative Syntax -> Declarative Query -> Imperative Plan

Fortress is rather the same, since it translates imperative to functional
to assembly.

Kind Regards,
  Jason


[GENERAL] Imperative Query Languages

2017-07-04 Thread Jason Dusek
Hi All,

This more of a general interest than specifically Postgres question. Are
there any “semi-imperative” query languages that have been tried in the
past? I’m imagining a language where something like this:

for employee in employees:
    for department in departments:
        if employee.department == department.department and \
           department.name == "infosec":
            yield (employee.employee, employee.name,
                   employee.location, employee.favorite_drink)

would be planned and executed like this:

SELECT employee.employee, employee.name, employee.location,
       employee.favorite_drink
  FROM employee JOIN department USING (department)
 WHERE department.name = 'infosec'

The only language I can think of that is vaguely like this is Fortress, in
that it attempts to emulate pseudocode and Fortran very closely while being
fundamentally a dataflow language.

Kind Regards,

  Jason


Re: [GENERAL] Imperative Query Languages

2017-07-05 Thread Jason Dusek
On Tue, 4 Jul 2017 at 23:22 Chris Travers  wrote:

> Having done a lot of SQL optimisation stuff  I have doubts that this is
> possible.  The problem is that it is much easier to go from a declarative
> to an imperative plan than it is to go the other way.  In fact sometimes we
> use SQL the way your first code works and then it is often a problem.
>
> For example, consider the difference between an EXISTS and an IN query, or
> between an INNER JOIN and a LATERAL JOIN.  PostgreSQL's optimiser is
> amazing at identifying cases where these are equivalent and planning
> accordingly, but it is extremely easy to get just outside the envelope
> where the optimiser gives up and has to default back to an imperative
> interpretation of these.  Proving that two imperative approaches are
> equivalent is a lot harder than proving that two different imperative
> approaches implement the same declarative request.  In other words, going
> imperative -> declarative strikes me as a far, far harder problem than the
> other way.
>
> Also I have done a little bit of work on Apache Spark and there it is
> extremely important to understand the imperative side of the data flow in
> that case (what is partitioned and what is not).
>

I cannot argue these points with you; but Fortress is a good example of
imperative-looking code that translates to a functional/declarative core,
as indeed is monadic or applicative code. LINQ is a more recent and
widespread example -- though not encompassing an entire language -- of
something that has an imperative form while being declarative under the
hood. Scala's for comprehensions -- more or less monad comprehensions --
are another.

With regards to Spark, I assume for comprehensions are an important part of
the interface?

Kind Regards,
  Jason

>


Re: [GENERAL] Imperative Query Languages

2017-07-05 Thread Jason Dusek
On Tue, 4 Jul 2017 at 23:57 Chris Travers chris.trav...@gmail.com wrote:

> I am curious where you see LINQ as starting at an imperative syntax.

The imperative integration is thin, I admit — it is just the integration
with for loops.

> Here's a good case that illustrates the problem I think.  Suppose the
> following is understood imperatively:
>
> FOR x IN RANGE student
> SELECT WHERE x.age < 25
> PROJECT ALL(x), lock_if_possible(x.id)
>
> Now, lock_if_possible has side effects.  If we understand this to be
> imperative, then we have no possibility of turning this into a declarative
> query because we are interested in the side effects.  So you cannot say
> that this is equivalent to the SQL of
>
> SELECT *, lock_if_possible(id)
> FROM student
> WHERE age < 25
>
> The reason is that while the imperative version represents *one* valid
> interpretation of the declarative, there are other interpretations of the
> declarative that are not at all equivalent.  The hoops we have to jump
> through to make this work in an imperative way in SQL are sometimes rather
> amusing.
>
What are some alternative interpretations of this query? Are you referring
to which rows are candidates for locking? Or the order of locking?

Kind Regards,

Jason



Re: [GENERAL] Imperative Query Languages

2017-07-05 Thread Jason Dusek
On Wed, 5 Jul 2017 at 14:36 Peter Geoghegan p...@bowt.ie wrote:

> However, the OP seemed to be describing something that maps imperative
> code to a declarative SQL query or something equivalent, which isn't
> quite the same thing. The declarative nature of SQL feels restrictive
> or at least unfamiliar to many programmers.
>
Yes, that is what I am describing.

SQL is great and I am fully on board with the idea. Everywhere I go, I
promote the greatness of SQL, of the relational model, and of Postgres. I
didn’t write in so much to challenge SQL or pitch navigational databases,
as to ask about examples of “pseudo-imperative” languages.

Fortress is the most noteworthy of these; but anyone who’s seen Haskell’s do
notation realizes there is some promise in the idea: the imperative
structure makes some programs much clearer, even in a pure functional
language.

> IMV, what the OP describes wouldn't work well because it would
> superficially *appear* to not be restrictive in the way that some
> people dislike, but actually would be just as restrictive. The only
> way you could write it is by imagining what the SQL it produced looks
> like.

It’s not so much that SQL is restrictive, as that it is confusing in parts.
It is the same with regards to Haskell without do notation — the chained
lambdas boggle the mind. Another — very similar — idea is futures as an
approach to getting around callbacks — they create the appearance of a
linear sequence of execution, when some parts might be parallel; but this
turns out to be a benefit overall. Yet another is using while read x in
shell pipelines — an imperative structure, yet used to model stream
processing.

But this is perhaps neither here nor there: it seems there are no such
languages, no logic languages disguised as imperative languages. There are
only a very few functional languages disguised as imperative languages.
It’s an idea that has interested me for many a year — for the reasons that
Peter cites — and not having seen any language wholly organized in this
way, I thought I’d write to ask you all if you had seen any.

Kind Regards,

Jason