Re: [GENERAL] jsonb

2017-11-14 Thread Merlin Moncure
On Tue, Nov 14, 2017 at 4:30 PM, hmidi slim  wrote:
> I'm trying to use the function to_jsonb and create the name:
> to_jsonb ('{"key1":" ' || 'text1' || '","key2":" ' || 'text2' || '"}');
> But after that I used Objection.js ORM to get data using the query:
> Product.query().where('id',1).then(prod => {console.log(prod)})
> I think that the problem maybe with the usage of to_jsonb function, maybe I
> miss something. But when I fetch the data with the ORM I found that the type
> was a string and not a jsonb

Still not quite following. My advice would be to:

#1) work out the SQL you want the database to be running and verify
the results are correct

#2) figure out how to get the ORM to send that SQL

If you need help figuring out what SQL the ORM is actually running,
try turning on statement logging in postgresql.conf and watching the
log.
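
For illustration, a minimal sketch of the distinction at issue (simple test
queries, not your exact table):

-- to_jsonb() applied to a text value produces a jsonb *string*:
SELECT to_jsonb('{"key1": "text1", "key2": "text2"}'::text);
-- result: "{\"key1\": \"text1\", \"key2\": \"text2\"}"

-- casting the text to jsonb, or building the object directly,
-- produces a jsonb *object*:
SELECT '{"key1": "text1", "key2": "text2"}'::jsonb;
SELECT jsonb_build_object('key1', 'text1', 'key2', 'text2');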

merlin




Re: [GENERAL] jsonb

2017-11-14 Thread Merlin Moncure
On Tue, Nov 14, 2017 at 3:32 PM, hmidi slim  wrote:
> I have a column name of type 'jsonb' on my table named product. The format
> of the column:
> name: {"key1": "text1", "key2": "text2"}
>
> When I make a query to fetch data from the table I got this format:
> name: '{"key1": "text1", "key2": "text2"}'
>
> Why does postgresql return the name as a string type and not jsonb? Is
> it a bug or is there something else to add?

not quite following.  Can you paste the query you are trying to
execute along with the results vs. expectation?  thanks

merlin




Re: [GENERAL] Migrating money column from MS SQL Server to Postgres

2017-11-09 Thread Merlin Moncure
On Thu, Nov 9, 2017 at 8:22 AM, Adam Brusselback
 wrote:
>> Since you are migrating data into a staging table in PostgreSQL, you may set
>> the field data type as TEXT for each field where you have noticed or
>> anticipate issues.
>> Then after population perform the datatype transformation query on the given
>> fields to determine the actual field value that could not be gracefully
>> transformed.
>
> This is the approach I have come to as the most successful for data 
> migrations.
>
> I will use tools like Kettle / Talend to get data into a staging table
> with every column as text, then use SQL to migrate that to a properly
> typed table.  Works much better than trying to work within the
> constraints of these tools.

YES

I call the approach 'ELT' (Extract, Load, Transform).  You are much
better off writing transformations in SQL than inside an ETL tool.
This is a perfect example of why.
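
A rough sketch of the pattern (table and column names are made up):

-- stage everything as text
CREATE TABLE staging_customer (
    id      text,
    amount  text,   -- the "money" value straight from the source system
    signup  text
);

-- flag rows that will not cast cleanly
SELECT * FROM staging_customer
WHERE amount !~ '^-?[0-9]+(\.[0-9]+)?$';

-- then transform into the properly typed table
INSERT INTO customer (id, amount, signup)
SELECT id::int, amount::numeric, signup::date
FROM staging_customer;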

merlin




Re: [GENERAL] idle in transaction, why

2017-11-06 Thread Merlin Moncure
On Mon, Nov 6, 2017 at 2:50 PM, Rob Sargent  wrote:
>
>
> On 11/06/2017 01:41 PM, Tom Lane wrote:
>>
>> Rob Sargent  writes:
>>>
>>>idle_in_transaction_session_timeout | 0   | default |
>>> || A value of 0 turns off the timeout. | user
>>
>> Meh.  I think we're barking up the wrong tree anyway: so far as I can
>> find, there is no error message reading 'idle transaction timeout'
>> in the existing PG sources (and I sure hope no committer would have
>> thought that such an ambiguous message text was satisfactory).
>> So I think your error is coming from client-side or third-party code.
>> What other moving parts have you got in there?
>>
>> regards, tom lane
>
> The most likely culprit is JOOQ, which I chose as a learning experience
> (normally I use ORM tools).  But that said, I just ran the same data into my
> test env, (postgres 10.0 (real) on centos 6.9, ubuntu client) and all went
> swimmingly.  It's a sizable payload (several batches of over 100K items,
> deserialized from json) and takes 5 minutes to save.
>
> I was hoping to blame the virt or the beta.  Not a good time to start doubt
> JOOQ

I can't speak to JOOQ (who are fantastic postgres supporters BTW), but
in the java world this typically comes from one of two things:

1) you have a long-running in-transaction process that does very heavy
computation between SQL statements.  This is a rare case.
--or--
2) you are using connection pooling and the app sent a connection back into
the pool without having committed its transaction.

"2" is a common and dangerous bug.  It can happen due to a bug in
application code (most likely), the JDBC wrapping library code (less
likely), or the connection pooler itself if you're using one.  A
typical cause on the application side is manual transaction
management combined with an uncaught exception path where an error
(say, a duplicate key violation) leaves the transaction open.  So
investigate causes like that first (database errors in the database
log might be a helpful clue) and go from there.  If the problem is
within JOOQ, you ought to take it up with them, which I encourage you
to do, since I consider JOOQ to be a wonderful treatment of SQL
integration from the java perspective.
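
If you want to confirm the symptom on the database side, a standard catalog
query (nothing JOOQ-specific) will show sessions sitting idle in transaction:

SELECT pid, state, xact_start, now() - xact_start AS open_for, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;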

merlin




Re: [GENERAL] query not scaling

2017-10-27 Thread Merlin Moncure
On Thu, Oct 26, 2017 at 10:01 AM, Tom Lane  wrote:
> Laurenz Albe  writes:
>> Also, to have PostgreSQL inline the function, which would be good
>> for performance, it should be declared IMMUTABLE.
>
> Actually, if you hope to have a SQL function be inlined, it's better
> not to decorate it at all --- not with IMMUTABLE, and not with STRICT
> either.  Both of those restrict the parser's ability to inline unless
> it can prove the contained expression is equally immutable/strict.
> With the default attributes of volatile/not strict, there's nothing
> to prove.

This is extremely obnoxious.  Is it possible to raise a warning on
function creation?

> (In any case, it's usually easy enough to tell from EXPLAIN output
> whether inlining has happened.)

No, it isn't.  The EXPLAIN output is arcane, and inlining as a general
concept is only very indirectly expressed in it.  I really think we ought to
do better here; I was not able to find any treatment of inlining given
in the 'Performance Tips' or the 'Functions and Operators' section, or
anywhere really (except the wiki).  This is really a disservice to the
users, I think.
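
For what it's worth, a sketch of how one can check today (hypothetical
function and table names):

CREATE FUNCTION add_tax(amount numeric) RETURNS numeric AS
$$ SELECT amount * 1.08 $$ LANGUAGE sql;  -- deliberately undecorated

EXPLAIN (VERBOSE) SELECT add_tax(price) FROM items;
-- if the Output line shows the expression (price * 1.08) rather than a
-- call to add_tax(price), the function was inlined.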

merlin




Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Merlin Moncure
On Mon, Oct 9, 2017 at 6:12 PM, Christophe Pettus  wrote:
>
>> On Oct 9, 2017, at 14:29, Tom Lane  wrote:
>> Hmm.  Creating or dropping a temp table does take AccessExclusiveLock,
>> just as it does for a non-temp table.  In principle we'd not have to
>> transmit those locks to standbys, but I doubt that the WAL code has
>> enough knowledge to filter them out.  So a lot of temp tables and
>> a lot of separate subtransactions could be a nasty combination.
>
> The problem indeed appears to be a very large number of subtransactions, each 
> one creating a temp table, inside a single transaction.  It's made worse by 
> one of those transactions finally getting replayed on the secondary, only to 
> have another one come in right behind it...

FYI, this is pathological coding; it can cause other nastiness such as
exhausting shared memory.  Consider refactoring the code to use a
permanent table (if access is concurrent, you can do tricks like using
'pid' in the key to subdivide the data).
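
A minimal sketch of that refactor (made-up table and column names):

-- a permanent scratch table keyed by backend pid instead of a temp table
CREATE TABLE scratch_data (
    pid     int NOT NULL DEFAULT pg_backend_pid(),
    item_id int,
    payload text
);

-- each session clears and works with only its own rows
DELETE FROM scratch_data WHERE pid = pg_backend_pid();
INSERT INTO scratch_data (item_id, payload) VALUES (1, 'example');
SELECT * FROM scratch_data WHERE pid = pg_backend_pid();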

merlin




Re: [GENERAL] a JOIN to a VIEW seems slow

2017-09-22 Thread Merlin Moncure
On Thu, Sep 21, 2017 at 12:02 PM, Frank Millman <fr...@chagford.com> wrote:
> On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
>>
>> > On Thu, Sep 21, 2017 at 1:51 AM, Frank Millman <fr...@chagford.com>
>> > wrote:
>> >
>>
>> I did not get any response to this, but I am still persevering, and feel
>> that I am getting closer. Instead of waiting 26 minutes for a result, I
>> realise that I can learn a lot by using EXPLAIN. This is what I have found
>> out.
>>
> Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute
> query?
>
> I’m working on it, but my machine is playing up and it is getting late, so I
> will try again tomorrow.
>
> A passing comment – the 26 minute query is more complex, so will need some
> explaining (no pun intended). I was hoping that my simplified example would
> illustrate what I think is the problem.
>
> Anyway, here is the query -
>
> SELECT q.cust_row_id,
> SUM(CASE WHEN q.tran_date > '2015-08-31' THEN q.balance ELSE 0 END
> ) AS "balance_curr AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-08-31' AND q.tran_date > '2015-07-31'
> THEN q.balance ELSE 0 END
> ) AS "balance_30 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-07-31' AND q.tran_date > '2015-06-30'
> THEN q.balance ELSE 0 END
> ) AS "balance_60 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-06-30' AND q.tran_date > '2015-05-31'
> THEN q.balance ELSE 0 END
> ) AS "balance_90 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-05-31' THEN q.balance ELSE 0 END
> ) AS "balance_120 AS [DECTEXT]"
> FROM
> (SELECT
> due_trans.cust_row_id,
> due_trans.tran_date,
> trans_due.amount_cust +
> COALESCE((SELECT SUM(trans_alloc.alloc_cust+trans_alloc.disc_cust)
> FROM prop.ar_trans_alloc trans_alloc
> LEFT JOIN prop.ar_trans alloc_trans ON
> alloc_trans.tran_type = trans_alloc.tran_type
> AND alloc_trans.tran_row_id = trans_alloc.tran_row_id
> WHERE trans_alloc.due_row_id = trans_due.row_id
> AND alloc_trans.tran_date <= '2015-09-30'
> ), 0)
> AS balance
> FROM prop.ar_trans_due trans_due
> LEFT JOIN prop.ar_trans due_trans ON
> due_trans.tran_type = trans_due.tran_type
> AND due_trans.tran_row_id = trans_due.tran_row_id
> WHERE due_trans.tran_date <= '2015-09-30'
> ) AS q
> GROUP BY q.cust_row_id
> ORDER BY q.cust_row_id;


What is the performance with this portion simplified out?

> COALESCE((SELECT SUM(trans_alloc.alloc_cust+trans_alloc.disc_cust)
> FROM prop.ar_trans_alloc trans_alloc
> LEFT JOIN prop.ar_trans alloc_trans ON
> alloc_trans.tran_type = trans_alloc.tran_type
> AND alloc_trans.tran_row_id = trans_alloc.tran_row_id
> WHERE trans_alloc.due_row_id = trans_due.row_id
> AND alloc_trans.tran_date <= '2015-09-30'
> ), 0)

Change that to just '0' and rerun the query.  If timings are good, I
think we want to explore converting this to LATERAL type join.  I
think (but am not sure) this is defeating the optimizer.   Also, is
this the actual query you want to run quickly?  You are not filtering
on cust_row_id?
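
To make that concrete, an untested sketch of what the inner query might look
like with a LATERAL join (same tables and columns as your query):

SELECT
    due_trans.cust_row_id,
    due_trans.tran_date,
    trans_due.amount_cust + COALESCE(alloc.alloc_total, 0) AS balance
FROM prop.ar_trans_due trans_due
LEFT JOIN prop.ar_trans due_trans ON
    due_trans.tran_type = trans_due.tran_type
    AND due_trans.tran_row_id = trans_due.tran_row_id
LEFT JOIN LATERAL (
    SELECT SUM(ta.alloc_cust + ta.disc_cust) AS alloc_total
    FROM prop.ar_trans_alloc ta
    JOIN prop.ar_trans tr ON
        tr.tran_type = ta.tran_type
        AND tr.tran_row_id = ta.tran_row_id
    WHERE ta.due_row_id = trans_due.row_id
      AND tr.tran_date <= '2015-09-30'
) alloc ON true
WHERE due_trans.tran_date <= '2015-09-30';

Whether the planner actually handles this better than the correlated
subselect is something only EXPLAIN ANALYZE can confirm.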

merlin




Re: [GENERAL] a JOIN to a VIEW seems slow

2017-09-21 Thread Merlin Moncure
On Thu, Sep 21, 2017 at 12:02 PM, Frank Millman <fr...@chagford.com> wrote:
> On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
>>
>> > On Thu, Sep 21, 2017 at 1:51 AM, Frank Millman <fr...@chagford.com>
>> > wrote:
>> >
>>
>> I did not get any response to this, but I am still persevering, and feel
>> that I am getting closer. Instead of waiting 26 minutes for a result, I
>> realise that I can learn a lot by using EXPLAIN. This is what I have found
>> out.
>>
> Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute
> query?
>
> I’m working on it, but my machine is playing up and it is getting late, so I
> will try again tomorrow.
>
> A passing comment – the 26 minute query is more complex, so will need some
> explaining (no pun intended). I was hoping that my simplified example would
> illustrate what I think is the problem.
>
> Anyway, here is the query -
>
> SELECT q.cust_row_id,
> SUM(CASE WHEN q.tran_date > '2015-08-31' THEN q.balance ELSE 0 END
> ) AS "balance_curr AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-08-31' AND q.tran_date > '2015-07-31'
> THEN q.balance ELSE 0 END
> ) AS "balance_30 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-07-31' AND q.tran_date > '2015-06-30'
> THEN q.balance ELSE 0 END
> ) AS "balance_60 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-06-30' AND q.tran_date > '2015-05-31'
> THEN q.balance ELSE 0 END
> ) AS "balance_90 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-05-31' THEN q.balance ELSE 0 END
> ) AS "balance_120 AS [DECTEXT]"
> FROM
> (SELECT
> due_trans.cust_row_id,
> due_trans.tran_date,
> trans_due.amount_cust +
> COALESCE((SELECT SUM(trans_alloc.alloc_cust+trans_alloc.disc_cust)
> FROM prop.ar_trans_alloc trans_alloc
> LEFT JOIN prop.ar_trans alloc_trans ON
> alloc_trans.tran_type = trans_alloc.tran_type
> AND alloc_trans.tran_row_id = trans_alloc.tran_row_id
> WHERE trans_alloc.due_row_id = trans_due.row_id
> AND alloc_trans.tran_date <= '2015-09-30'
> ), 0)
> AS balance
> FROM prop.ar_trans_due trans_due
> LEFT JOIN prop.ar_trans due_trans ON
> due_trans.tran_type = trans_due.tran_type
> AND due_trans.tran_row_id = trans_due.tran_row_id
> WHERE due_trans.tran_date <= '2015-09-30'
> ) AS q
> GROUP BY q.cust_row_id
> ORDER BY q.cust_row_id;
>
> I will report back with the EXPLAIN ANALYSE tomorrow.

Thank you.  It's a minor point, but take a look at the FILTER syntax here:
https://www.postgresql.org/docs/9.4/static/sql-expressions.html#SYNTAX-AGGREGATES

For example,
> SUM(CASE WHEN q.tran_date > '2015-08-31' THEN q.balance ELSE 0 END
> ) AS "balance_curr AS [DECTEXT]",

Could be rewritten as:
SUM(q.balance) FILTER (WHERE  q.tran_date > '2015-08-31')
  AS "balance_curr AS [DECTEXT]",

Besides being more terse and clear, FILTER expressions in my
experience tend to be somewhat faster than aggregations over CASE
statements.

merlin




Re: [GENERAL] a JOIN to a VIEW seems slow

2017-09-21 Thread Merlin Moncure
On Thu, Sep 21, 2017 at 1:51 AM, Frank Millman  wrote:
> On 2017-09-18 Frank Millman wrote:
>>
>> Here are the timings for running the query on identical data sets using
>> Postgresql, Sql Server, and Sqlite3 -
>>
>> PostgreSQL -
>> Method 1 - 0.28 sec
>> Method 2 – 1607 sec, or 26 minutes
>>
>> Sql Server -
>> Method 1 – 0.33 sec
>> Method 2 – 1.8 sec
>>
>> Sqlite3 -
>> Method 1 – 0.15 sec
>> Method 2 – 1.0 sec
>>
>> It seems that Sql Server and Sqlite3 are able to analyse the ‘join’, and
>> execute an indexed read against the underlying physical tables.
>>
>
> I did not get any response to this, but I am still persevering, and feel
> that I am getting closer. Instead of waiting 26 minutes for a result, I
> realise that I can learn a lot by using EXPLAIN. This is what I have found
> out.

Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute query?

merlin




Re: [GENERAL] a JOIN to a VIEW seems slow

2017-09-18 Thread Merlin Moncure
On Thu, Sep 14, 2017 at 8:17 AM, Pavel Stehule  wrote:
> 2017-09-14 15:09 GMT+02:00 Pavel Stehule :
>>
>>
>>
>> 2017-09-14 14:59 GMT+02:00 Frank Millman :
>>>
>>> Pavel Stehule wrote:
>>>
>>> 2017-09-14 10:14 GMT+02:00 Frank Millman :

 Hi all

 This is a follow-up to a recent question I posted regarding a slow
 query. I thought that the slowness was caused by the number of JOINs in the
 query, but with your assistance I have found the true reason. I said in the
 previous thread that the question had become academic, but now that I
 understand things better, it is no longer academic as it casts doubt on my
 whole approach.

 I have split my AR transaction table into three physical tables –
 ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some
 point, such as ar_tran_jnl.

 I then create a VIEW to view all transactions combined. The view is
 created like this -

 CREATE VIEW ar_trans AS
   SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ...
 FROM ar_tran_inv WHERE posted = ‘1’
   UNION ALL
   SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ...
 FROM ar_tran_crn WHERE posted = ‘1’
   UNION ALL
   SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ...
 FROM ar_tran_rec WHERE posted = ‘1’

 I have another table called ‘ar_trans_due’, to keep track of outstanding
 transactions. All of the three transaction types generate entries into this
 table. To identify the source of the transaction, I have created columns in
 ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row
 into ‘ar_tran_inv’, I invoke this -

   INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES
 (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction
 types. It is handled by a Python program, and it all happens within a
 transaction.

 When I view a row in ar_trans_due, I want to retrieve data from the
 source transaction, so I have this -

   SELECT * FROM ar_trans_due a
   LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id =
 a.tran_row_id

 I understand that PostgreSQL must somehow follow a path from the view
 ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would
 execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id =
 a.tran_row_id AND posted = ‘1’.

 If this was the case, it would be an indexed read, and very fast.
 Instead, according to EXPLAIN, it performs a sequential scan of the
 ‘ar_tran_inv’ table.

 It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it
 uses a Bitmap Heap Scan on those. I assume that is because the tables are
 currently empty.

 Is this analysis correct?
>>>
>>>
>>> please, send EXPLAIN ANALYZE result :)


>>>
>>>
>>> I tried to reduce this to its simplest form.
>>>
>>> Here is a SQL statement -
>>>
>>> SELECT *
>>> FROM ccc.ar_trans_due a
>>> LEFT JOIN ccc.ar_trans b ON
>>> b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
>>> WHERE a.row_id = 1
>>>
>>> ar_trans_due is a physical table, ar_trans is a view.
>>>
>>> It takes about 28ms. Here is the explain -
>>> https://explain.depesz.com/s/8YY
>>>
>>>
>
>
> PostgreSQL cannot push the join down - in the slow case, the UNIONs have to be
> done first - and that requires a full scan of ar_tran_inv - the filter used
> (posted AND (deleted_id = 0)) is not too effective - maybe some composite or
> partial index helps.


In my testing JOINs can push down through UNION ALL.  Why do we need to
materialize the union first?  What version is this?
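
If the unions do end up being materialized, a partial index matching the
view's filter may at least cheapen the branch scans -- an untested sketch
against the tables described earlier:

CREATE INDEX ON ar_tran_inv (row_id) WHERE posted = '1';
-- similarly for ar_tran_crn and ar_tran_rec; include deleted_id = 0 in the
-- predicate if that filter is part of the real view definition.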

merlin




Re: [GENERAL] looking for a globally unique row ID

2017-09-14 Thread Merlin Moncure
On Thu, Sep 14, 2017 at 12:11 PM, Rafal Pietrak <ra...@ztk-rp.eu> wrote:
> W dniu 14.09.2017 o 15:54, Merlin Moncure pisze:
>> On Thu, Sep 14, 2017 at 2:45 AM, Rafal Pietrak <ra...@ztk-rp.eu> wrote:
>>> Hello everybody,
>>>
>>> Can anybody help me find a way to implement an ID which:
>>>
>>> 1. guarantees being unique across multiple tables.
>>>
>>> 2. guarantees its uniqueness not only during INSERT, but also during the
>>> lifetime of the database/application (e.i. during future UPDATES).
>>>
>>> 3. guarantees persistence of value across database backup/restore/upgrade.
>>>
>>> an obvious candidate - a single SERIAL() (same serial) used in every
>>> table that needs that ID does not guarantee (2).
>>
>> A shared sequence meets all of those requirements.  I tend to prefer
>
> Not really.
>
> As I said, I'm not looking for performance or "fair probability" of
> planetary-wide uniqueness.
>
> My main objective is the "guarantee". Which I've tried to indicate
> referring to "future UPDATEs".
>
> What I mean here is functionality similar to "primary key", or "unique
> constraint". Whenever somebody (application, like faulty application
> IMPORTANT!) tries to INSERT or UPDATE a not unique value there (which in
> fact could possibly be generated earlier by UUID algorithms, or even a
> sequence), if that value is among table that uses that (misterious)
> "global primary key"; that application just fails the transaction like
> any other "not unique" constraint failing.

hm, I don't understand you.  Are you trying to defend against forged
or erroneous data?  With both sequences and GUIDs, the assumption is
that the application will use the generated id and send it back to the
database.  With sequences, the database *guarantees* that the
identifier is unique with no exceptions; there never will be a duplicate
value.  Can you give a hypothetical example of how you think they
wouldn't work?

merlin




Re: [GENERAL] looking for a globally unique row ID

2017-09-14 Thread Merlin Moncure
On Thu, Sep 14, 2017 at 2:45 AM, Rafal Pietrak  wrote:
> Hello everybody,
>
> Can anybody help me find a way to implement an ID which:
>
> 1. guarantees being unique across multiple tables.
>
> 2. guarantees its uniqueness not only during INSERT, but also during the
> lifetime of the database/application (e.i. during future UPDATES).
>
> 3. guarantees persistence of value across database backup/restore/upgrade.
>
> an obvious candidate - a single SERIAL() (same serial) used in every
> table that needs that ID does not guarantee (2).

A shared sequence meets all of those requirements.  I tend to prefer
this over GUIDs.  GUIDs are twice the size and randomized, which has
some unpleasant performance characteristics.  The advantage they
bring is being able to set up multiple generation points (say, across
multiple database servers) without fear of conflict.

Sequence generation is very fast so there is no performance related
argument not to use the approach.
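
A minimal sketch of the shared-sequence approach (table names are made up):

CREATE SEQUENCE global_row_id_seq;

CREATE TABLE orders (
    id   bigint PRIMARY KEY DEFAULT nextval('global_row_id_seq'),
    note text
);

CREATE TABLE invoices (
    id   bigint PRIMARY KEY DEFAULT nextval('global_row_id_seq'),
    note text
);
-- every default-generated id is unique across both tables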

merlin




Re: [GENERAL] B-tree index on a VARCHAR(4000) column

2017-09-10 Thread Merlin Moncure
On Friday, September 8, 2017, John Turner  wrote:

>
>
> On Fri, Sep 8, 2017 at 6:57 AM Tom Lane wrote:
>
>> Ron Johnson writes:
>> > Based on LENGTH(offending_column), none of the values are more than 144
>> > bytes in this 44.2M row table.  Even though VARCHAR is, by definition,
>> > variable length, are there any internal design issues which would make
>> > things more efficient if it were dropped to, for example, VARCHAR(256)?
>>
>> No.
>
> So the declarative column length has no bearing on memory grants during
> plan generation/execution?
>

Nope.  Memory usage is proportional to the size of the string, not the
maximum length for varchar.  Maximum length is a constraint.

merlin


Re: [GENERAL] B-tree index on a VARCHAR(4000) column

2017-09-08 Thread Merlin Moncure
On Thu, Sep 7, 2017 at 10:48 PM, Ron Johnson  wrote:
> Hi,
>
> v 9.2.7
>
> Based on LENGTH(offending_column), none of the values are more than 144
> bytes in this 44.2M row table.  Even though VARCHAR is, by definition,
> variable length, are there any internal design issues which would make
> things more efficient if it were dropped to, for example, VARCHAR(256)?
>
> (I don't have access to the source code or to development boxes, so can't
> just test this on my own.)

Just use TEXT :-).   Realizing that obsessing about column lengths was
a giant waste of time and energy for zero useful benefit that I've
ever observed was a transformational moment for me.  Also, please
apply bugfix upgrades :-).

merlin




Re: [GENERAL] Issue with json_agg() and ordering

2017-09-01 Thread Merlin Moncure
On Fri, Sep 1, 2017 at 6:22 AM, Charles Clavadetscher
 wrote:
> Hello
>
>> -Original Message-
>> From: pgsql-general-ow...@postgresql.org 
>> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Bob Jones
>> Sent: Freitag, 1. September 2017 10:12
>> To: pgsql-general 
>> Subject: [GENERAL] Issue with json_agg() and ordering
>>
>> Hi,
>>
>>
>> Could anyone give me a few pointers as to how I might resolve the following :
>>
>> select json_agg(my_table) from (my_table) where foo='test' and bar='f'
>> order by last_name asc, first_name asc;
>>
>> ERROR:  column "my_table.last_name" must appear in the GROUP BY clause or be 
>> used in an aggregate function LINE 1:
>> ...foo='f' order by last_name ...
>
> I guess that the order by should be in the aggregation.
>
> SELECT json_agg(a.* ORDER BY a.last_name, a.last_year DESC)
> FROM my_table a;

yes.  However, you would say json_agg(a ...), not json_agg(a.* ...).  The .*
notation only works in certain contexts, and is transformed at parse
time to a.col1, a.col2, a.col3, ..., which would not work inside an
aggregate function that can only handle a single column or record.
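
So, roughly (untested, using the column names from your query):

SELECT json_agg(a ORDER BY a.last_name, a.first_name)
FROM my_table a
WHERE foo = 'test' AND bar = 'f';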

merlin




Re: [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb

2017-08-31 Thread Merlin Moncure
On Wed, Aug 30, 2017 at 9:03 PM, 유상지  wrote:

> I want to get help with Postgresql.
>
> I investigated that Postgresql could be rather fast in an environment
> using a secondary index, but it came up with different results in my benchmark.
>
> The database I compared was mariadb, and the benchmark tool was sysbench
> 1.0.8 with the postgresql driver.
>
> Server environment: vmware, Ubuntu 17.04, processor: 4, RAM: 4 GB,
> Harddisk: 40 GB, Mariadb (v10.3), PostgreSQL (v9.6.4)
>
mysql and other systems (for example sql server) use a technique where the
table is automatically clustered around an index -- generally the primary
key.  This technique has some tradeoffs; the main upside is that lookups on
the pkey are somewhat faster whereas lookups on any other index are
somewhat slower and insertions can be slower in certain cases (especially
if guids are the pkey).  I would call this technique 'index organized
table'.  The technique exploits the persistent organization so that the
index is implied and does not have to be kept separate from the heap (the
main table data storage).

The postgres CLUSTER command is currently a one-time pass over the table that
organizes the table physically in index order but does not maintain the
table in that order nor does it exploit the ordering to eliminate the
primary key index in the manner that other systems do.   From a postgres
point of view, the main advantage is that scans (not single record lookups)
over the key will be sequential physical reads and will tend to have to
read less physical pages since adjacent key records logically will also be
adjacent physically.

For my part, I generally prefer the postgres style of organization for most
workloads, particularly for the surrogate key pattern. I would definitely
like to have the option of having the indexed organized style however.
It's definitely possible to tease out the tradeoffs in synthetic
benchmarking, but in the gross aggregate I suspect (though I can't readily
prove it) the technique is a loser, since as database models mature the
ways tables are indexed, looked up, and joined tend to proliferate.
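
For reference, the one-time pass mentioned above is just (hypothetical table
and index names):

CLUSTER my_table USING my_table_pkey;
-- later inserts are not kept in index order; the command has to be re-run
-- periodically if the physical ordering matters to you.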

merlin


Re: [GENERAL] Would you add a --dry-run to pg_restore?

2017-08-03 Thread Merlin Moncure
On Wed, Aug 2, 2017 at 11:49 AM, Tom Lane  wrote:
> Edmundo Robles  writes:
>> I mean,  to   verify the integrity of backup  i do:
>> gunzip -c backup_yesterday.gz | pg_restore  -d my_database  && echo
>> "backup_yesterday is OK"
>
>> but my_database's size, uncompresed,  is too big  more than 15G  and
>> sometimes  i  have  no space  to restore it, so always i must declutter my
>>  disk first.
>
>> Will be great to have a dry  run option, because   the time  to verify
>>  reduces a lot and  will save space on disk, because just  execute  with no
>> write to disk.
>
> What do you imagine a dry run option would do?
>
> If you just want to see if the file contains obvious corruption,
> you could do
>
> pg_restore file >/dev/null
>
> and see if it prints any complaints on stderr.  If you want to have
> confidence that the file would actually restore (and that there aren't
> e.g. unique-index violations or foreign-key violations in the data),
> I could imagine a mode where pg_restore wraps its output in "begin" and
> "rollback".  But that's not going to save any disk space, or time,
> compared to doing a normal restore into a scratch database.
>
> I can't think of any intermediate levels of verification that wouldn't
> involve a huge amount of work to implement ... and they'd be unlikely
> to catch interesting problems in practice.  For instance, I doubt that
> syntax-checking but not executing the SQL coming out of pg_restore would
> be worth the trouble.  If an archive is corrupt enough that it contains
> bad SQL, it probably has problems that pg_restore would notice anyway.
> Most of the restore failures that we hear about in practice would not be
> detectable without actually executing the commands, because they involve
> problems like issuing commands in the wrong order.

The vast majority of my restore issues are dependency problems (for
example, postgis extension not being present).  A distant second place
would be pg_restore's inability to do things in the proper order or
gaps in the dump feature itself (for example, a cast between two built
in types, at least back in the day).

A good, reasonable test for all of those cases with the current tools
is to do a schema-only restore (which should not take long in most
cases).   If you get past that step, there is an exceptionally high
probability that the restore will succeed sans some controllable
factors like running out of space.

There are some rare known considerations that could cause a data load to
fail.  For example, a unique index on a floating point column can dump but
not load if two binary-distinct values render to the same string.
I've never seen this in practice however.  So I'd argue to just use
that (schema only) feature for pre-load verification if you're
paranoid.

merlin




Re: [GENERAL] Shared Constants in PLPGSQL

2017-08-01 Thread Merlin Moncure
On Tue, Aug 1, 2017 at 8:29 AM, Tim Uckun  wrote:
> In my case I don't expect these constants to be changed on a regular basis.
> They will be set just once and that's it. I was thinking it would be just as
> easy to set them in a proc as it would be to set them in a table. By putting
> them in an immutable proc I can hopefully save a couple of compute cycles.

Sure. The point is, by having a proc return a table-based composite
type, you can simplify changes down the line.   Adding a new setting
can be done via ALTER.  Changing a setting (should it become
necessary) can be done with an UPDATE.   The immutable wrapping
function does eliminate some fetches and I would generally write that
wrapper.

merlin




Re: [GENERAL] Shared Constants in PLPGSQL

2017-08-01 Thread Merlin Moncure
On Tue, Aug 1, 2017 at 8:04 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
> re-evaluate.  The main advantage over your approach is that you don't
> have to modify multiple things every time you add new config values;
> just add a column and replace the function.

This can be automated too, via event triggers:
https://www.postgresql.org/docs/9.3/static/sql-createeventtrigger.html

merlin




Re: [GENERAL] Shared Constants in PLPGSQL

2017-08-01 Thread Merlin Moncure
On Tue, Aug 1, 2017 at 4:56 AM, Tim Uckun  wrote:
> What's the best way to deal with global constants in PLPGSQL. Currently I am
> putting them in a function with out parameters and then calling that
> function from every other function that needs them like this.
>
> CREATE OR REPLACE FUNCTION hashids.constants(
>  OUT min_alphabet_length integer,
> OUT sep_div numeric,
> OUT guard_div numeric,
> OUT default_steps text,
> OUT default_alphabet text,
> OUT salt text)
>
> I am presuming that if I set this function as immutable the calls to this
> function will be cached and will not incur much overhead.

Yes.  Couple things I'd suggest changing.
1. Make a control table, say, hashids.config and put your data there.

CREATE TABLE hashids.config
(
  min_alphabet_length integer,
  ...
);

-- one record only, please:
CREATE UNIQUE INDEX ON hashids.config((1));

2. let's change your function to return the table type!
CREATE OR REPLACE FUNCTION hashids.constants()
  RETURNS hashids.config AS
$$
  SELECT * FROM hashids.config;
$$ LANGUAGE SQL IMMUTABLE;

...here we're breaking a rule.  This is technically not an immutable
query.  However, if you are calling this all over the place in
plpgsql, you can save a few cycles since operations of the form of:

DECLARE
  settings hashid.config;
BEGIN
  settings := hashids.constants();
  ...

...will be calculated at plan time and not re-evaluated every time the
function is called.  The savings here are pretty minor but I've
employed this trick many times because there's very little downside to
doing so.  You do have to remember to recreate the constants()
function every time you change a setting in order to force the plan to
re-evaluate.  The main advantage over your approach is that you don't
have to modify multiple things every time you add new config values;
just add a column and replace the function.

merlin




Re: [GENERAL] Imperative Query Languages

2017-07-11 Thread Merlin Moncure
On Mon, Jul 10, 2017 at 4:26 PM, Christopher Browne  wrote:
> On 5 July 2017 at 01:22, Jason Dusek  wrote:
>> Hi All,
>>
>> This more of a general interest than specifically Postgres question. Are
>> there any “semi-imperative” query languages that have been tried in the
>> past? I’m imagining a language where something like this:
>>
>> for employee in employees:
>> for department in department:
>> if employee.department == department.department and
>>department.name == "infosec":
>> yield employee.employee, employee.name, employee.location,
>> employee.favorite_drink
>>
>> would be planned and executed like this:
>>
>> SELECT employee.employee, employee.name, employee.location,
>> employee.favorite_drink
>>   FROM employee JOIN department USING (department)
>>  WHERE department.name == "infosec"
>>
>> The only language I can think of that is vaguely like this is Fortress, in
>> that it attempts to emulate pseudocode and Fortran very closely while being
>> fundamentally a dataflow language.
>
> It's probably of broader interest to consider some sort of "more relational"
> language that would, in effect, be "more declarative" as opposed to
> "more imperative" than SQL.  (I'd not be keen on heading back to
> CODASYL!!!)
>
> The notable example of such would be the "Tutorial D" language
> attributable to Darwen and Date's "Third Manifesto"
>
> https://en.wikipedia.org/wiki/D_(data_language_specification)
> http://wiki.c2.com/?TutorialDee
>
> Unfortunately, the attempts to construct implementations of D
> have all pretty much remained at the "toy" point, experiments
> that few beyond the implementors seem to treat as realistic
> SQL successors.
>
> Another option, in principle, would be to consider QUEL, which
> was what Stonebraker used initially as the query languages for
> Ingres and Postgres.
>
> https://en.wikipedia.org/wiki/QUEL_query_languages
>
> None of these options seem to be dominantly better than SQL,
> and for something to supplant SQL, it would need to be a
> fair bit better.

I'd like to see a SQL variant (maybe preprocessed) with an algebraic
syntax.  My biggest gripes with SQL are all the keywords (there are
other spoken languages than English??) and the unnecessarily irregular
syntax.

merlin




Re: [GENERAL] Imperative Query Languages

2017-07-05 Thread Merlin Moncure
On Wed, Jul 5, 2017 at 12:22 AM, Jason Dusek  wrote:
> Hi All,
>
> This more of a general interest than specifically Postgres question. Are
> there any “semi-imperative” query languages that have been tried in the
> past? I’m imagining a language where something like this:

huh.  Somewhat snarky answer, all of them? :-).  Most languages are
imperative including just about all the ones that are popular in "the
enterprise" (minus SQL).   You can for example code that looks
remarkably like that in pl/pgsql.  Since postgres is a SQL server, it
would have to get compiled down to boring SQL statements but it's
generally understood (at least by me) that this is an inefficient way
to write code.

Downthread, Tom mentions CODASYL, etc., but could have mentioned the
big daddy, ISAM, and all its various flavors.  Virtually all business
code was written using it (and much of it still is) leading into the
SQL era.  Code generally looked exactly like your example, except it
was much clumsier, being written in (generally) COBOL with error
handling added in, which is where the technique tends to break
down.  SQL came about because some particularly smart people realized
that programmers were writing the same boilerplate code over and over
again and that perhaps access to data could be generalized and focused
down to the real problem being solved.   This fortunately came about
before "enterprise developers" and "enterprise tool vendors" were as
organized as they are today and so was able to germinate into
something incredibly useful...

This revolution in programming was IMNSHO the single greatest
innovation in computer science; so much so that legions of the
unenlightened have been tirelessly working (hibernate) to eliminate or
displace its benefits, without much success:
http://www.codingdojo.com/blog/9-most-in-demand-programming-languages-of-2016/

:-D

merlin




Re: [GENERAL] Is the row version available in SQL?

2017-06-28 Thread Merlin Moncure
On Wed, Jun 28, 2017 at 9:54 AM, Rob Nikander  wrote:
> Hi,
>
> I'm reading about MVCC here:
> https://www.postgresql.org/docs/current/static/mvcc.html.
>
> In some cases I can use these transaction isolation modes, but in other
> cases, I may want to manage the versions and concurrency manually. Is there
> a way to select the “version” of a row, or is that data hidden to ordinary
> SQL statements?
>
> I’m thinking about a sequence of actions, which are logically a transaction,
> but start on one thread and complete later on another thread. I don’t think
> I can tie up a JDBC database connection for that long. I can add a `version
> bigint` column to my table, and the final update will look like:
>
>  update ... where id = ? and version =
> the_version_selected_earlier_in_a_different_thread
>
> … and if that gets 0 rows, it can handle the conflict.

What you are describing is called 'optimistic locking'.  If you google
that term you will see lots of tutorials on the approach.   Here's an
interesting write up on it by our good friends at 2ndquadrant:

https://blog.2ndquadrant.com/postgresql-anti-patterns-read-modify-write-cycles/
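
The final update you describe is the classic shape of that pattern -- a
minimal sketch (made-up table and values):

UPDATE my_table
   SET payload = 'new value',
       version = version + 1
 WHERE id = 42
   AND version = 7;  -- the version read earlier, possibly in another thread
-- an update count of 0 means the row changed underneath you; handle the conflict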

merlin




Re: [GENERAL] CREATE TABLE & composite type

2017-06-28 Thread Merlin Moncure
On Wed, Jun 28, 2017 at 8:37 AM, Adrian Klaver
 wrote:
> On 06/28/2017 06:27 AM, gmb wrote:
>>
>> Hi Referencing https://www.postgresql.org/docs/9.6/static/rowtypes.html
>> Taking a chance here Is there a short-hand way in which I can create a
>> table with the same structure as a user defined composite type ? E.g. CREATE
>> TYPE inventory_item AS ( name text, supplier_id integer, price numeric );
>> CREATE TABLE inventory_item_table ( like type inventory_item ); We're using
>> composite types rather extensively as the return structure of functions:
>> CREATE FUNCTION some_func() RETURNS SETOF inventory_item ; Of course I
>> can: CREATE TABLE inventory_item_table AS ( SELECT some_func( ) );
>
> CREATE TABLE inventory_item_table AS ( SELECT some_func( ) limit 0);

I think it's better to use the (somewhat arcane but designed for this
exact purpose) 'OF' syntax (hat tip to Peter E).  This is particularly
useful if you want to have multiple tables mirror the composite type
and manage the definition through the rowtype:

postgres=# create type foo as (a int, b int);
CREATE TYPE
postgres=# create table bar of foo;
CREATE TABLE
Time: 0.973 ms
postgres=# \d bar
  Table "public.bar"
 Column │  Type   │ Modifiers
┼─┼───
 a  │ integer │
 b  │ integer │
Typed table of type: foo

\h CREATE TABLE

CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [
IF NOT EXISTS ] table_name
OF type_name [ (


postgres=# alter type foo add attribute c text cascade;
ALTER TYPE

postgres=# \d bar
  Table "public.bar"
 Column │  Type   │ Modifiers
┼─┼───
 a  │ integer │
 b  │ integer │
 c  │ text│
Typed table of type: foo

merlin




Re: [GENERAL] performance considerations of jsonb vs separate rows

2017-06-19 Thread Merlin Moncure
On Mon, Jun 19, 2017 at 2:29 PM, Rob Nikander  wrote:
> I'm wondering about the tradeoffs, specifically: is it possible to update
> one piece of a jsonb value without having to rewrite the entire field? There
> are cases where that data field was getting pretty big (500kb). Would you
> expect any obvious performance differences between these two options?

You are basically asking: what are the relative
advantages/disadvantages of a document model vs. a relational row model?
This is a complicated discussion.  Here are some tradeoffs:

*) Document is always read/written in bulk.  Row data reads/writes are
more discrete (but generally postgres always reads/writes 8kb
minimum!)

*) for documents transaction tracking is for the entire document.
This is more efficient for storage but can have very serious
consequences if sub-portions of the document are updated under heavy
concurrency.

*) Documents are a pain if the structure changes in such a way so as
to require invalidation of all of them.

*) Documents can be a *real* pain if the data relationships change in
some fundamental way.   This is a pain with traditional tables as
well, but relational type models tend to be the most flexible vs other
approaches.

Basically there is a reason why SQL and relational type systems won
the 'data wars' of the 1970's and 1980's.   There are downsides to the
basic approach (especially performance due to joining) but the
simplicity and elegance of being able to model just about any problem
tends to compensate certain performance disadvantages.

Document style storage tends to move the database model out of the
database and into the application (which is a very poor tradeoff IMO)
and fall over when some of the initial assumptions with respect to the
document modeling discrete business units break down; you end up
storing the same information over and over in different documents
which causes all kinds of problems.  They do tend to work well in low-
or no-update applications, however.

merlin




Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Merlin Moncure
On Mon, Jun 19, 2017 at 10:36 AM, Jeff Janes  wrote:
> If you have a RAID, set it to the number of spindles in your RAID and forget
> it. It is usually one of the less interesting knobs to play with.  (Unless
> your usage pattern of the database is unusual and exact fits the above
> pattern.)

Isn't that advice obsolete in an SSD world, though?  I was able to show
that values up to 256 for a single device provided measurable gains on a
single S3500.  It's true though that the class of queries that this
would help is pretty narrow.
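
The setting is easy to experiment with, e.g. (256 here is just the value I
tested on that one device, not a general recommendation):

SET effective_io_concurrency = 256;          -- per-session experimentation
ALTER SYSTEM SET effective_io_concurrency = 256;  -- persistent setting
SELECT pg_reload_conf();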

merlin




Re: [GENERAL] Advisory lock deadlock issue

2017-06-07 Thread Merlin Moncure
On Wed, Jun 7, 2017 at 9:16 AM, David Rosenstrauch  wrote:
> I'm running a Spark job that is writing to a postgres db (v9.6), using the
> JDBC driver (v42.0.0), and running into a puzzling error:
>
> 2017-06-06 16:05:17.718 UTC [36661] dmx@dmx ERROR:  deadlock detected
> 2017-06-06 16:05:17.718 UTC [36661] dmx@dmx DETAIL:  Process 36661 waits for
> ExclusiveLock on advisory lock [16649,0,102440,5]; blocked by process 36662.
> Process 36662 waits for ExclusiveLock on advisory lock
> [16649,0,102439,5]; blocked by process 36661.
>
> However, I can't for the life of me figure out a) how the advisory locks are
> getting created (as I'm not calling for them myself), and b) how to fix this
> issue.
>
>
> A bit of background:  My Spark job runs as multiple processes on multiple
> machines.  Each process is performing the writes to pgsql using the jdbc
> driver.  The writes are performed a) as PG UPSERTS, b) as JDBC batches, and
> c) using JDBC prepared statements.  So each process, when it's time to write
> to the db, creates several prepared statements, adds a bunch of UPSERTs to
> each prepared statement (i.e., each prepared statement contains a batch of a
> few hundred UPSERTs), and then performs an executeBatch() on each statement
> to perform the write.  That executeBatch() call is where I'm running into
> the error.
>
> In theory, since there's multiple processes that are issuing these batched
> DB writes, there could be a record locking problem if, say, 2 processes
> tried to perform updates to the same user record.  But in reality this
> should be impossible.  Spark partitions everything based on a key - in my
> case userID - so all DB writes for the same user should be happening in the
> same process.  So at worst I could just have a batch that contains multiple
> UPSERTs to the same user record, but I should never be seeing updates to the
> same user from different processes.
>
>
> So, I'm very puzzled by that deadlock error.  Specifically:
>
> * How could it be possible that there are 2 PG processes trying to acquire
> the same lock?  Spark's partitioning should ensure that all updates to the
> same user record get routed to the same process, so this situation shouldn't
> even be possible.

That's really a question for the Spark team.  Obviously something is taking
them -- advisory locks are layered on top of the basic locking mechanics and
are very well tested and proven.  What I can tell you is that the core
functions provided by postgres never take advisory locks -- you own that
locking space (that is, it is only used by code under your control).

> * How/why am I winding up acquiring advisory locks in the first place? I'm
> never requesting them.  I looked at the PG JDBC driver code a bit, thinking
> that it might automatically be creating them for some reason, but that
> doesn't seem to be the case.  Maybe the PG database itself is? (E.g., Does
> the PG automatically use advisory locks with UPSERTs?)

Some code under your control is requesting them.  This could be an external
module, application code, or a stored procedure.

> And, last but not least:
>
> * How do I resolve this "waits for ExclusiveLock on advisory lock" issue?
> There's precious little info available regarding exactly what that error
> message is and how to solve.

Barring some response from the Spark team, here is how I would narrow the
problem down:

*) let's search the contents of pg_proc for functions calling advisory locks:
SELECT * FROM pg_proc where prosrc ~* 'advisory';

that might turn up some 3rd party code hits

*) turn on statement level logging and in bash:
tail -f postgres_xx.log | grep -i advisory

*) repeatedly query pg_locks for locktype = 'advisory'
SELECT * FROM pg_locks where locktype = 'advisory'

also,
SELECT * FROM pg_locks WHERE locktype = 'advisory' AND NOT granted;

Advisory locks are a very blunt instrument and it's a significant risk
that two different locking systems are stepping on each other's toes.
I do not recommend using them (especially the non-xact variant) unless you
have total control over all the code potentially throwing locks and
have a good understanding of interactions with connection poolers
between locking code and the database.

merlin




Re: [GENERAL] storing large files in database - performance

2017-05-19 Thread Merlin Moncure
On Fri, May 19, 2017 at 2:04 PM, Eric Hill  wrote:
> I am pleased to report that with Merlin's suggestion of using the 
> pg-large-object middleware, I have a test case now showing that I can write a 
> 25MB buffer from Node.js to Postgres in roughly 700 milliseconds.  Here is 
> the JavaScript code, which is nearly verbatim from the example in the 
> pg-large-object doc:
>
> packages.testLargeObjects = function(callback) {
>var pgp = require('pg-promise')();
>var LargeObjectManager = require('pg-large-object').LargeObjectManager;
>var PassThrough = require('stream').PassThrough;
>
>var bufSize = 1024 * 1024 * 25;
>var buf = new Buffer(bufSize);
>buf.fill("pgrocks");
>
>var connInfo = {
>   host:   'localhost',
>   port:   5432,
>   database:'mydb',
>   user:   'postgres',
>   password:'secret'
>};
>
>var db = pgp(connInfo);
>
>db.tx(function(tx) {
>   const lObjMgr = new LargeObjectManager({pgPromise: tx});
>   const bufferSize = 16384;
>
>   return lObjMgr.createAndWritableStreamAsync(bufferSize)
>  .then( ([oid, stream]) => {
> let bufferStream = new PassThrough();
> bufferStream.end(buf);
> bufferStream.pipe(stream);
> return new Promise(function(resolve, reject) {
>stream.on('finish', resolve);
>stream.on('error', reject);
> });
>  });
>})
>.then(function() {
>   callback();
>   pgp.end();
>})
>.catch(function(err) {
>   callback(err);
>   pgp.end();
>});
> };
>
> Thanks very much!

well done sir! that's probably as fast as you're going to get in node,
at least without a large investment at the driver level.

merlin




Re: [GENERAL] union all taking years - PG 9.6

2017-05-18 Thread Merlin Moncure
On Mon, May 15, 2017 at 6:21 PM, Patrick B  wrote:
> I created a view selecting from both tables, with a UNION ALL between them.
> When selecting from that view, it's really slow. I can't even run explain
> analyze (it's been 1h and query did not finished yet).

Have you ruled out locks? Let the select run and check
pg_stat_activity and pg_locks.   If everything is good there, please
paste contents of explain (without analyze) and explain analyze of the
'good' queries.

merlin




Re: [GENERAL] storing large files in database - performance

2017-05-18 Thread Merlin Moncure
On Thu, May 18, 2017 at 7:34 AM, Eric Hill  wrote:
> I would be thrilled to get 76 MB per second, and it is comforting to know 
> that we have that as a rough upper bound on performance.  I've got work to do 
> to figure out how to approach that upper bound from Node.js.
>
> In the meantime, I've been looking at performance on the read side.  For 
> that, I can bypass all my Node.js layers and just run a query from pgAdmin 4. 
>  I ran this query, where indexFile.contents for the row in question is 25MB 
> in size.  The query itself took 4 seconds in pgAdmin 4.  Better than the 12 
> seconds I'm getting in Node.js, but still on the order of 6MB per second, not 
> 76.  Do you suppose pgAdmin 4 and I are doing similarly inefficient things in 
> querying bytea values?

Probably.  I haven't spent a lot of time with pgadmin 4 so I'm not
entirely sure.  If you want a quick and dirty comparison, try
running your query in psql unaligned mode for a comparison point.  You
can also do \copy BINARY in the case of byte transfers.

The basic problem is not really the database, it's that database
interaction APIs tend not to be directed to this kind of problem.
The big picture issues are:

*) Driver overhead marshaling from wire format to managed types

*) Driver overhead for memory management

*) Wire format issues.  Certain types are *much* faster with the
binary wire format and are additionally much more memory efficient.
Your bytea transfers are probably being serialized to text and back in
both directions which is very wasteful, especially for very large
transfers since it's wasteful in terms of memory.

If I were to seriously look at node.js performance, my rough thinking
is that I'd want to be setting up the javascript variables directly in
C somehow using plv8 internal routines.  Short of that, I would
probably be querying all data out of postgres in json rather than
serializing individual fields (which is what I generally do in
practice).

Another point, some googling turned up
https://www.npmjs.com/package/pg-large-object which is definitely
something to consider trying.

merlin




Re: [GENERAL] storing large files in database - performance

2017-05-17 Thread Merlin Moncure
On Tue, May 16, 2017 at 9:51 AM, Thomas Kellerer  wrote:
> John R Pierce schrieb am 16.05.2017 um 16:44:
>> On 5/16/2017 7:35 AM, Thomas Kellerer wrote:
>>> When my (JDBC based) SQL client and the database server are on the same 
>>> computer...
>>
>> node.js is Javascript, not java w/ jdbc
>
> I know that.
>
> I mentioned JDBC so that it's clear that the timings were done using a 
> different technology
>
> Maybe it's Node.js or the JavaScript "driver" that causes the problems.

When writing large objects to the database, the method of transmission
will very much determine performance until you start hitting the
natural boundaries imposed by the database.

via (hastily written):
#include "libpqtypes.h"
#include "stdlib.h"
#include "string.h"

int main()
{
  int s = 1024 * 1024 * 256;

  char *p = malloc(s);
  memset(p, 'x', s);
  p[s-1] = 0;

  PGconn *conn = PQconnectdb("");
  PQinitTypes(conn);

  PGresult *res = PQexecf(conn, "insert into foo values(1,%text)", p);

  if(!res)
fprintf(stderr, "*ERROR: %s\n", PQgeterror());

  PQclear(res);
}

mmoncure@mernix2 09:13 AM /tmp$ gcc -otest test.c -lpq -lpqtypes -I
/home/mmoncure/src/libpqtypes-1.5.1/src -I
/home/mmoncure/pg94/include/ -L
/home/mmoncure/src/libpqtypes-1.5.1/.libs/
mmoncure@mernix2 09:13 AM /tmp$ psql -c "create table foo(i int, f text)"
CREATE TABLE
mmoncure@mernix2 09:13 AM /tmp$ psql -c "alter table foo alter f set
storage external"
ALTER TABLE
mmoncure@mernix2 09:14 AM /tmp$ time
LD_LIBRARY_PATH=/home/mmoncure/src/libpqtypes-1.5.1/.libs ./test

real 0m3.245s
user 0m0.092s
sys 0m0.232s
mmoncure@mernix2 09:15 AM /tmp$ psql -c "select
pg_size_pretty(pg_table_size('foo'))"
 pg_size_pretty

 266 MB
(1 row)


...that's over 76 MB/sec (to the local server) for a 256 MB transfer.  That's
pretty good I think.   We have a 1GB barrier on bytea/text and (at
least in C, with certain reasonable precautions) you can work
comfortably under that limit.  There might be other better strategies
but it can be done.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Insert performance and disk usage in JSON vs JSONB

2017-05-15 Thread Merlin Moncure
On Mon, May 15, 2017 at 12:02 PM, Ronny Abraham  wrote:
> 4. Insert 10,000 rows to JSON, execution time (sec):
> 5. Insert 10,000 rows to JSONB, execution time (sec):
>
> What’s interesting is that inserting to JSONB is slightly faster than
> inserting to JSON.

With those times, the only explanation is that you are bottlenecked by
storage fsync time.  If storage is fast, you should be able to do 10k
inserts per second+ for either type unless the documents are large.

For storing documents, you should only consider using the json type if
the exact structure of the originating document is important for the
most part.  For querying and manipulation, jsonb is better and faster
(mostly) and the API is more robust.
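
For example (table name invented, just to illustrate the difference),
jsonb containment queries can be backed by a GIN index, which plain
json can't do:

create table docs (id serial primary key, doc jsonb);
create index on docs using gin (doc);

-- find documents containing a matching key/value pair; can use the index
select id from docs where doc @> '{"status": "active"}';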

The json type has other uses; mainly for serialization from non-json
data but that's not a storage consideration.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] dynamic schema modeling and performance

2017-04-12 Thread Merlin Moncure
On Tue, Apr 11, 2017 at 12:46 PM, Rj Ewing  wrote:
> I'm looking for thoughts on the best way to handle dynamic schemas.
>
> The application I am developing revolves around user defined entities. Each
> entity is a tabular dataset with user defined columns and data types.
> Entities can also be related to each other through Parent-Child
> relationships. Some entities will be 100% user driven, while others (such as
> an entity representing a photo) will be partially user driven (all photo
> entities will have common fields + custom user additions).
>
> I was hoping to get opinions on whether postgresql would be a suitable
> backend. A couple of options I have thought of are:
>
> 1. Each entity is represented as a table in psql. The schema would be
> dynamically updated (with limits) when an entity mapping is updated. I
> believe that this would provide the best data constraints and allow the best
> data normalization. A concern I have is that there could be an enormous
> amount of tables generated and the performance impacts this might have in
> the future. I could then run elasticsearch as a denormalized cache for
> efficient querying and full-text-search.
>
> 2. Use a nosql database. This provides the "dynamic" schema aspect. A
> concern here is the lack of relation support, thus leading to a more
> denormalized data structure and the potential for the data to become
> corrupted.
>
> Any opinions on the use of psql for this case, or other options would be
> greatly appreciated!

Postgres can function as a nosql database -- you can use jsonb for
example to archive data in such a way that the data model can be
changed without making schema adjustments.   Another way to do it is
EAV pattern as noted.  These might be good strategies if you're not
sure where you want to end up.
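
A minimal sketch of the jsonb route (names invented): keep the fixed,
common fields as regular columns and push the user-defined fields into
a jsonb column that you can index with GIN:

create table entity (
  id          bigserial primary key,
  entity_type text not null,
  parent_id   bigint references entity(id),
  attrs       jsonb not null default '{}'
);
create index on entity using gin (attrs);

-- user-defined fields live in attrs; no DDL needed when they change
select id
from entity
where entity_type = 'photo'
  and attrs @> '{"camera": "nikon"}';

The fixed fields (say, the common photo attributes) can stay as real
columns; only the user-driven parts need to live in attrs.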

It really comes down to this: how formal do you want your data model
to be?   Adding formality leads to performance optimizations, exposes
your data to the fantastic SQL language, and allows rigorous
assumptions to be made from external dependencies and trusted.
Formality also brings a degree of inflexibility since your data has to
be forced into predetermined structures.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are multiple array_aggs going to be in the same order?

2017-04-10 Thread Merlin Moncure
On Mon, Apr 10, 2017 at 1:43 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Mon, Apr 10, 2017 at 1:35 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>
>> Yes, and in fact we documented the ORDER-BY-in-subselect solution back
>> before we had the ORDER-BY-in-aggregate feature.  I don't remember exactly
>> where, but I'm sure it's still described somewhere.  So it is documented
>> behavior that an aggregate without its own ORDER BY will see the rows in
>> whatever order the FROM clause supplies them.
>
> The documentation is a bit ambiguous on the topic TBH.  Via
> https://www.postgresql.org/docs/9.6/static/functions-aggregate.html:
>
> "The aggregate functions array_agg, json_agg, jsonb_agg,
> json_object_agg, jsonb_object_agg, string_agg, and xmlagg, as well as
> similar user-defined aggregate functions, produce meaningfully
> different result values depending on the order of the input values.
> This ordering is unspecified by default, but can be controlled by
> writing an ORDER BY clause within the aggregate call, as shown in
> Section 4.2.7. Alternatively, supplying the input values from a sorted
> subquery will usually work.


What do you think about injecting the following lingo?

... depending on the order of the input values. This ordering is
unspecified by default, but can be controlled by ...

would be changed to

... depending on the order of the input values. Input value ordering
will be consistent across multiple order-dependent aggregate functions
within the same grouping, but is otherwise unspecified by default as
long as there is no explicit ordering for any aggregate function
in the grouping.  Furthermore, input value ordering can be controlled
by ...

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are multiple array_aggs going to be in the same order?

2017-04-10 Thread Merlin Moncure
On Mon, Apr 10, 2017 at 1:35 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> "David G. Johnston" <david.g.johns...@gmail.com> writes:
>> On Mon, Apr 10, 2017 at 11:02 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
>>> Sure, but isn't it fair to consider that an implementation artifact?
>
>> So, the presence of ORDER BY in the aggregate function call is a PostgreSQL
>> extension...
>
>> It seems reasonable to declare that the order of the values in the
>> generated array match whatever order the FROM clause supplies the rows.  If
>> that is not acceptable a PostgreSQL-specific ORDER BY modifier can be added
>> which will cause an additional sort-and-scan of the input relation to occur
>> (optimized across multiple column invocations when possible).
>
> Yes, and in fact we documented the ORDER-BY-in-subselect solution back
> before we had the ORDER-BY-in-aggregate feature.  I don't remember exactly
> where, but I'm sure it's still described somewhere.  So it is documented
> behavior that an aggregate without its own ORDER BY will see the rows in
> whatever order the FROM clause supplies them.

The documentation is a bit ambiguous on the topic TBH.  Via
https://www.postgresql.org/docs/9.6/static/functions-aggregate.html:

"The aggregate functions array_agg, json_agg, jsonb_agg,
json_object_agg, jsonb_object_agg, string_agg, and xmlagg, as well as
similar user-defined aggregate functions, produce meaningfully
different result values depending on the order of the input values.
This ordering is unspecified by default, but can be controlled by
writing an ORDER BY clause within the aggregate call, as shown in
Section 4.2.7. Alternatively, supplying the input values from a sorted
subquery will usually work. For example:

SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

Beware that this approach can fail if the outer query level contains
additional processing, such as a join, because that might cause the
subquery's output to be reordered before the aggregate is computed."

If you think the behavior ought to be defined to work that way, that's
fine by me.   A small documentation fix could clarify that, I think.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are multiple array_aggs going to be in the same order?

2017-04-10 Thread Merlin Moncure
On Mon, Apr 10, 2017 at 12:01 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmonc...@gmail.com> writes:
>> On Sun, Apr 9, 2017 at 4:27 PM, Guyren Howe <guy...@gmail.com> wrote:
>>> If I do a SELECT with a GROUP_BY and multiple ARRAY_AGGs, will the 
>>> ARRAY_AGGs be guaranteed to have entries in the same (ie corresponding) 
>>> order?
>>>
>>> eg
>>>
>>> SELECT
>>> u.name,
>>> ARRAY_AGG(o.order_date) AS order_dates,
>>> ARRAY_AGG(o.order_total) AS order_totals
>>> FROM
>>> user u JOIN
>>> orders o USING (user_id)
>>> GROUP BY
>>> u.user_id
>
>> It is unsafe to rely on aggregation order unless specified -- you can
>> add ORDER BY to the aggregation clause.
>
> You definitely can't assume anything about the order in which the FROM
> clause will deliver rows, but I think that's not quite what the question
> was.  If I read it right, the OP wants to be sure that the two aggregate
> functions will see the data in the *same* unspecified order.  I think
> that's a pretty safe assumption.  The server would have to go way
> out of its way to do differently, and it doesn't.

Sure, but isn't it fair to consider that an implementation artifact?
If his code depends on that ordering being the same across aggregate
functions, and the SQL standard doesn't specify that (I guess it
might, but I'm skeptical), he ought to specify that for clarity at the
very least.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are multiple array_aggs going to be in the same order?

2017-04-10 Thread Merlin Moncure
On Sun, Apr 9, 2017 at 4:27 PM, Guyren Howe  wrote:
>
> If I do a SELECT with a GROUP_BY and multiple ARRAY_AGGs, will the ARRAY_AGGs 
> be guaranteed to have entries in the same (ie corresponding) order?
>
> eg
>
> SELECT
> u.name,
> ARRAY_AGG(o.order_date) AS order_dates,
> ARRAY_AGG(o.order_total) AS order_totals
> FROM
> user u JOIN
> orders o USING (user_id)
> GROUP BY
> u.user_id


It is unsafe to rely on aggregation order unless specified -- you can
add ORDER BY to the aggregation clause.
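
Adapting the query above (I renamed the table to users since user is a
reserved word), something like:

select
  u.name,
  array_agg(o.order_date  order by o.order_date) as order_dates,
  array_agg(o.order_total order by o.order_date) as order_totals
from users u
join orders o using (user_id)
group by u.user_id, u.name;

Both aggregates now explicitly share the same ordering.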

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-04 Thread Merlin Moncure
On Tue, Apr 4, 2017 at 8:20 AM, Daniel Westermann
 wrote:
> Hi,
>
> PostgreSQL 9.6.2 on CentOS 7.3 x64.
>
> This is my data set:
>
> drop table if exists users;
> drop table if exists ids;
> create table users ( user_id int
>, username varchar(50)
>);
> with generator as
> ( select a.*
> from generate_series (1,300) a
>order by random()
> )
> insert into users ( user_id
>   , username
>   )
>select a
> , md5(a::varchar)
>  from generator;
> create unique index i_users on users ( user_id );
> create table ids ( id int );
> insert into ids (id) values ( generate_series ( 250, 350 ) );
> create unique index i_ids on ids ( id );
> analyze users;
> analyze ids;
>
> I have set work_mem to a very low value intentionally for demonstration
> purposes:
>
> postgres=# show work_mem;
>  work_mem
> --
>  16MB
> (1 row)
>
> postgres=# show shared_buffers ;
>  shared_buffers
> 
>  128MB
> (1 row)
>
>
> When I run the following query ( I know that "not in" is not a good choice
> here ):
>
> postgres=# select count(user_id) from users where user_id not in ( select id
> from ids);

"NOT IN" where the predate is a table column can lead to very poor
query plans especially where the haystack is not provably known (at
plan time) to contain only not null values.  By reducing work_mem, the
server has decided has to repeatedly search the table to search for
the presence of null values.  Try converting the query to NOT EXISTS.
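
Roughly (untested, but this is the shape of it):

select count(user_id)
from users u
where not exists (
  select 1 from ids i where i.id = u.user_id
);

NOT EXISTS also has saner null semantics, which is usually what you
actually wanted anyway.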

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] My humble tribute to psql -- usql v0.5.0

2017-04-03 Thread Merlin Moncure
On Mon, Apr 3, 2017 at 4:42 PM, Kenneth Shaw  wrote:
> UTF-8 works just fine with MS SQL:
>
> ms:booktest@192.168.1.5=> select N'这是一个';
> col0
> +--+
>   这是一个
> (1 rows)

confirmed!

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] My humble tribute to psql -- usql v0.5.0

2017-04-03 Thread Merlin Moncure
On Sun, Apr 2, 2017 at 7:55 AM, Kenneth Shaw  wrote:
> Hi All,
>
> I apologize in advance if this is somewhat off-topic, but I thought I
> would inform the people (ie, psql users) about usql, a
> universal-command line tool that aims to work the same way psql does,
> but with every database (not just PostgreSQL).
>
> usql is built in Go, and as of today supports all the major databases
> (PostgreSQL, MySQL, SQLite3, Microsoft SQL Server, Oracle Database)
> and more! Additionally, with v0.5.0, released today, usql now has
> implemented most of the basic, and much of the advanced features of
> psql:
>
> * Interpolated variables (ie, via \set, \unset, etc.)
> * Backslash commands, and similar shell escapes/evaluation (ie, \echo
> `date` style)
> * Working with a query buffer (ie, \e, \p, \r, etc.)
> * Password (.usqlpass) / RC (.usqlrc) files
>
> If you happen to have a Go tool chain installed, you can simply install with:
>
> go get -u github.com/knq/usql
>
> Alternatively, you may download a binary release for Windows/Linux/OSX here:
>
> https://github.com/knq/usql/releases
>
> I just released usql v0.5.0 and feel that it is now ready for a wider
> audience, and I thought what better audience than those already
> familiar with the type/style of cli interface psql offers. I'm hoping
> that there are those of you on this list that might like to have a
> tool in their toolbox that works with every other database in a
> similar way to how psql does.
>
> Over the last 15 years that I've been a heavy PostgreSQL user, and I
> have likely used the psql command line for -- literally -- thousands
> of hours! Over that same period of time, I have continually found
> myself needing to reread documentation for all the various cli clients
> out there. usql is the tool I wish I had 10+ years ago.
>
> I built usql primarily due to my frustration with having to work with
> 5 different, broken CLI interfaces for all the different/major
> databases on a relatively routine basis. I am posting this here,
> because I imagine many of the other users on this list on occasion
> have had, on occasion, needed to work with other databases, but are
> stymied/annoyed/stopped/frustrated-to-no-ends with having to use
> multiple, incompatible clients.
>
> At least now, while you might be forced to use some other, inferior
> database, at least you can still hit the ground running. I appreciate
> any feedback/questions/etc you might have!
>
> And thank you all to the PostgreSQL + psql developers out there! Don't
> forget that imitation is the most sincerest form of flattery ...
> cheers!
>
> -Ken
>
> (ps: usql is brand new; it's great for doing many common tasks, minor
> to medium sized database administration tasks and development at the
> moment; in time, it'll be ready for anything, but CAVEAT USER until
> then ...)

Wow! this is _fantastic_.  I use "sqsh" for connecting to sql server
-- this is already a significant improvement in many ways (based on
playing around for around 10 minutes).  This app would fill a real
need for me.  Here is some feedback:

*) Does UTF8 support work?  It appears not to for me:
=> select '这是一个';
  col0
+--+
  
(1 rows)

*) Is there any way to disable SSL to postgres in the DSN?  I get,
error: pq: SSL is not enabled on the server

*) Interacting with PAGER is a really essential feature

*) Ditto \copy (to my limited understanding this would be a fairly big
development challenge)

*) Advise reserving -h command line switch for 'host', not --help

*) \g switch is not working FWICT (it outputs to stdout) -- this makes
it hard to benchmark performance since output to display is the main
bottleneck.  First impression, though, is that the tool is pretty fast.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Advice about software engineering inside Postgres?

2017-03-20 Thread Merlin Moncure
On Mon, Mar 20, 2017 at 2:32 PM, Guyren Howe  wrote:

> I’m working on a project to get the likes of web developers to make more
> effective use of Postgres. This amounts to saying that much of the M in MVC
> should be implemented as code and relations in Postgres.
>

This is fairly insightful, I think.  It's a noble task.


> The more I think about this, the more I see that a model written in Node,
> say, that just does queries against the database would be written quite
> differently as PLV8 stored procedures, triggers and functions inside
> Postgres. It would have a more purely functional style, you would implement
> many things as functions that could be used in views and queries, you can
> write functions that are callable between programming languages, you can
> write functions that act as result set generators. And I’m sure there are
> other things I’ve not thought about. Oh: testing will likely be rather
> different.
>
> Are there any good resources about using Postgres as a programming
> platform from within, as it were?
>

As noted above, it's a wonderful largely unexplored area of research.
Postgres is maybe the most productive application development platform out
there and can be used for all sorts of wonderful things.  Maybe the most
challenging part is dealing with the endless pool of anti-database zealots
who seem to know everything there is about software development except SQL.

Unfortunately, resources about going 'deep postgres' are scant.  You're
going to have to break some ground.  Go find a problem and solve it, and be
very verbose when doing so.  A front end abstraction in the database itself
is one interesting project that could be reasonably done. There have been a
few half hearted attempts at it, but it's something that is worth shooting
for.

merlin

>


Re: [GENERAL] JSONB Overlap Operator.

2017-03-20 Thread Merlin Moncure
On Fri, Mar 17, 2017 at 9:43 AM, Eduardo Felipe  wrote:
> Hi there!
>
> In a project I was experimenting with replacing array columns with JSONB
> columns, to allow a greater flexibility of types.
>
> One thing that I found missing is the "overlap" operator (&&).
>
> JSONB includes all other operators from arrays, such as containment (@>,
> <@), comparison (>, >=, <, <=) and inclusion (?). The only one missing is
> the overlap operator (&&).
>
> Does anyone know of a technical reason that operator was not included on
> JSONB?

I really miss hstore slice() (which is a function, not an operator,
but the distinction is minor IMO) where you can pass an array of keys
and get a smaller object back.  This would have to be redefined a bit
since there are new edge cases with jsonb vs hstore but it ought to be
reasonably worked out.
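
In the meantime it can be emulated; a quick sketch (the function name
is invented):

create or replace function jsonb_slice(j jsonb, keys text[])
returns jsonb language sql immutable as
$$
  select coalesce(jsonb_object_agg(e.key, e.value), '{}'::jsonb)
  from jsonb_each(j) as e(key, value)
  where e.key = any(keys);
$$;

select jsonb_slice('{"a": 1, "b": 2, "c": 3}', array['a','c']);
-- {"a": 1, "c": 3}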

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-23 Thread Merlin Moncure
On Wed, Feb 22, 2017 at 3:19 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Thursday, February 16, 2017, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>
>>> Tim Bellis <tim.bel...@metaswitch.com> writes:
>>> > Even though this is a read only query, is it also expected to be
>>> > blocked behind the vacuum? Is there a way of getting indexes for a table
>>> > which won't be blocked behind a vacuum?
>>>
>>> It's not the vacuum that's blocking your read-only queries.  It's the
>>> ALTER TABLE, which needs an exclusive lock in order to alter the table's
>>> schema.  The ALTER is queued waiting for the vacuum to finish, and lesser
>>> lock requests queue up behind it.  We could let the non-exclusive lock
>>> requests go ahead of the ALTER, but that would create a severe risk of
>>> the
>>> ALTER *never* getting to run.
>>>
>>> I'd kill the ALTER and figure on trying again after the vacuum is done.
>>
>> I've been drilled by this and similar lock stacking issues enough times to
>> make me near 100% sure deferring the ALTER would be the better choice
>
> This seems like a rather one-sided observation.  How could you know how
> often the unimplemented behavior also would have "drilled" you, since it is
> unimplemented?

Well, that I can't really say, but at least in my case ALTER TABLE in
the face of concurrent application activity can wait but locking
tables for reading for an indefinite period will rapidly destabilize
the system.  An example of this usage is replacing partitions on a
warehouse table.  About half of my P1s over the last 12 months
relate to locking problems of some kind.

So I end up using workarounds such as issuing "LOCK...NOWAIT" in a
sleep loop :( or application restructuring, especially minimizing use
of TRUNCATE.
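
For the record, the sleep loop workaround looks roughly like this (a
sketch only; the table name is illustrative, and in practice the loop
is usually driven from the application):

do $$
begin
  loop
    begin
      lock table warehouse_partition in access exclusive mode nowait;
      exit;  -- got the lock without ever sitting in the lock queue
    exception when lock_not_available then
      perform pg_sleep(1);  -- back off and retry
    end;
  end loop;
  -- ... swap/replace the partition here, still holding the lock ...
end
$$;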

I do think instrumentation around locking behaviors would be helpful.
Allowing (optionally) waiters to leapfrog in if they can clear would
be wonderful as would being able to specify maximum wait timeouts
inside a transaction.  FWIW, I'm not sure this behavior makes sense
attached to LOCK, I'd rather see them attached generally to SET
TRANSACTION -- my 0.02$ (talk is cheap, etc).

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgresql how to duplicate rows in result.

2017-02-21 Thread Merlin Moncure
On Thu, Feb 16, 2017 at 10:27 AM, David G. Johnston
 wrote:
> If it wasn't lateral the reference to number in "generate_series(1, number)"
> would fail.

huh -- I didn't know that!  Testing it out, all JOIN types imply
LATERAL if the function call is tlist SRF style (which is pretty weird
IMO)  I tend to avoid optional words (with the important exception of
AS for column list renames) but I think it's a good idea to disclose
LATERAL in this case.  It's a big clue to the reader what is going on
and the expanded form:

SELECT foo.* FROM foo CROSS JOIN LATERAL (SELECT a,b FROM func(foo.bar)) q;

... requires LATERAL to be explicitly stated.  This form is more
general since it can be cleanly used when func() returns more than one
column.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-20 Thread Merlin Moncure
On Thursday, February 16, 2017, Tom Lane  wrote:

> Tim Bellis > writes:
> > Even though this is a read only query, is it also expected to be blocked
> behind the vacuum? Is there a way of getting indexes for a table which
> won't be blocked behind a vacuum?
>
> It's not the vacuum that's blocking your read-only queries.  It's the
> ALTER TABLE, which needs an exclusive lock in order to alter the table's
> schema.  The ALTER is queued waiting for the vacuum to finish, and lesser
> lock requests queue up behind it.  We could let the non-exclusive lock
> requests go ahead of the ALTER, but that would create a severe risk of the
> ALTER *never* getting to run.
>
> I'd kill the ALTER and figure on trying again after the vacuum is done.
>
>
I've been drilled by this and similar lock stacking issues enough times to
make me near 100% sure deferring the ALTER would be the better choice


merlin


Re: [GENERAL] PostgreSQL corruption

2017-02-16 Thread Merlin Moncure
On Tue, Feb 14, 2017 at 7:23 PM, James Sewell 
wrote:

> OK,
>
> So with some help from the IRC channel (thanks macdice and JanniCash)
>  it's come to light that my RAID1 comprised of 2 * 7200RPM disks is
> reporting ~500 ops/sec in pg_test_fsync.
>
> This is higher than the ~120 ops/sec which you would expect from 720RPM
> disks - therefore something is lying.
>
> Breaking up the RAID and re-imaging with JBOD dropped this to 50 ops/sec -
> another question but still looking like a real result.
>
> So in this case it looks like the RAID controller wasn't disabling caching
> as advertised.
>


yup -- that's the thing.  Performance numbers really tell the whole (or at
least most) of the story.  If it's too good to be true, it is.  These
days, honestly I'd just throw out the raid controller and install some
intel ssd drives.

merlin


Re: [GENERAL] postgresql how to duplicate rows in result.

2017-02-14 Thread Merlin Moncure
On Tue, Feb 14, 2017 at 12:36 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Tue, Feb 14, 2017 at 12:04 PM, Alessandro Baggi
> <alessandro.ba...@gmail.com> wrote:
>> Hi list,
>> sorry for my english, I will try to example as well. I've a query that joins
>> multiple tables and return a result like:
>>
>> id,customers,phone,code,number
>> 1 , ,3,123 , 2
>> 2 , aassdsds,33322,211 , 1
>> 3 , ,21221,221 , 1
>>
>>
>> I need, where "number" field is > 1, to duplicate the row * N(number field
>> value) with a result like this:
>>
>> id,customers,phone,code,number
>> 1 , ,3,123 , 2
>> 1 , ,3,123 , 2
>> 2 , aassdsds,33322,211 , 1
>> 3 , ,21221,221 , 1
>>
>> How I can accomplish to this problem?
>
> SELECT * FROM foo CROSS JOIN LATERAL (1,number);
>
> :-D

oops -- copy/paste error

SELECT * FROM foo CROSS JOIN LATERAL generate_series(1,number);

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgresql how to duplicate rows in result.

2017-02-14 Thread Merlin Moncure
On Tue, Feb 14, 2017 at 12:42 PM, John McKown
 wrote:
> On Tue, Feb 14, 2017 at 12:04 PM, Alessandro Baggi
>  wrote:
>>
>> Hi list,
>> sorry for my english, I will try to example as well. I've a query that
>> joins multiple tables and return a result like:
>>
>>
>> id,customers,phone,code,number
>> 1 , ,3,123 , 2
>> 2 , aassdsds,33322,211 , 1
>> 3 , ,21221,221 , 1
>>
>>
>> I need, where "number" field is > 1, to duplicate the row * N(number field
>> value) with a result like this:
>>
>> id,customers,phone,code,number
>> 1 , ,3,123 , 2
>> 1 , ,3,123 , 2
>> 2 , aassdsds,33322,211 , 1
>> 3 , ,21221,221 , 1
>>
>> How I can accomplish to this problem?
>>
>> I'm using postgresql 9.3.15
>
>
> I don't have PostgreSQL 9.3.15. I am running 9.5.5. But I think the
> following recursive CTE should work for you.
>
> 
>
> tsh009=# \d baggi
>   Table "public.baggi"
>   Column   |  Type   | Modifiers
> ---+-+---
>  id| integer |
>  customers | text|
>  phone | text|
>  code  | integer |
>  number| integer |
>
> tsh009=# select * from baggi;
>  id | customers | phone | code | number
> +---+---+--+
>   1 |   | 3 |  123 |  2
>   2 | aassdsds  | 33322 |  211 |  1
>   3 |   | 21221 |  221 |  1
> (3 rows)
>
> tsh009=# with recursive multiple(k, id, customers, phone, code, number) as (
> select 1, id, customers, phone, code, number from baggi
> UNION ALL
> select m.k + 1, b.id, b.customers, b.phone, b.code, b.number from multiple
> AS m, baggi AS b  where m.id = b.id and m.k < b.number )
> select id, customers, phone, code, number from multiple order by id
> ;
>  id | customers | phone | code | number
> +---+---+--+
>   1 |   | 3 |  123 |  2
>   1 |   | 3 |  123 |  2
>   2 | aassdsds  | 33322 |  211 |  1
>   3 |   | 21221 |  221 |  1
> (4 rows)

IMO, lateral join (available as of 9.3) is faster and simpler.
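
e.g., against the sample table quoted above, roughly:

select b.id, b.customers, b.phone, b.code, b.number
from baggi b
cross join lateral generate_series(1, b.number) as g(n)
order by b.id;

Each row comes back "number" times, no recursion required.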

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgresql how to duplicate rows in result.

2017-02-14 Thread Merlin Moncure
On Tue, Feb 14, 2017 at 12:04 PM, Alessandro Baggi
 wrote:
> Hi list,
> sorry for my english, I will try to example as well. I've a query that joins
> multiple tables and return a result like:
>
> id,customers,phone,code,number
> 1 , ,3,123 , 2
> 2 , aassdsds,33322,211 , 1
> 3 , ,21221,221 , 1
>
>
> I need, where "number" field is > 1, to duplicate the row * N(number field
> value) with a result like this:
>
> id,customers,phone,code,number
> 1 , ,3,123 , 2
> 1 , ,3,123 , 2
> 2 , aassdsds,33322,211 , 1
> 3 , ,21221,221 , 1
>
> How I can accomplish to this problem?

SELECT * FROM foo CROSS JOIN LATERAL (1,number);

:-D

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres

2017-02-13 Thread Merlin Moncure
On Sat, Feb 11, 2017 at 12:48 AM, prakash ramakrishnan
 wrote:
> Hi,
>
>Am Prakash from Chennai and am working in postgres edb 9.5 I need
> your help for pgpool and pgbouncer configuration steps and please keep in
> touch if I get any error.

why don't you ask some specific questions?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can we not give tyrannical pedants control of #postgresql?

2017-02-01 Thread Merlin Moncure
On Thu, Jan 19, 2017 at 5:23 PM, Julian Paul  wrote:
> I hope that particular stereotypes aren't proven here, but it appears
> #postgresql encourages a particular tier and makes aware of it's rigid
> hierarchy. I owe alot to #postgresql but not to these particular users, I've
> perhaps been idle for too long and the channel has change for the worse,
> well that's not my fault. I leave it with the community to sort out.

I haven't been on irc much lately, but I've noticed this trend as well.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Queries are taking way longer in 9.6 than 9.5

2017-01-19 Thread Merlin Moncure
On Thu, Jan 19, 2017 at 1:28 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> On 01/19/2017 06:21 AM, Merlin Moncure wrote:
>>
> ...
>>
>>
>> yes.  Either way, I would very much like to understand how the server is
>> preferring a 3m cost plan to a 2.6k cost plan or is otherwise unable to
>> access the cheap plan.
>>
>
> Perhaps it doesn't even see the 2.6k cost - it may not generate the plan for
> some reason, or it arrives to different estimates.
>
> The OP was repeatedly asked for details of the configuration, I've even sent
> a query for doing that a few days back. It's difficult to help when such
> basic info is not provided, though.
>
> It'd also be interesting to see the statistics for the tables involved, so
> something like
>
>SELECT * FROM pg_stats WHERE tablename IN (... tables ...)
>
> and
>
>SELECT relname, relpages, reltuples, relallvisible
>  FROM pg_class WHERE relame IN (... tables ...)
>
> might shed some light on what the planner assumes about the data.
>
> Of course, there are other things we might need to know. For example if
> there are foreign keys between the tables, 9.6 is using that for the
> estimates (unlike 9.5). And so on.

Maybe. This smells like a bug or a missing index.  The key for me is:
  ->  Seq Scan on tblpuorderstatus os  (cost=0.00..96501.53
rows=11185842 width=8) (actual time=0.011..822.937 rows=11182962

I can't see why any reasonable plan path would choose this unless the
index on "vip_order_id" is missing  (my money is on this actually)  or
there is a gross preference to prefer parallel scans over traditional
plan choices irrespective of cost.

I'm rolling out 9.6 next week and so have a particular interest in
this.  :-)  I also tend to utilize a lot of cascading semi-joins and so
am curious to see where this turns up.

Ravi, if you can arrange for screen share or some similar way of
looking at this system I'd be happy to help you troubleshoot, gratis.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Queries are taking way longer in 9.6 than 9.5

2017-01-18 Thread Merlin Moncure
On Wed, Jan 18, 2017 at 11:10 PM, Adrian Klaver
<adrian.kla...@aklaver.com> wrote:
> On 01/18/2017 08:58 PM, Merlin Moncure wrote:
>>
>> On Wed, Jan 18, 2017 at 2:12 PM, Melvin Davidson <melvin6...@gmail.com
>> <mailto:melvin6...@gmail.com>> wrote:
>>
>>
>>
>> On Wed, Jan 18, 2017 at 3:06 PM, Merlin Moncure <mmonc...@gmail.com
>> <mailto:mmonc...@gmail.com>> wrote:
>>
>> On Wed, Jan 18, 2017 at 1:04 PM, Ravi Tammineni
>> <rtammin...@partner.aligntech.com
>> <mailto:rtammin...@partner.aligntech.com>> wrote:
>> > Hi Chris,
>> >
>> > Here is the query and execution plan in 9.5 and 9.6.
>>
>> Can you verify tblpuorderstatus and tblpuorderstatushistory have
>> all
>> indexes accounted for on both servers?  It seems incredible server
>> would prefer wading through 11M records to 1298 nestloop.  I'm
>> curious
>> what plans you get if you try playing around with:
>>
>> set enable_seqscan=false;
>> set enable_hashjoin=false;
>>
>> ...but I think we have two possibilities here:
>> 1. schema mismatch
>> 2. planner bug
>>
>> merlin
>>
>>
>> --
>> Sent via pgsql-general mailing list
>> (pgsql-general@postgresql.org
>> <mailto:pgsql-general@postgresql.org>)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>> <http://www.postgresql.org/mailpref/pgsql-general>
>>
>>
>> *I never got an answer to my question.
>> *
>> *Have you verified that postgresql.conf is the same of both 9.5 &
>> 9.6?*
>>
>>
>> This is not verified, but I can't think of an influential planner
>> variable that would push planner cost from 2600 to millions; an abrupt
>> increase in plan cost rules out a knife-edge plan choice and the
>> statistics look relatively correct on rows.  Unless planner choices are
>> disabled in postgresql.conf, this suggests something is preventing
>> the planner from choosing a particular kind of plan for this query,
>> which suggests a bug to me.
>
>
> I am still working out the parallel query feature in 9.6 but I am seeing the
> below in the 9.6 EXPLAIN ANALYZE:
>
>  ->  Gather  (cost=1000.00..3011004.71 rows=529690 width=4) (actual
> time=2.713..368445.460 rows=595653 loops=1)
>  Workers Planned: 2
>  Workers Launched: 2
>
> Does that not indicate parallel query has been turned on?
>
> Would not turning it off be a better apple-to-apple comparison to the 9.5
> plan?

yes.  Either way, I would very much like to understand how the server is
preferring a 3m cost plan to a 2.6k cost plan or is otherwise unable to
access the cheap plan.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Queries are taking way longer in 9.6 than 9.5

2017-01-18 Thread Merlin Moncure
On Wed, Jan 18, 2017 at 2:12 PM, Melvin Davidson <melvin6...@gmail.com>
wrote:

>
>
> On Wed, Jan 18, 2017 at 3:06 PM, Merlin Moncure <mmonc...@gmail.com>
> wrote:
>
>> On Wed, Jan 18, 2017 at 1:04 PM, Ravi Tammineni
>> <rtammin...@partner.aligntech.com> wrote:
>> > Hi Chris,
>> >
>> > Here is the query and execution plan in 9.5 and 9.6.
>>
>> Can you verify tblpuorderstatus and tblpuorderstatushistory have all
>> indexes accounted for on both servers?  It seems incredible server
>> would prefer wading through 11M records to 1298 nestloop.  I'm curious
>> what plans you get if you try playing around with:
>>
>> set enable_seqscan=false;
>> set enable_hashjoin=false;
>>
>> ...but I think we have two possibilities here:
>> 1. schema mismatch
>> 2. planner bug
>>
>> merlin
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>
> *I never got an answer to my question.*
> *Have you verified that postgresql.conf is the same of both 9.5 & 9.6?*
>

This is not verified, but I can't think of an influential planner variable
that would push planner cost from 2600 to millions; an abrupt increase in plan
cost rules out a knife-edge plan choice and the statistics look relatively
correct on rows.  Unless planner choices are disabled in postgresql.conf,
this suggests something is preventing the planner from choosing a particular
kind of plan for this query, which suggests a bug to me.

OP, if you want to contribute to the investigation or fix, "git bisect" is
the way to proceed...is that feasible?

merlin


Re: [GENERAL] Queries are taking way longer in 9.6 than 9.5

2017-01-18 Thread Merlin Moncure
On Wed, Jan 18, 2017 at 1:04 PM, Ravi Tammineni
 wrote:
> Hi Chris,
>
> Here is the query and execution plan in 9.5 and 9.6.

Can you verify tblpuorderstatus and tblpuorderstatushistory have all
indexes accounted for on both servers?  It seems incredible server
would prefer wading through 11M records to 1298 nestloop.  I'm curious
what plans you get if you try playing around with:

set enable_seqscan=false;
set enable_hashjoin=false;

...but I think we have two possibilities here:
1. schema mismatch
2. planner bug

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] efficiently migrating 'old' data from one table to another

2017-01-13 Thread Merlin Moncure
On Fri, Jan 13, 2017 at 12:03 PM,  <kbran...@pwhome.com> wrote:
> On Jan 12, 2017, Jonathan Vanasco <postg...@2xlp.com> wrote:
>>On Jan 12, 2017, at 5:52 PM, Merlin Moncure wrote:
>>
>>> On Thu, Jan 12, 2017 at 2:19 PM, bto...@computer.org
>>> <bto...@broadstripe.net> wrote:
>>>>
>>>> Review manual section 7.8.2. Data-Modifying Statements in WITH
>>>>
>>>>
>>>> https://www.postgresql.org/docs/9.6/static/queries-with.html
>>>
>>> this.
>>>
>>> with data as (delete from foo where ... returning * ) insert into
>>> foo_backup select * from data;
>>
>>Thanks, btober and merlin.  that's exactly what i want.
>
> To help you a little more, I just did this for a set of tables within the
> last week. :) The heart of the program is this sql:
>
> my $Chunk_size = 1;
> my $Interval = 24;
> my $sql = "
> WITH
> keys AS (
> SELECT $pk_column
> FROM $table
> WHERE $time_column < NOW() - '$Interval MONTHS'::INTERVAL
> ORDER BY $pk_column
> LIMIT $Chunk_size ),
> data AS (
> DELETE FROM $table
> WHERE $pk_column <= (SELECT MAX($pk_column) FROM keys)
> RETURNING * )
> INSERT INTO archive_$table SELECT * FROM data;";
>
> That's from Perl, but I suspect you can guess as to what each var should be 
> for
> your application. You can set $Chunk_size to whatever you want. There is
> obviously a loop around that which executes until we get 0 rows, then we move
> on to the next table.
>
> The point of the chunks was to limit the impact on the production tables
> as we move data out of them. If you don't have that concern and want to do all
> rows at once then remove the LIMIT and ORDER BY.

FYI, although it's likely ok in this instance, directly inserting
table names without precaution is considered dubious and should be
avoided as a practice.  SQL injection is a risk, and your code will fail
in the presence of unusual but legal table names, such as names containing spaces.

For posterity: handling this kind of action inside the database (via
plpgsql/EXECUTE) in order to leverage some internal routines,
especially quote_ident(), is generally a good idea.
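
A rough sketch of what that might look like (the function name and the
archive_ naming convention here are just for illustration; format()
with %I applies quote_ident() for you):

create or replace function archive_old_rows(
  p_table text, p_time_column text, p_months int)
returns bigint language plpgsql as
$$
declare
  n bigint;
begin
  execute format(
    'with data as (
       delete from %I
       where %I < now() - make_interval(months => $1)
       returning * )
     insert into %I select * from data',
    p_table, p_time_column, 'archive_' || p_table)
  using p_months;
  get diagnostics n = row_count;  -- rows moved
  return n;
end
$$;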

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] efficiently migrating 'old' data from one table to another

2017-01-12 Thread Merlin Moncure
On Thu, Jan 12, 2017 at 2:19 PM, bto...@computer.org
 wrote:
>
>
> - Original Message -
>> From: "Jonathan Vanasco" 
>> To: "pgsql-general general" 
>> Sent: Thursday, January 12, 2017 3:06:14 PM
>> Subject: [GENERAL] efficiently migrating 'old' data from one table to another
>>
>> I'm just wondering if there's a more efficient way of handling a certain
>> periodic data migration.
>>
>> We have a pair of tables with this structure:
>>
>>   table_a__live
>>   column_1 INT
>>   column_2 INT
>>   record_timestamp TIMESTAMP
>>
>>   table_a__archive
>>   column_1 INT
>>   column_2 INT
>>   record_timestamp TIMESTAMP
>>
>> periodically, we must migrate items that are 'stale' from `table_a__live ` to
>> `table_a__archive`.  The entries are copied over to the archive, then
>> deleted.
>>
>> The staleness is calculated based on age--  so we need to use INTERVAL.  the
>> "live" table can have anywhere from 100k to 20MM records.
>>
>> the primary key on `table_a__live` is a composite of column_1 & column_2,
>>
>> In order to minimize scanning the table, we opted to hint migrations with a
>> dedicated column:
>>
>>   ALTER TABLE table_a__live ADD is_migrate BOOLEAN DEFAULT NULL;
>>   CREATE INDEX idx_table_a__live_migrate ON table_a__live(is_migrate) 
>> WHERE
>>   is_migrate IS NOT NULL;
>>
>> so our migration is then based on that `is_migrate` column:
>>
>>   BEGIN;
>>   UPDATE table_a__live SET is_migrate = TRUE WHERE record_timestamp <
>>   transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month';
>>   INSERT INTO table_a__archive (column_1, column_2, record_timestamp) 
>> SELECT
>>   column_1, column_2, record_timestamp FROM table_a__live WHERE 
>> is_migrate IS
>>   TRUE;
>>   DELETE FROM table_a__live WHERE is_migrate IS TRUE;
>>   COMMIT;
>>
>> The inserts & deletes are blazing fast, but the UPDATE is a bit slow from
>> postgres re-writing all the rows.
>>
>> can anyone suggest a better approach?
>>
>> I considered copying everything to a tmp table then inserting/deleting based
>> on that table -- but there's a lot of disk-io on that approach too.
>
>
> Review manual section 7.8.2. Data-Modifying Statements in WITH
>
>
> https://www.postgresql.org/docs/9.6/static/queries-with.html

this.

with data as (delete from foo where ... returning * ) insert into
foo_backup select * from data;
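
Applied to the tables above, that works out to roughly:

with data as (
  delete from table_a__live
  where record_timestamp <
        transaction_timestamp() at time zone 'UTC' - interval '1 month'
  returning column_1, column_2, record_timestamp
)
insert into table_a__archive (column_1, column_2, record_timestamp)
select column_1, column_2, record_timestamp from data;

No is_migrate column or extra UPDATE pass needed.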


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Queries on very big table

2017-01-06 Thread Merlin Moncure
On Mon, Jan 2, 2017 at 5:23 AM, Job  wrote:
> Hello guys and very good new year to everybody!
>
> We are now approaching some queries and statistics on very big table (about 
> 180 millions of record).
> The table is partitioned by day (about ~3 Gb of data for every partition/day).
> We use Postgresql 9.6.1
>
> I am experiencing quite important slowdown on queries.
> I manually made a "vacuum full" and a "reindex" on every partition in order 
> to clean free space and reorder records.
>
> I have a BRIN index on timestamp and index on other field (btree)
>
> Starting by a simple query: explain analyze select count(domain) from 
> webtraffic_archive:
>
>   
>  QUERY PLAN
> 
>  Finalize Aggregate  (cost=3220451.94..3220451.95 rows=1 width=8) (actual 
> time=36912.624..36912.624 rows=1 loops=1)
>->  Gather  (cost=3220451.52..3220451.93 rows=4 width=8) (actual 
> time=36911.600..36912.614 rows=5 loops=1)
>  Workers Planned: 4
>  Workers Launched: 4
>  ->  Partial Aggregate  (cost=3219451.52..3219451.53 rows=1 width=8) 
> (actual time=36906.804..36906.804 rows=1 loops=5)
>->  Append  (cost=0.00..3094635.41 rows=49926443 width=0) 
> (actual time=4.716..31331.229 rows=39853988 loops=5)
>  ->  Parallel Seq Scan on webtraffic_archive  
> (cost=0.00..0.00 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_01_01  (cost=0.00..10.47 rows=47 width=0) (actual 
> time=0.000..0.000 rows=0 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_16  (cost=0.00..213728.26 rows=3498026 
> width=0) (actual time=4.713..2703.458 rows=2798421 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_17  (cost=0.00..201379.39 rows=3247739 
> width=0) (actual time=6.334..2364.726 rows=2598191 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_18  (cost=0.00..176248.86 rows=2824986 
> width=0) (actual time=7.437..2014.812 rows=2259989 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_19  (cost=0.00..177493.33 rows=2866433 
> width=0) (actual time=9.951..2145.958 rows=2293146 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_20  (cost=0.00..120271.83 rows=1960883 
> width=0) (actual time=0.011..372.092 rows=1568706 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_21  (cost=0.00..276391.94 rows=4485294 
> width=0) (actual time=5.386..3111.589 rows=3588235 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_22  (cost=0.00..287611.68 rows=4630668 
> width=0) (actual time=6.598..3335.834 rows=3704535 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_23  (cost=0.00..249047.61 rows=4014361 
> width=0) (actual time=7.206..2628.884 rows=3211489 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_24  (cost=0.00..192008.70 rows=3097370 
> width=0) (actual time=9.870..1882.826 rows=2477896 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_25  (cost=0.00..87385.16 rows=1405616 width=0) 
> (actual time=0.018..427.248 rows=1124493 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_26  (cost=0.00..88262.80 rows=1436080 width=0) 
> (actual time=0.014..277.327 rows=1148864 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_27  (cost=0.00..222607.43 rows=3557243 
> width=0) (actual time=8.497..1232.210 rows=2845795 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_28  (cost=0.00..210414.76 rows=3365676 
> width=0) (actual time=0.033..548.878 rows=2692541 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_29  (cost=0.00..185065.72 rows=2955872 
> width=0) (actual time=0.031..498.079 rows=2364697 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_30  (cost=0.00..149139.55 rows=2382656 
> width=0) (actual time=0.011..501.351 rows=1906124 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2016_12_31  (cost=0.00..166991.89 rows=2664288 
> width=0) (actual time=0.041..437.631 rows=2131431 loops=5)
>  ->  Parallel Seq Scan on 
> webtraffic_archive_day_2017_01_01  (cost=0.00..79197.29 rows=1260930 width=0) 
> (actual time=0.018..254.124 rows=1008744 loops=5)
>  ->  Parallel Seq Scan 

Re: [GENERAL] Write-optimized data structures

2017-01-06 Thread Merlin Moncure
On Thu, Dec 29, 2016 at 4:03 PM, selforganized  wrote:
> Hi,
>
> Does Postgresql have any write-optimized data structure like LSM-tree? if
> not is there any plan to implement that?
>
> I'm building a write-heavy OLTP application. I'm looking at write-optimized
> databases like MyRocks, TokuDB, and Cassandra but I'd prefer to stay within
> Postgresql. Would PG makes sense for such usage?

The database world is unfortunately full of hype.  Some of it is real,
but buyer beware.  Are you sure the current postgres architecture
can't sustain your application?  If so, why?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL not reusing free space in table ?

2017-01-06 Thread Merlin Moncure
On Fri, Jan 6, 2017 at 4:09 AM, Pierre Ducroquet
 wrote:
> Hi
>
> Running PostgreSQL 9.4, I am running in the following issue.
> On a huge table, I have to remove the content (set to '') of a column that
> makes for 99% of the weight of the table. Since the production can not be
> stopped, I will not be able to do a VACUUM FULL nor any direct rewrite of the
> table, so I tried the following trick to drop the content while reducing the
> table weight. (I re-indented it for this email)
>
> ppr=500
> for i in `seq 0 60` ; do
> psql $MYDB -c "UPDATE $TABLE
> SET content = ''::bytea
> WHERE ctid = ANY(ARRAY(
> SELECT ('(' || p.i || ',' || s.i || ')')::tid
> FROM generate_series(`$i * $ppr`, `($i + 1) * $ppr`) 
> p(i),
>  generate_series(0, 2048) AS s(i)
> ));"
> psql $MYDB -c "VACUUM $TABLE;"
> done
>
> This reduced my table from 1200MB down to 240MB, 29000 pages, while I expected
> the final result to be even lower. I did a copy of the table, and it needed
> only 30MB, 3800 pages, so there was still something wrong.
> I did a simple (and slow) query to get the number of rows per page on both
> table. The new one had slightly more rows per page, but this was in no way
> enough to justify such a gap. Then I saw that after page 2080 the table had,
> for the following 27000 pages, between 0 and 5 rows per page. So vacuum could
> not remove the pages because there were gaps.
> I figured out that, after a vacuum, updating the rows at the end of the table
> should be put back at the beginning of the free space and thus after an update
> of the rows of the last 27000 pages the table could be reduced to the optimal
> size.
> But when I raw the following, I was very surprised (NB: I know this query does
> a full scan, I won't do it on a 100GB table, I promise)
>
> VACUUM VERBOSE $TABLE;
>
> SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
> => 29 rows
> UPDATE $TABLE SET content = ''::bytea WHERE ctid > '(29000,0)';
> => 29 rows updated
> SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
> ==> 29 rows again !
>
> So instead of filling huge gaps (I've got for instance an 80 pages gap, from
> id 2082 to id 2160), Pg put the data back at the end of the table.
>
>
> I'm quite puzzled by this situation. I believed PostgreSQL would use the free
> space to put the new data, so my update loop would give me a clean, «packed»
> table.
>
> What behaviour did I miss here ? How can I get PostgreSQL to use that free
> space without falling back to a vacuum full ? (And without using tools like
> pg_repack if possible, because this seems like not needed in that situation)

hm.  Maybe HOT is involved?  Does your column have an index on it?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Row value expression much faster than equivalent OR clauses

2017-01-05 Thread Merlin Moncure
On Wed, Jan 4, 2017 at 8:49 AM, Kevin Grittner  wrote:
> On Wed, Dec 28, 2016 at 10:58 AM, Steven Grimm  wrote:
>
>> WHERE ((e.timeStamp > '2016-12-19T20:34:22.315Z')
>> OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
>> AND e.sequenceNumber > 0)
>> OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
>> AND e.sequenceNumber = 0
>> AND e.aggregateIdentifier >
>> 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84'))
>
>> This uses the index on the three columns it's using for ordering of events,
>> but (if I'm reading the explain output correctly) does a full scan of the
>> index.
>
>> I played around with it a little and one thing I tried was to restructure
>> the WHERE clause using a row value expression that's semantically equivalent
>> to the original.
>
>> WHERE (e.timeStamp, e.sequenceNumber, e.aggregateIdentifier) >
>> ('2016-11-19T20:34:22.315Z', 0, 
>> 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84')
>
>> This ends up being a LOT faster:
>
> Yup.
>
>> I wonder if the query planner could recognize that the two queries are
>> equivalent and choose the second plan for the OR-clause version, or at least
>> use the index more efficiently.
>
> Theoretically it could, but that would add significant time to
> planning for a large number of queries, with no benefit to those
> who explicitly write the query in the faster (and more concise!)
> fashion.
>
> You could come a lot closer to the performance of the row value
> expression technique by using the logical equivalent of your
> original query that puts AND at the higher level and OR at the
> lower level.  (Having OR at the top is generally inefficient.)
>
> WHERE (e.timeStamp >= '2016-12-19T20:34:22.315Z'
>   AND (e.timeStamp >  '2016-12-19T20:34:22.315Z'
>  OR (e.sequenceNumber >= 0
> AND (e.sequenceNumber >  0
>OR (e.aggregateIdentifier >
> 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84')

In practice this can utilize the first part of the key only.  So
performance will be good if "timeStamp" is selective and poor if it
isn't.   In this query, I'd venture to guess it'd be pretty good.  The
row-wise comparison feature was built specifically to handle this type
of query and it's additionally much more concise as you noted, so I'd
agree; effort is better spent on client side education than on complex
rewriting strategies.

This type of query comes up a lot in applications that were converted from ISAM.
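
For completeness, the reason the row-wise form wins is that it maps
directly onto a single range scan of one composite index, roughly
(table and index names are guesses):

create index event_order_idx
  on events (timeStamp, sequenceNumber, aggregateIdentifier);

select *
from events e
where (e.timeStamp, e.sequenceNumber, e.aggregateIdentifier) >
      ('2016-11-19T20:34:22.315Z', 0,
       'dev:642e1953-2562-4768-80d9-0c3af9b0ff84')
order by e.timeStamp, e.sequenceNumber, e.aggregateIdentifier
limit 100;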

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2017-01-04 Thread Merlin Moncure
On Wed, Dec 28, 2016 at 3:15 AM, Tim Uckun  wrote:
> I have seen various links on the internet which indicate that PLV8 is
> significantly faster than PL-PGSQL sometimes an order of magnitude faster.
>
> Is this uniformly true or is it just in certain circumstances?
>
> Is there any benefit to choosing PL-PGSQL?
>
> Is there work going on to make PL-PGSQL more performant or has it gotten
> significantly faster in the last two years or so (some of the links are a
> from a while ago).

Pavel covered it pretty well but I'll chime in also.  In typical usage
of pl/pgsql the overhead of the language itself is minimal.  Most of
the heavy lifting is done by the SQL engine. pl/pgsql is glue that
sticks SQL statements together basically.  Heavy computation in
pl/pgsql can be problematic but is rarely necessary.  pl/v8 is
definitively a better choice for such cases but not being in core can
be an issue for some people.

The advantage of pl/pgsql is that SQL statements, types and errors are
native to the language itself.  This makes many things much easier to
do...pl/pgsql is maybe the most efficient data processing language (in
terms of efficiency of coding) in existence.   That may smack of
overstatement but my entire career is built around being good at it so
I'm inclined to effusiveness :-D.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] storing C binary array in bytea via libpq

2016-12-09 Thread Merlin Moncure
On Wed, Dec 7, 2016 at 4:10 AM, Tom DalPozzo  wrote:
> Hi,
> I tried both ways: they're ok.
> Also, multiple VALUES in one INSERT is actually better as performance.

If you are chasing performance with the binary protocol you might want
to take a look at libpqtypes: http://libpqtypes.esilo.com/

It facilitates fast communication to/from the database.  For
inserting, you stack an array of composites locally and send it to the
database with a single query and the receiving side can unwrap it and
do the insert in a function.   Advancements in json handling have
largely displaced this kind of usage in many situations but if you
have extreme performance requirements it's still worth a look.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] When to use COMMENT vs --

2016-12-09 Thread Merlin Moncure
On Wed, Dec 7, 2016 at 9:57 AM, Rich Shepard  wrote:
>   I have used '-- ' to enter comments about tables or columns and am curious
> about the value of storing comments in tables using the COMMENT key word.
> When is the latter more appropriate than the former?

The main advantage of COMMENT is that the comments are stored in the
database, so they will be made available to client tools
(like psql) and external programs.  For example, if you are generating
database diagrams out of the database automatically (which is a good
idea) with a good tool (I recommend SchemaSpy) the comments will
become visible.
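
For instance (table and column names invented):

comment on table customer is 'one row per billing account';
comment on column customer.status is 'a = active, c = closed';

-- psql shows these in \d+ customer; programs can fetch them directly:
select obj_description('customer'::regclass, 'pg_class');
select col_description('customer'::regclass,
       (select attnum from pg_attribute
        where attrelid = 'customer'::regclass and attname = 'status'));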

Code style comments OTOH will decorate the file.  This is good if you
maintain your code as proper code, checking it into git and such,
which is also a very good idea.

Personally I tend to avoid COMMENT on aesthetics; I just dislike having
the COMMENT come *after* the object being created in code. If I had
a hypothetical "COMMENT ON NEXT OBJECT 'This is ...';" I would
probably use COMMENT a lot more :-).

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] About the MONEY type

2016-11-30 Thread Merlin Moncure
On Wed, Nov 30, 2016 at 2:16 PM, John McKown
 wrote:
> On Wed, Nov 30, 2016 at 1:23 PM, bto...@computer.org
>  wrote:
> Speaking generically, I guess maybe MONEY needs to be somewhat like a
> TIMESTAMP. At least in PostgreSQL, a TIMESTAMP can contain a TIMEZONE. I
> guess a MONEY type should contain a modifier identifying the issuer of the
> currency (E.g. U.S. Dollar vs Canadian Dollar vs. Yen vs. Yuan vs. "precious
> metal").

ISTM we already have that functionality: composite types.  Had the
money type been written after we got composite types, it might have
been done differently (or perhaps not at all).   A similar observation
can be made about the geometric types.
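
For example, a hand-rolled sketch along those lines (storage only, no
conversion logic):

create type currency_amount as (amount numeric(15,2), currency char(3));

create table invoice_line (
  id bigserial primary key,
  price currency_amount
);

insert into invoice_line (price)
  values (row(19.99, 'USD')::currency_amount);

select (price).amount, (price).currency from invoice_line;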

Proper currency conversion of course is a complex topic; it'd be an
interesting thought experiment to imagine that functionality inside of
a type implementation.

The problem with the money type is that it simultaneously somehow does
too much and not enough.  It kind of lives in twilight as a sneaky
fixed point integer implemented in binary.  It's a scar from the heady
days of youth used to impress people :-).

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] select function alias

2016-11-30 Thread Merlin Moncure
On Wed, Nov 30, 2016 at 12:15 PM, David G. Johnston
 wrote:
> On Wed, Nov 30, 2016 at 8:15 AM, bto...@computer.org
>  wrote:
>>
>> 5. Use a CTE:
>>
>> with shortnames as (
>> select to_char(impressions_create_date,'yyyy-mm-dd') as ymd from
>> impressionsdb
>> ) select ymd from shortnames where ymd like '2016-11%';
>>
>
> Except that WHERE clause won't get pushed down (I think...haven't checked an
> explain recently) so a sub-select is definitely better in this scenario.

It's still the case FWICT.  I suspect CTE optimization fencing is
mostly considered a feature, not a bug.
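
i.e. with the sub-select form the subquery is flattened into the outer
query instead of being evaluated behind the fence, something like:

select ymd
  from (select to_char(impressions_create_date, 'yyyy-mm-dd') as ymd
          from impressionsdb) s
 where ymd like '2016-11%';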

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index is not used for "IN (non-correlated subquery)"

2016-11-30 Thread Merlin Moncure
On Wed, Nov 30, 2016 at 11:05 AM, George <pinkisntw...@gmail.com> wrote:
> On Wed, Nov 30, 2016 at 6:45 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Wed, Nov 30, 2016 at 10:42 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> George <pinkisntw...@gmail.com> writes:
>>>> explain select * from wg3ppbm_transaction where partner_uuid in (
>>>> select p.uuid
>>>> from wg3ppbm_userpartner up
>>>> join wg3ppbm_partner p on p.id = up.partner_id
>>>> );
>>>
>>>> "Hash Semi Join  (cost=2.07..425.72 rows=2960 width=482)"
>>>> "  Hash Cond: ((wg3ppbm_transaction.partner_uuid)::text = (p.uuid)::text)"
>>>> "  ->  Seq Scan on wg3ppbm_transaction  (cost=0.00..375.19 rows=5919 
>>>> width=482)"
>>>> "  ->  Hash  (cost=2.06..2.06 rows=1 width=37)"
>>>> "->  Nested Loop  (cost=0.00..2.06 rows=1 width=37)"
>>>> "  Join Filter: (up.partner_id = p.id)"
>>>> "  ->  Seq Scan on wg3ppbm_userpartner up
>>>> (cost=0.00..1.01 rows=1 width=4)"
>>>> "  ->  Seq Scan on wg3ppbm_partner p  (cost=0.00..1.02
>>>> rows=2 width=41)"
>>>
>>> This plan is expecting to have to return about half of the rows in
>>> wg3ppbm_transaction, a situation for which an indexscan would NOT
>>> be a better choice.  The usual rule of thumb is that you need to be
>>> retrieving at most one or two percent of a table's rows for an indexscan
>>> on it to be faster than a seqscan.
>>>
>>> I think however that the "half" may be a default estimate occasioned
>>> by the other tables being empty and therefore not having any statistics.
>>> Another rule of thumb is that the plans you get for tiny tables have
>>> little to do with what happens once there's lots of data.
>>
>> Yeah, don't make query plan assumptions against empty or nearly empty
>> tables.  As the data grows, the plans will suitably change.  Perhaps
>> OP just recently loaded a bunch of data and the tables haven't been
>> analyzed yet?
>
> I just added a significant number of rows to the table. I now have
> 1.3M rows in total but only 8K rows that contain the value I am
> seeking. I also ran ANALYZE after loading the data. The query plans
> for the two queries did not change. Also, the simple query returns in
> 45 ms while the one with the subquery needs 1.5 s, i.e. it is about
> 30x slower.
>
> So there is definitely something wrong here. This situation makes many
> row-level security use cases cumbersome since you need to have
> almost the same WHERE clause both in the row-level security policy and
> in every SELECT query in order for the index to be used.

can you give EXPLAIN ANALYZE for the 'good' query and the 'bad' query?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index is not used for "IN (non-correlated subquery)"

2016-11-30 Thread Merlin Moncure
On Wed, Nov 30, 2016 at 10:42 AM, Tom Lane  wrote:
> George  writes:
>> explain select * from wg3ppbm_transaction where partner_uuid in (
>> select p.uuid
>> from wg3ppbm_userpartner up
>> join wg3ppbm_partner p on p.id = up.partner_id
>> );
>
>> "Hash Semi Join  (cost=2.07..425.72 rows=2960 width=482)"
>> "  Hash Cond: ((wg3ppbm_transaction.partner_uuid)::text = (p.uuid)::text)"
>> "  ->  Seq Scan on wg3ppbm_transaction  (cost=0.00..375.19 rows=5919 
>> width=482)"
>> "  ->  Hash  (cost=2.06..2.06 rows=1 width=37)"
>> "->  Nested Loop  (cost=0.00..2.06 rows=1 width=37)"
>> "  Join Filter: (up.partner_id = p.id)"
>> "  ->  Seq Scan on wg3ppbm_userpartner up
>> (cost=0.00..1.01 rows=1 width=4)"
>> "  ->  Seq Scan on wg3ppbm_partner p  (cost=0.00..1.02
>> rows=2 width=41)"
>
> This plan is expecting to have to return about half of the rows in
> wg3ppbm_transaction, a situation for which an indexscan would NOT
> be a better choice.  The usual rule of thumb is that you need to be
> retrieving at most one or two percent of a table's rows for an indexscan
> on it to be faster than a seqscan.
>
> I think however that the "half" may be a default estimate occasioned
> by the other tables being empty and therefore not having any statistics.
> Another rule of thumb is that the plans you get for tiny tables have
> little to do with what happens once there's lots of data.

Yeah, don't make query plan assumptions against empty or nearly empty
tables.  As the data grows, the plans will suitably change.  Perhaps
OP just recently loaded a bunch of data and the tables haven't been
analyzed yet?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] About the MONEY type

2016-11-30 Thread Merlin Moncure
On Wed, Nov 30, 2016 at 7:43 AM, Raymond O'Donnell  wrote:
> On 30/11/16 12:05, Thomas Kellerer wrote:
>>
>> Tobia Conforto schrieb am 30.11.2016 um 12:15:
>>>
>>> I think MONEY is a great datatype, at least in theory.
>>
>>
>> I personally find it pretty useless to be honest - especially because
>> the currency symbol depends on the client.
>>
>> So if I store a money value in the database, some clients see CHF,
>> some see Kč, some see £ and others might see € - all see the same
>> amount. Which seems totally wrong because 10€ is something completely
>> different then 10Kč or 10£.
>>
>> Plus: inside a programming language (e.g. Java/JDBC) it's hard to
>> work with the values because the database sends the values as a
>> string (it has to because of the currency symbol) but in reality it
>> is a number - but you can't just convert the String to a number again
>> because of the symbol.
>>
>> So I always recommend to not use it (in Postgres just as well as in
>> other DBMS, e.g. SQL Server)
>
>
> I seem to remember that it was actually deprecated at some point - this is
> going back quite a few years. This was later reversed, though I don't know
> why.

It was moved from a 32 bit implementation to a 64 bit one, and it was
given a lot of the previously missing basic infrastructure that was
needed to do normal simple things.  I would personally have preferred
to get rid of it for the problems you mentioned.

With respect to FIXED, the NUMERIC datatype is already fixed precision
so the name is poor.  An all binary fixed precision datatype would be
a good idea for an extension, assuming one does not already exist (I
didn't check).
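
e.g. NUMERIC already pins the scale for you:

create table ledger (amount numeric(12,2) not null);
select 12345.678::numeric(12,2);   -- rounds to 12345.68, scale fixed at 2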

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] "Fuzzy" Matches on Nicknames

2016-11-30 Thread Merlin Moncure
On Tue, Nov 29, 2016 at 6:56 PM, rob stone  wrote:
> Hello Michael,
> On Tue, 2016-11-29 at 19:10 -0500, Michael Sheaver wrote:
>> Greetings,
>>
>> I have two tables that are populated using large datasets from
>> disparate external systems, and I am trying to match records by
>> customer name between these two tables. I do not have any
>> authoritative key, such as customerID or nationalID, by which I can
>> match them up, and I have found many cases where the same customer
>> has different first names in the two datasets. A sampling of the
>> differences is as follows:
>>
>> Michael <=> Mike
>> Tom <=> Thomas
>> Liz <=> Elizabeth
>> Margaret <=> Maggie
>>
>> How can I build a query in PostgreSQL (v. 9.6) that will find
>> possible matches like these on nicknames? My initial guess is that I
>> would have to either find or build some sort of intermediary table
>> that contains associated names like those above. Sometimes though,
>> there will be more than matching pairs, like:
>>
>> Jim <=> James <=> Jimmy <=> Jimmie
>> Bill <=> Will <=> Willie <=> William
>>
>> and so forth.
>>
>> Has anyone used or developed PostgreSQL queries that will find
>> matches like these? I am running all my database queries. on my local
>> laptops (Win7 and macOS), so performance or uptime is no issue here.
>> I am curious to see how others in this community have creatively
>> solved this common problem.
>>
>> One of the PostgreSQL dictionaries (synonym, thesaurus etc.) might
>> work here, but honestly I am clueless as to how to set this up or use
>> it in queries successfully.
>>
>> Thanks,
>> Michael (aka Mike, aka Mikey)
>>
>
> Check out chapter F15 in the doco.
> Try the double metaphone.
> I worked on something similar many years ago cleaning up input created
> by data entry clerks from hand written speeding tickets, so as to match
> with "trusted" data held in a database.
> As the volume of input was small in comparison with the number of
> licensed drivers, we could iterate over and over again trying to match
> it up.

Also check out pg_trgm extension.  It's better for addresses than
names, but might be something to look at depending on how things turn
up with the data.
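
A quick sketch of what that looks like (the customers table is
hypothetical):

create extension if not exists pg_trgm;

select similarity('Margaret', 'Maggie');   -- 0.0 .. 1.0 score

select name
  from customers
 where name % 'Mikey'              -- % = "similar enough" per pg_trgm
 order by name <-> 'Mikey'         -- <-> = trigram distance
 limit 10;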

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Change column type from int to bigint - quickest way

2016-11-16 Thread Merlin Moncure
On Fri, Nov 11, 2016 at 9:30 AM, Andreas Brandl  wrote:
> Hi,
>
> we have a pretty big table with an integer-type primary key. I'm looking for 
> the quickest way to change the column type to bigint to avoid hitting the 
> integer limit. We're trying to avoid prolonged lock situations and full table 
> rewrites.
>
> I know I can hack this with an UPDATE on pg_attribute:
>
> -- change id type to bigint
> update pg_attribute set atttypid=20 where attrelid=264782 and attname = 'id';
>
> After that I'd need to reflect the change on dependent objects like views as 
> well.
>
> Is this safe to do? Are there any unwanted consequences to this?
>
> This is still on 9.1 unfortunately - upgrade is going to follow soon after 
> this.

You just posted the same question a few days ago -- were the answers
there unsatisfactory?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Change column type from int to bigint - quickest way

2016-11-11 Thread Merlin Moncure
On Friday, November 11, 2016, Andreas Brandl  wrote:

> Hi,
>
> we have a pretty big table with an integer-type primary key. I'm looking
> for the quickest way to change the column type to bigint to avoid hitting
> the integer limit. We're trying to avoid prolonged lock situations and full
> table rewrites.
>
> I know I can hack this with an UPDATE on pg_attribute:
>
> -- change id type to bigint
> update pg_attribute set atttypid=20 where attrelid=264782 and attname =
> 'id';
>
> After that I'd need to reflect the change on dependent objects like views
> as well.
>
> Is this safe to do? Are there any unwanted consequences to this?
>
> This is still on 9.1 unfortunately - upgrade is going to follow soon after
> this.
>
> Thanks!
> Andreas
>

Hm.  Just thinking out loud:

How about adding a new bigint column, without a default, that is kept up
to date via trigger?  Then you can update the table row by row, in batches
spread over many transactions, to initialize the new id.  Once it is
completely populated you can do a swap with some carefully written and
tested DDL that exchanges the column name and any dependent objects such
as RI triggers.

The exchange step ought to be quick.  You may have to temporarily disable
RI checks to keep things running smoothly.
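
Roughly, the dance might look something like this -- just a sketch with
made-up names; the swap step in particular has to be adapted to your real
constraints, FKs and views, and rehearsed on a copy first:

-- 1) shadow column, kept in sync by trigger
alter table big_table add column id_new bigint;

create function big_table_sync_id() returns trigger language plpgsql as $$
begin
  new.id_new := new.id;
  return new;
end $$;

create trigger big_table_sync_id before insert or update on big_table
  for each row execute procedure big_table_sync_id();

-- 2) backfill in small batches across many transactions
update big_table set id_new = id
 where id between 1 and 100000 and id_new is null;
-- ...repeat for successive id ranges...

-- 3) build the replacement unique index without blocking writers
create unique index concurrently big_table_id_new_idx on big_table (id_new);

-- 4) the swap, in one transaction
begin;
alter table big_table drop constraint big_table_pkey;  -- handle FKs here
alter table big_table rename column id to id_old;
alter table big_table rename column id_new to id;
alter table big_table alter column id set not null;
alter table big_table add constraint big_table_pkey
  primary key using index big_table_id_new_idx;
commit;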

merlin


Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-11-07 Thread Merlin Moncure
On Fri, Nov 4, 2016 at 9:38 AM, Alban Hertroys <haram...@gmail.com> wrote:
> On 4 November 2016 at 14:41, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Fri, Nov 4, 2016 at 8:08 AM, Kim Rose Carlsen <k...@hiper.dk> wrote:
>>> The nulls are generated by something like this
>>> SELECT c.circuit_id,
>>>cc.customer_id
>>>FROM circuit AS c
>>> LEFT JOIN circuit_customer AS cc
>>>  ON c.circuit_id = cc.circuit_id
>>>
>>> To make a magic '0' customer we would be required to use
>>>   COALESCE(cc.customer_id, '0')
>>> I dont think the optimizer will do anything clever with the '0' we have
>>> computed from null.
>>
>> It would if you explicitly indexed it as such;
>> CREATE INDEX ON circuit_customer((COALESCE(customer_id, '0')));
>
> Merlin, it's a LEFT JOIN. There probably are no NULLs in the
> circuit_customer.customer_id column, so that COALESCE isn't going to
> achieve anything at all.

Hang on -- upthread the context was inner join, and the gripe was join
fast with '=', slow with INDF.  When he said the nulls were
'generated', I didn't follow that they were part of the original
query.  If the nulls are generated along with the query, sure, an
index won't help.

I maintain my earlier point; with respect to the original query, to
get from the performance of INDF to that of =, you have three options:
a) expression-index the nulls (assuming they are physically stored)
b) convert to ((a = b) or (a is null and b is null)), which can use a
bitmap OR plan
c) convert to the UNION ALL equivalent of (b)

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-11-04 Thread Merlin Moncure
On Fri, Nov 4, 2016 at 8:08 AM, Kim Rose Carlsen  wrote:
>>> It might raise another problem, that the nulls are generated through LEFT
>
>>> JOINS where no rows are defined. Then the 0 or -1 value need to be
>>> a computed value. Won't this throw off index lookups? (I might be
>>> more confused in this area).
>>
>>Not following this.
>
> The nulls are generated by something like this
> SELECT c.circuit_id,
>cc.customer_id
>FROM circuit AS c
> LEFT JOIN circuit_customer AS cc
>  ON c.circuit_id = cc.circuit_id
>
> To make a magic '0' customer we would be required to use
>   COALESCE(cc.customer_id, '0')
> I dont think the optimizer will do anything clever with the '0' we have
> computed from null.

It would if you explicitly indexed it as such:
CREATE INDEX ON circuit_customer((COALESCE(customer_id, '0')));

> I could ofc. by default assign all unassigned circuits to '0' in
> circuit_customer. I'm not a fan though.

hm, why not?  null generally means 'unknown' and that's why it fails
any equality test.

>>BTW, if you want a fast plan over the current
>>data without consideration of aesthetics, try this:
>>
>>CREATE VIEW view_circuit_with_status AS (
>>SELECT r.*,
>>  s.circuit_status,
>>  s.customer_id AS s_customer_id,
>>  p.line_speed,
>>  p.customer_id AS p_customer_id
>> FROM view_circuit r
>> JOIN view_circuit_product_main s
>>  ON r.circuit_id = s.circuit_id
>>  AND r.customer_id = s.customer_id
>> JOIN view_circuit_product p
>>   ON r.circuit_id = p.circuit_id
>>  AND r.customer_id = s.customer_id
>>  UNION ALL SELECT r.*,
>>  s.circuit_status,
>>  s.customer_id AS s_customer_id,
>>  p.line_speed,
>>  p.customer_id AS p_customer_id
>> FROM view_circuit r
>> JOIN view_circuit_product_main s
>>   ON r.circuit_id = s.circuit_id
>>  AND r.customer_id IS NULL
>>  AND  s.customer_id IS NULL
>> JOIN view_circuit_product p
>>   ON r.circuit_id = p.circuit_id>
>
> I will have to figure something out, but this specific case is still
> problematic
> since we would like to filter this view using different criteria's, like
> circuit_no,
> products or customers.

the above is logically equivalent to IS NOT DISTINCT FROM; you should
be able to query it as you would have done the original view.

> But with all these detours, I assume that a change to IS NOT DISTINCT FROM,
> is difficult or not wanted?

Well, not exactly.  In your case you are trying to treat null as a
specific value and pass it through join operations.

TBH, this is a pretty dubious approach: null is not supposed to be
equal to anything and any join vs null should come up empty --
logically at least.  INDF works around this of course but it's not a
recommended approach (my usage is generally restricted to things like
"has this value changed since yesterday?").

I'm not an expert in the backend structures for indexing and optimization, but
I know enough to suspect that optimizing INDF might cause
implementation headaches in various places, as do other irregular
syntactical approaches in SQL.  I think minimally optimizing INDF
would require converting it to an operator on par with '=' which is a
pretty large infrastructure change for an edge optimization case.
The fact that there are solid optimization strategies already on the
table (UNION ALL, expr index COALESCE()) does not help.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-11-01 Thread Merlin Moncure
On Mon, Oct 31, 2016 at 9:28 AM, Kim Rose Carlsen  wrote:
> On Sat, Oct 29, 2016 at 8:27 AM, Kim Rose Carlsen  wrote:
>
>> > I have tried creating a function called
>> > zero_if_null(int) : int that just select COALESCE($1, 0)
>> > and adding a index on (zero_if_null(customer_id)) on table that contains
>> > customer_id. The only thing I get from is the planner now only knows how
>> > to
>> > compare customer_id, but it still doesn't know that they are of same
>> > value,
>> > only I know that and I want to declare it for the planner.
>
>
>> Well, the *behavior* is mandated by the sql standard.  Our
>> implementation is slow however.
>
> Sorry I'm not following, what behavior is mandated by the sql standard?

The semantics of IS DISTINCT FROM, basically, equality with special
consideration for nulls.

> But I'm not sure I can mentally accept an unfilled value should not be
> null (eg. 0, '', '-01-01'). But I can see how the equals operator will
> work well with this.

Yeah, this is a dubious tactic and I would normally only consider
using it for surrogate identifiers.

> It might raise another problem, that the nulls are generated through LEFT
> JOINS where no rows are defined. Then the 0 or -1 value needs to be
> a computed value. Won't this throw off index lookups? (I might be
> more confused in this area).

Not following this.  BTW, if you want a fast plan over the current
data without consideration of aesthetics, try this:

CREATE VIEW view_circuit_with_status AS (
   SELECT r.*,
  s.circuit_status,
  s.customer_id AS s_customer_id,
  p.line_speed,
  p.customer_id AS p_customer_id
 FROM view_circuit r
 JOIN view_circuit_product_main s
   ON r.circuit_id = s.circuit_id
  AND r.customer_id = s.customer_id
 JOIN view_circuit_product p
   ON r.circuit_id = p.circuit_id
  AND r.customer_id = s.customer_id
  UNION ALL SELECT r.*,
  s.circuit_status,
  s.customer_id AS s_customer_id,
  p.line_speed,
  p.customer_id AS p_customer_id
 FROM view_circuit r
 JOIN view_circuit_product_main s
   ON r.circuit_id = s.circuit_id
  AND r.customer_id IS NULL
  AND  s.customer_id IS NULL
 JOIN view_circuit_product p
   ON r.circuit_id = p.circuit_id
);


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-10-31 Thread Merlin Moncure
On Sat, Oct 29, 2016 at 8:27 AM, Kim Rose Carlsen  wrote:
>>> This doesn't do much good. This doesn't tell the planner that the 3
>
>>> customer_ids are actually of same value, and it therefore can't filter
>>> them
>>> as it sees fit.
>
>> You do know you can index on a function, and the planner then keeps
>> stats on it when you run analyze right?
>
> Yes, but I don't think it will make any difference. I don't think I can
> solve this with
> an index lookup. I think my savior is the inference that the 2 columns are
> of
> same value and the planner are free to choose which order to do the filter
> and join
> with this extra information.
>
> I have tried creating a function called
> zero_if_null(int) : int that just select COALESCE($1, 0)
> and adding a index on (zero_if_null(customer_id)) on table that contains
> customer_id. The only thing I get from is the planner now only knows how to
> compare customer_id, but it still doesn't know that they are of same value,
> only I know that and I want to declare it for the planner.
>
> I could probably rewrite the whole view in one query, and then fix it with a
> proper index. But I think I will loose alot of readability.
>
> I could also change the structure to save an explicit state, instead of a
> calculated state. But then I get some redundancy I need to make sure always
> stays the same.
>
> In the end one of these will probably be the solution.
>
> I guess the question is more or less,
>
> why doesn't IS NOT DISTINCT FROM behave the same way as = operator, are
> there any alternatives? And a plausible use case for when it would be
> useful.

Well, the *behavior* is mandated by the SQL standard.  Our
implementation is slow, however.  I'm surprised the attached function
didn't help; it can be inlined, and I was able to get a bitmap OR plan,
which is pretty good.  As I said upthread, I think INDF could
theoretically run as fast as equality -- it just doesn't today.

As your joins are written, you could probably convert this by reserving
a customer_id for the role you're trying to give null -- say, 0
or -1.  This is a somewhat dubious practice but seems a better fit for
your use case.  I don't think INDF is good in this usage.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-10-28 Thread Merlin Moncure
On Fri, Oct 28, 2016 at 2:54 PM, Kim Rose Carlsen  wrote:
>> > On Fri, Oct 28, 2016 at 10:29 AM, Kim Rose Carlsen  wrote:
>
>> > Hi
>> >
>> > I was wondering if there is a way to hint that two columns in two
>> > different
>> > tables IS NOT DISTINCT FROM each other. So that the optimizer may assume
>> > if
>> > table_a.key = 'test' THEN table_b.key = 'test' .
>> >
>> > The equals operator already does this but it does not handle NULLS very
>> > well
>> > (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and
>> > doesn't establish the same inference rules as equals.
>>
>> The whole idea behing Postgres' query planner is that you don't have
>> to use any hints. Late model versions of postgres handle nulls fine,
>> but nulls are never "equal" to anything else. I.e. where xxx is null
>> works with indexes. Where x=y does not, since null <> null.
>>
>> Suggestion for getting help, put a large-ish aka production sized
>> amount of data into your db, run your queries with explain analyze and
>> feed them to https://explain.depesz.com/ and post the links here along
>> with the slow queries. A lot of times the fix is non-obvious if you're
>> coming from another db with a different set of troubleshooting skills
>> for slow queries.
>
> The problem is how to reduce the problem into its core, without introducing
> all the unnecessary.
>
> Maybe simplifying the problem, also makes it impossible to say where I go
> wrong. It might be that I try to push too much logic into the SQL layer
> and Im adding too many layers of abstraction to accomplish what I want.
> So let me try and elaborate a little more.
>
> I have couple a tables describing resources (circuits) and allocation
> of resources to customers and products.
>
> First layer is a view called view_circuit. This view (left) join any table
> the circuit table reference through a foreign key (it gives exactly the same
> rows and columns as circuit table + some extra information like
> customer_id).
>
> Second layer is 2 views
> 1) a view describing if the circuit is active or inactive, lets call it
>view_circuit_product_main
> 2) a view describing line_speed about the circuit, lets call it
>view_circuit_product
>
> These views use aggregations (both GROUP BY and SELECT DISTINCT ON (...))
> if this has any relevance.
>
> Third layer
> Next step is to add a view that tells both (joins the two views together
> on circuit_id). lets call the new view view_circuit_with_status
>
> This view is defined as
>
> CREATE VIEW view_circuit_with_status AS (
>SELECT r.*,
>   s.circuit_status,
>   s.customer_id AS s_customer_id,
>   p.line_speed,
>   p.customer_id AS p_customer_id
>  FROM view_circuit r
>  JOIN view_circuit_product_main s
>ON r.circuit_id = s.circuit_id
>   AND r.customer_id IS NOT DISTINCT FROM s.customer_id
>  JOIN view_circuit_product p
>ON r.circuit_id = p.circuit_id
>   AND r.customer_id IS NOT DISTINCT FROM s.customer_id
> );
>
> SELECT * FROM view_circuit_with_status WHERE customer_id = 1;
>
> Since customer_id is exposed through view_circuit the planner assumes
> view_circuit.customer_id = 1 and from there attempts to join
> view_circuit_product_main and view_circuit_product using circuit_id.
> This is not running optimal.
>
> However if we change our query to allow the inference rule to take place,
> the query is executed very fast.
>
> SELECT * FROM view_circuit_with_status WHERE customer_id = 1 AND
> s_customer_id = 1 AND p_customer_id = 1;
>
> If a circuit is not assigned to any customers customer_id is set to NULL.
> This is the reason I can't use = operator. If I do use = then I can't find
> circuit which are unassigned, but the query do run effective.
>
> I can see this still ends up being quite abstract, but the point is it would
> be quite beneficial if IS NOT DISTINCT used the same rules as = operator.
>
> I have attached the 2 query plans
>
> Bad plan: https://explain.depesz.com/s/SZN
> Good plan: https://explain.depesz.com/s/61Ro

try this :-D
create or replace function indf(anyelement, anyelement) returns boolean as
$$
  select $1 = $2 or ($1 is null and $2 is null);
$$ language sql;

CREATE VIEW view_circuit_with_status AS (
   SELECT r.*,
  s.circuit_status,
  s.customer_id AS s_customer_id,
  p.line_speed,
  p.customer_id AS p_customer_id
 FROM view_circuit r
 JOIN view_circuit_product_main s
   ON r.circuit_id = s.circuit_id
  AND indf(r.customer_id, s.customer_id)
 JOIN view_circuit_product p
   ON r.circuit_id = p.circuit_id
  AND indf(r.customer_id, s.customer_id)
);

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-10-28 Thread Merlin Moncure
On Fri, Oct 28, 2016 at 1:20 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Fri, Oct 28, 2016 at 1:17 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Scott Marlowe <scott.marl...@gmail.com> writes:
>>> On Fri, Oct 28, 2016 at 10:29 AM, Kim Rose Carlsen <k...@hiper.dk> wrote:
>>>> I was wondering if there is a way to hint that two columns in two different
>>>> tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if
>>>> table_a.key = 'test' THEN table_b.key = 'test' .
>>>>
>>>> The equals operator already does this but it does not handle NULLS very 
>>>> well
>>>> (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and
>>>> doesn't establish the same inference rules as equals.
>>
>>> The whole idea behing Postgres' query planner is that you don't have
>>> to use any hints. Late model versions of postgres handle nulls fine,
>>> but nulls are never "equal" to anything else. I.e. where xxx is null
>>> works with indexes. Where x=y does not, since null <> null.
>>
>> The bigger picture here is that if you've designed a data representation
>> that requires that a null be considered "equal to" another null, you're
>> really going to be fighting against the basic semantics of SQL.  You'd
>> be best off to rethink the representation.  We've not seen enough info
>> about your requirements to suggest just how, though.
>
> Well, ISTM OP is simply asking, "why can't IS [NOT] DISTINCT FROM be
> optimized"?  It ought to be, at least in some cases. Internally
> indexes handle nulls so you should be able to implement them to
> satisfy those kinds of scans.  I guess that's an easy thing to say
> though.

hm. ISTM "a IS NOT DISTINCT FROM b"...could be rewritten into

((a IS NULL AND b IS NULL) OR (a = b)) ... which is an indexable expression.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-10-28 Thread Merlin Moncure
On Fri, Oct 28, 2016 at 1:17 PM, Tom Lane  wrote:
> Scott Marlowe  writes:
>> On Fri, Oct 28, 2016 at 10:29 AM, Kim Rose Carlsen  wrote:
>>> I was wondering if there is a way to hint that two columns in two different
>>> tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if
>>> table_a.key = 'test' THEN table_b.key = 'test' .
>>>
>>> The equals operator already does this but it does not handle NULLS very well
>>> (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and
>>> doesn't establish the same inference rules as equals.
>
>> The whole idea behing Postgres' query planner is that you don't have
>> to use any hints. Late model versions of postgres handle nulls fine,
>> but nulls are never "equal" to anything else. I.e. where xxx is null
>> works with indexes. Where x=y does not, since null <> null.
>
> The bigger picture here is that if you've designed a data representation
> that requires that a null be considered "equal to" another null, you're
> really going to be fighting against the basic semantics of SQL.  You'd
> be best off to rethink the representation.  We've not seen enough info
> about your requirements to suggest just how, though.

Well, ISTM OP is simply asking, "why can't IS [NOT] DISTINCT FROM be
optimized"?  It ought to be, at least in some cases. Internally
indexes handle nulls so you should be able to implement them to
satisfy those kinds of scans.  I guess that's an easy thing to say
though.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq heartbeat

2016-10-27 Thread Merlin Moncure
On Thu, Oct 27, 2016 at 12:45 PM, Francisco Olarte
<fola...@peoplecall.com> wrote:
> Merlin:
>
> On Thu, Oct 27, 2016 at 7:29 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Thu, Oct 27, 2016 at 11:18 AM, Francisco Olarte
>> <fola...@peoplecall.com> wrote:
>>> It is, but handling them is not easy, and you have to deal with things
>>> like DoS which are not trivial on the server ( as it is a heavy
>>> service ). It can be done, and sometimes needs to be done, but is not
>>> a thing to take over lightly.
>>>> This could be over ssh tunnel for example.
>>> In which case it is NOT exposed to the internet. What are you trying to say?
>>
>> what?   ssh can most certainly convey over the internet.   I said ssh
>> *tunnel*; not ssh.   With tunneling the ssh endpoint is the client
>> application.   When I built a libpq based intenet facing application
>> we used a modified pgbouncer to whitelist the parameterized query
>> strings and to force the auth.  We had zero issues.
>
> I'm not a native English speaker, so I have some problem understanding
> the finer details
>
> I said libpq service/protocols are tricky to put on the internet.
>
> You replied, among other things, it could be over an ssh tunnel ( I
> use ssh tunnels continuously, to the point I routinely open/close them
> on live connections via escape, so I know  they are ).

right -- we may be talking past each other.   You originally said,
'across a firewall', not 'exposed to the internet'.  I agree that you
should not expose a database directly to the internet without very
carefully considering the ramifications, but there are many scenarios
where you would cross a firewall where the suggestions advised here
(tcp keepalives) are useful and good to do.

In our corporate intranet here we have overzealous firewalls which
cause major problems with many application stacks, particularly
Java.  Aggressive keepalives deal with such cases pretty well.
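
e.g. on the libpq side, something like this in the connection string
(values are illustrative only):

host=db.example.com dbname=app user=app
  keepalives=1 keepalives_idle=30 keepalives_interval=10 keepalives_count=3

The server-side equivalents are the tcp_keepalives_* GUCs.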

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq heartbeat

2016-10-27 Thread Merlin Moncure
On Thu, Oct 27, 2016 at 11:18 AM, Francisco Olarte
<fola...@peoplecall.com> wrote:
> Merlin:
>
> On Thu, Oct 27, 2016 at 6:10 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Thu, Oct 27, 2016 at 10:01 AM, Francisco Olarte
>> <fola...@peoplecall.com> wrote:
>>> And I'd like to point libpq sessions does not sound to be the best
>>> kind of traffic across a firewall, not a good service / protocol to
>>> expose.
>
>> meh -- it's perfectly fine to expose postgres to the internet as long
>> as you've handled the security concerns.
>
> It is, but handling them is not easy, and you have to deal with things
> like DoS which are not trivial on the server ( as it is a heavy
> service ). It can be done, and sometimes needs to be done, but is not
> a thing to take over lightly.
>
>> This could be over ssh tunnel for example.
>
> In which case it is NOT exposed to the internet. What are you trying to say?

what?   ssh can most certainly convey over the internet.   I said ssh
*tunnel*; not ssh.   With tunneling the ssh endpoint is the client
application.   When I built a libpq based internet facing application
we used a modified pgbouncer to whitelist the parameterized query
strings and to force the auth.  We had zero issues.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq heartbeat

2016-10-27 Thread Merlin Moncure
On Thu, Oct 27, 2016 at 10:01 AM, Francisco Olarte
 wrote:
> And I'd like to point libpq sessions does not sound to be the best
> kind of traffic across a firewall, not a good service / protocol to
> expose.

meh -- it's perfectly fine to expose postgres to the internet as long
as you've handled the security concerns.   This could be over ssh
tunnel for example.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] json rendering without pretty option (compact)

2016-10-19 Thread Merlin Moncure
On Tue, Oct 18, 2016 at 12:00 PM, Nicolas Paris  wrote:
> Hello,
>
> I want to minimize postgresql json size when I fetch them.
> I translate columnar tables to json through json_build_object/array or even
> row_to_json.
>
> While row_to_json do have a "pretty_bool" option, the latter do not. Each
> json object/array I build contains spaces.
>
> Is there a workaround ?
> Is there a plan to add this feature on future version ?

There is gradual movement towards having all the generation functions
produce completely compact output.  Personally I view the pretty flag as
a legacy setting; formatting should be handled by a separate routine,
which would be more expensive but much more flexible.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] custom average window function failure

2016-10-10 Thread Merlin Moncure
On Sun, Oct 9, 2016 at 10:10 PM, Sebastian P. Luque  wrote:
> On Sun, 09 Oct 2016 16:00:21 -0400,
> Tom Lane  wrote:
>
>> "Sebastian P. Luque"  writes:
>>> Tom Lane  wrote:
 On closer inspection, the error is only in the
 aggregate-used-as-window-function case, not plain aggregation.
>
>>> Yes, I see the same phenomenon.  Could someone suggest a workaround
>>> until this is fixed?  I'm under the gun to submit output tables and
>>> the only thing I can think of is a crawling slow loop to step through
>>> each window twice: once using the plain aggregation and another
>>> without just get all rows.  I highly doubt it will be worthwhile,
>>> given it's going to be about 1000 iterations, and each one would take
>>> about 30-45 min...
>
>> Are you in a position to apply patches?  It's a one-line fix:
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=dca25c2562199ce1e7e26367613912a8eadbbde8
>
>> Alternatively, the problem doesn't manifest when the aggregate
>> transtype and output type are the same, so you could probably refactor
>> your code to use plain array_agg and apply the finalfunc separately in
>> the SQL query.
>
> Perfect, I'll try the latter option on this one.  Thanks so much to both
> of you for your prompt feedback!

Aside: nice use of custom aggregates as window functions.  I use
this tactic heavily.
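
For anyone curious, a trivial sketch of the pattern (a running product
over a made-up samples table):

create function prod_sfunc(float8, float8) returns float8
  language sql immutable as $$ select $1 * $2 $$;

create aggregate prod(float8) (
  sfunc    = prod_sfunc,
  stype    = float8,
  initcond = '1'
);

select ts, ratio,
       prod(ratio) over (order by ts) as cumulative_ratio
  from samples;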

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transactional-DDL DROP/CREATE TABLE

2016-10-10 Thread Merlin Moncure
On Thu, Oct 6, 2016 at 4:21 AM, Geoff Winkless  wrote:
> Hi
>
> I have code that does (inside a single transaction)
>
> DROP TABLE IF EXISTS mytable; CREATE TABLE mytable 
>
> Occasionally this produces
>
> ERROR: duplicate key value violates unique constraint
> "pg_type_typname_nsp_index" DETAIL: Key (typname,
> typnamespace)=(mytable, 2200) already exists.
>
> I can get away from this by using CREATE TABLE IF NOT EXISTS in the
> same code, but there's the potential that the wrong data will end up
> in the table if that happens, and it also seems a little odd.
>
> Would you not expect this transaction to be atomic? ie at commit time,
> the transaction should drop any table with the same name that has been
> created by another transaction.

Transactions do not guarantee atomicity in the sense that you mean.
MVCC rules (which DDL generally falls under) try to interleave work as
much as possible, which is the problem you're facing.   What you want
is fully serialized creation, which can be accomplished with an advisory
lock or (better, imo) a leading

LOCK TABLE mytable;
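
The advisory lock route would look something like this (a sketch; any
agreed-upon lock key works):

begin;
-- serialize concurrent drop/create of this particular table
select pg_advisory_xact_lock(hashtext('mytable'));
drop table if exists mytable;
create table mytable (id int primary key, payload text);
-- ... populate ...
commit;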

Also, this is not a good pattern.  You ought to be using temp tables
or other mechanics to store transaction local data.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Lock contention in TransactionIdIsInProgress()

2016-10-07 Thread Merlin Moncure
On Wed, Oct 5, 2016 at 5:38 AM, Tomáš Uko  wrote:
> Hi Jeff,
>
>
>
> We have encountered same problem as you (in 9.5.4), it seems that so far it
> hasn’t been ported back from 9.6, but if you take this commit and apply it
> to 9.5 source codes, it seems to be working
>
> But anyway, does anybody knows plans about backporting this to 9.5 official
> way?

Won't happen.  Only bug fixes get back-patched, and performance
enhancements generally don't qualify (in very exceptional cases they
might, but this issue is a routine performance enhancement).  Basically
you have to decide to upgrade or stay with 9.5.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-06 Thread Merlin Moncure
On Wed, Oct 5, 2016 at 3:27 PM, Stephen Frost  wrote:
> Darren,
>
> * Darren Lafreniere (dlafreni...@onezero.com) wrote:
>> Tom Lane  wrote:
>> > > Gavin Wahl wrote:
>> > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
>> > >> just find the page range with the largest/smallest value, and then only
>> > >> scan that one. Would that be hard to implement? I'm interested in
>> > working
>> > >> on it if someone can give me some pointers.
>> >
>> > I think this proposal is fairly broken anyway.  The page range with the
>> > largest max-value may once have contained the largest live row, but
>> > there's no guarantee that it still does.  It might even be completely
>> > empty.  You could imagine an algorithm like this:
>> >
>> > 1. Find page-range with largest max.  Scan it to identify live row with
>> > largest value.  If *no* live values, find page-range with next largest
>> > max, repeat until no page ranges remain (whereupon return NULL).
>> >
>> > 2. For each remaining page-range whose indexed max exceeds the value
>> > currently in hand, scan that page-range to see if any value exceeds
>> > the one in hand, replacing the value if so.
>> >
>> > This'd probably allow you to omit scanning some of the page-ranges
>> > in the table, but in a lot of cases you'd end up scanning many of them;
>> > and you'd need a lot of working state to remember which ranges you'd
>> > already looked at.  It'd certainly always be a lot more expensive than
>> > answering the same question with a btree index, because in no case do
>> > you get to avoid scanning the entire contents of the index.
> [...]
>> A b-tree index would certainly be faster for ordering. But in scenarios
>> where you have huge datasets that can't afford the space or update time
>> required for b-tree, could such a BRIN-accelerated ordering algorithm at
>> least be faster than ordering with no index?
>
> For at least some of the common BRIN use-cases, where the rows are
> inserted in-order and never/very-rarely modified or deleted, this
> approach would work very well.
>
> Certainly, using this would be much cheaper than a seqscan/top-N sort,
> for small values of 'N', relative to the number of rows in the table,
> in those cases.
>
> In general, I like the idea of supporting this as BRIN indexes strike me
> as very good for very large tables which have highly clumped data in
> them and being able to do a top-N query on those can be very useful at
> times.

Yeah.  If the brin average page overlap and % dead tuple coefficients
are low it absolutely makes sense to drive top N with brin.  It will
never beat a btree but typically brin is used when the btree index is
no good for various reasons.

brin indexes are pretty neat; they can provide stupefying amounts of
optimization in many common warehousing workloads.   They can even beat
out index-only scans at a tiny fraction of the storage cost.  Of course,
you have to work around the limitations... :-)
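
e.g. (table and column made up):

-- one tiny summary tuple per 128-page block range by default
create index events_created_brin on events using brin (created_at);

select count(*) from events
 where created_at >= now() - interval '1 day';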

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Predicting query runtime

2016-09-12 Thread Merlin Moncure
On Mon, Sep 12, 2016 at 9:03 AM, Vinicius Segalin  wrote:
> Hi everyone,
>
> I'm trying to find a way to predict query runtime (I don't need to be
> extremely precise). I've been reading some papers about it, and people are
> using machine learning to do so. For the feature vector, they use what the
> DBMS's query planner provide, such as operators and their cost. The thing is
> that I haven't found any work using PostgreSQL, so I'm struggling to adapt
> it.
> My question is if anyone is aware of a work that uses machine learning and
> PostgreSQL to predict query runtime, or maybe some other method to perform
> this.

Well, postgres estimates the query runtime in the form of an expected
'cost', where cost is an arbitrary measure based on the estimated
work in the query plan.   It shouldn't be too difficult to correlate
estimated cost with actual runtime.  A statistical analysis of that
correlation would be incredibly useful work, although generating sample
datasets would be a major challenge.
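
e.g. a crude way to harvest (estimated cost, actual runtime) pairs for
such an analysis -- table name is made up:

explain (analyze, format json)
select count(*) from orders where shipped_at > now() - interval '30 days';

The JSON output carries "Total Cost" alongside "Actual Total Time", and
the auto_explain contrib module can log the same data for a live workload.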

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is there a way to fix this ugliness

2016-09-09 Thread Merlin Moncure
On Fri, Sep 9, 2016 at 8:30 AM, Tim Uckun  wrote:
> I am trying to get the child elements of a one to many table to be rolled up
> into a json field in the parent table. The query I am running is
>
> select
> ob.id
>  ,case when array_position(array_agg(im.image_type), null) = 1  then
> '[]' else  json_agg(row_to_json(im.*)) end as images
>  from observations ob
>left join images im on ob.id = im.observation_id
> group by 1
>
>
> The reason I have the case statement there is because some observations
> don't have images but the json_agg(row_to_json function returns [NULL]
> instead of [] which is what I really want.
>
> Is there a more elegant way to do this?

not exactly.  More elegant approaches are frustrated by the lack of a
json operator.  However, you can wrap that in a function.

create or replace function fixnull(j json) returns json as
$$
  select case when j::text = '[null]'::text
  then '[]'::json
  else j
end;
$$ language sql immutable;

select
ob.id,
fixnull(json_agg(to_json(im.*))) as images
 from observations ob
   left join images im on ob.id = im.observation_id
group by 1;

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL Database performance

2016-09-07 Thread Merlin Moncure
On Tue, Sep 6, 2016 at 11:12 PM, Pradeep  wrote:
> Dear Naveed,
>
> I am using PostgreSQL 9.3 version on Windows .After changing these
> parameters, I have not seen any resource management utilization.
>
> I have observed before and after changing  the parameter values ,it is not
> reflecting the memory level. Maximum utilization of RAM is 3GB only.
>
> So kindly let me ,whether it will impact the  RAM utilization or not?

Postgres reserves some memory for itself and relies on the operating
system to buffer the rest.  So this is not really unusual or
interesting.  What would be interesting is specific examples of things
that are not running as fast as you think they should be.
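
i.e. the only dedicated allocation you will see attributed to the server
processes is roughly shared_buffers; the rest of the machine's RAM is used
opportunistically by the OS file cache:

show shared_buffers;
show effective_cache_size;   -- a planner hint about that cache, not an allocation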

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] a column definition list is required for functions returning "record"

2016-09-07 Thread Merlin Moncure
On Fri, Sep 2, 2016 at 6:55 PM, Jim Nasby  wrote:
> On 8/29/16 6:28 AM, Tom Lane wrote:
>>
>> Pavel Stehule  writes:
>>>
>>> > 2016-08-29 1:59 GMT+02:00 Jim Nasby :

 >> It would be nice if there was a way to pass dynamically formed
 >> records
 >> around, similar to how you can pass the results of row() around.
 >> Someone
 >> else has actually be asking about this at
 >> https://github.com/decibel/pg_
 >> lambda/issues/1.
>>>
>>> > Probably there is a space to be PLpgSQL more flexible - but there are
>>> > limits - PLpgSQL is black box for SQL engine, and when output is any
>>> > record
>>> > type, then SQL engine knows zero about returning data structure in
>>> > preprocessing time.
>>
>> Exactly.  You can pass anonymous record types around today, as long as you
>> don't do anything that requires knowing what their contents are, either in
>> the function or in the calling query:
>
> What I was thinking of is something (like a function) that has explicitly
> defined what the contents of the record are.

We have that already; it's named 'json_each_text' :-).
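
e.g.:

select * from json_each_text('{"a": "1", "b": "two"}');
 key │ value
─────┼───────
 a   │ 1
 b   │ two
(2 rows)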

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to retrieve jsonb column through JDBC

2016-08-29 Thread Merlin Moncure
On Sat, Aug 27, 2016 at 5:39 AM, Alexander Farber
 wrote:
> Hello,
>
> what do you use to retrieve a jsonb column using JDBC?
>
> I have tried
>
> Object last_tiles = rs.getObject("last_tiles");
>
> and the resulting Object seems to be a String.
>
> Then I have called (using Jetty JSON class here):
>
>  Object last_tiles = JSON.parse(rs.getString("last_tiles"));
>
> And it seems to work ok... but I am still curious, what is the recommended
> (or maybe planned for future) way for retrieving jsonb data in Java.
>
> Also
>
>  List last_tiles = (List) JSON.parse(rs.getString("last_tiles"));
>
> has not work for me even though the string is:
>
>  [{"col": 7, "row": 8, "value": 1, "letter": "A"}, {"col": 7, "row": 7,
> "value": 2, "letter": "B"}, {"col": 7, "row": 9, "value": 2, "letter": "C"}]
>
> but it is probably the problem of the Jetty class I am using and not of
> JDBC...

huh.  what exactly is failing? are you getting a parse exception?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] a column definition list is required for functions returning "record"

2016-08-29 Thread Merlin Moncure
On Mon, Aug 29, 2016 at 6:28 AM, Tom Lane  wrote:
> Pavel Stehule  writes:
>> 2016-08-29 1:59 GMT+02:00 Jim Nasby :
>>> It would be nice if there was a way to pass dynamically formed records
>>> around, similar to how you can pass the results of row() around. Someone
>>> else has actually be asking about this at https://github.com/decibel/pg_
>>> lambda/issues/1.
>
>> Probably there is a space to be PLpgSQL more flexible - but there are
>> limits - PLpgSQL is black box for SQL engine, and when output is any record
>> type, then SQL engine knows zero about returning data structure in
>> preprocessing time.
>
> Exactly.  You can pass anonymous record types around today, as long as you
> don't do anything that requires knowing what their contents are, either in
> the function or in the calling query:
>
> regression=# create function foor(int,int) returns record language sql as $$ 
> select row($1,$2); $$;
> CREATE FUNCTION
> regression=# select foor(23,45);
>   foor
> -
>  (23,45)
> (1 row)
>
> regression=# create function plr(int,int) returns record language plpgsql as 
> $$begin return row($1,$2); end; $$;
> CREATE FUNCTION
> regression=# select plr(23,45);
>plr
> -
>  (23,45)
> (1 row)
>
> What you can't do is, eg,
>
> regression=# select * from plr(23,45);
> ERROR:  a column definition list is required for functions returning "record"
> LINE 1: select * from plr(23,45);

Another tactic is to use json in such cases:
create function fooj(int,int) returns json language sql as $$ select
to_json((select q from (select $1 as a, $2 as b) q)); $$;
CREATE FUNCTION
postgres=# select fooj(3,4);
 fooj
───
 {"a":3,"b":4}
(1 row)

the advantage here is that you're not quite as boxed in: stuff like
postgres=# select * from json_each(fooj(3,4));
 key │ value
─┼───
 a   │ 3
 b   │ 4

... in the json api (which is rich and getting richer) can be used to
work around the anonymous row limitations.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] a column definition list is required for functions returning "record"

2016-08-26 Thread Merlin Moncure
On Fri, Aug 26, 2016 at 10:50 AM, Cachique  wrote:
> Hi
> From the documentation... (
> https://www.postgresql.org/docs/current/static/sql-select.html )
>
> 'Function calls can appear in the FROM clause. (This is especially useful
> for functions that return result sets, but any function can be used.) This
> acts as though the function's output were created as a temporary table for
> the duration of this single SELECT command...
> If the function has been defined as returning the record data type, then an
> alias or the key word AS must be present, followed by a column definition
> list in the form ( column_name data_type [, ... ]). The column definition
> list must match the actual number and types of columns returned by the
> function.'
>
>
> You need to use 'returns table' syntax or to add an alias in your query.
> Something like
> select * from words_select_games(1) as (gid type, created type, player1
> type, ...);
>
> Check for the correct column types

In this case it's probably better to have the function return a table
type (RETURNS TABLE), or define the output with OUT variables.  The
record-defining syntax at query time:

SELECT * FROM foo() AS (a int, b text);

should typically only be used when the function does not have a well
defined return structure.  dblink is a good example, since the
structure of the input query directly controls the structure of the
return type.  The server has no way to deal with that when the query is
parsed and planned, so you have to help it out.
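
For comparison, a minimal sketch of the RETURNS TABLE style, where the
caller never needs a column definition list:

create function pair(a int, b int) returns table (x int, y int)
language sql as $$ select a, b $$;

select * from pair(23, 45);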

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Merlin Moncure
On Fri, Aug 19, 2016 at 2:32 AM, Thomas Güttler
 wrote:
> I want to store logs in a simple table.
>
> Here my columns:
>
>   Primary-key (auto generated)
>   timestamp
>   host
>   service-on-host
>   loglevel
>   msg
>   json (optional)
>
> I am unsure which DB to choose: Postgres, ElasticSearch or ...?

We use SOLR (which is similar to ElasticSearch) here for json document
retrieval.  Agreeing to do this was one of the biggest mistakes in my
professional career.  This choice was somewhat forced because at the
time jsonb was not baked.  In my opinion, jsonb outclasses these types
of services particularly if you are already invested in postgres.  The
specifics of your requirements also play into this decision
naturally.  The bottom line though is that these kinds of systems are
not nearly as fast or robust as they claim to be particularly if you
wander off the use cases they are engineered for (like needing
transactions or joins for example).  They also tend to be fairly
opaque in how they operate and the supporting tooling is laughable
relative to established database systems.

Postgres OTOH can be made to do pretty much anything given sufficient
expertise and a progressive attitude.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Jsonb extraction very slow

2016-08-16 Thread Merlin Moncure
On Thu, Aug 11, 2016 at 8:45 AM, Tom Lane  wrote:
> Jim Nasby  writes:
>> I never dug into why. As Tom posited, decompression might explain the
>> time to get a single key out. Getting 10 keys instead of just 1 wasn't
>> 10x more expensive, but it was significantly more expensive than just
>> getting a single key.
>
> What were you doing to "get ten keys out"?  If those were ten separate
> JSON operators, they'd likely have done ten separate decompressions.
> You'd have saved something by having the TOAST data already fetched into
> shared buffers, but it'd still hardly be free.

Huh -- as far as I can tell, there is no way to pull N values out of a
jsonb containing M items, for any N other than 1 or M, with a single
operation.
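
For illustration (table and column names made up): pulling several
specific keys means several operator evaluations, each of which may
detoast/decompress the datum, whereas jsonb_each walks the whole
document once but hands back all M keys:

-- N separate operator evaluations
SELECT doc->'k1', doc->'k2', doc->'k3' FROM t;

-- one pass over the document, but you get every key back
SELECT key, value FROM t, jsonb_each(t.doc);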

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Any reasons for 'DO' statement not returning result?

2016-08-16 Thread Merlin Moncure
On Sun, Aug 14, 2016 at 5:58 AM, Chris Travers <chris.trav...@gmail.com> wrote:
>
> On Sun, Aug 14, 2016 at 12:13 PM, Xtra Coder <xtraco...@gmail.com> wrote:
>>
>> Thanks, I'm aware of the ability to create temp functions, but this is
>> actually too much overhead - I mean unneeded boilerplate code - but it seems
>> that in its current state it is "the least evil" which I have to use.
>>
>> I think 'what I need' is support for the following:
>> - ability to switch session language from 'sql' to 'pl/pgsql'
>> - in that mode - ability to declare session-scope variables, 'DO' is just
>> not needed after that
>> - SELECTs not targeted into a variable - are written to client output
>> - (C) Merlin Moncure - "Ability to embed collection of statements in the
>> database under a name and invoke those statements via CALL , which
>> does not automatically create a transaction and a snapshot (unlike
>> functions/DO)"
>>
>> All this seems to be a huge change which will definitely not appear any
>> time soon.
>
> I am willing to bet that DO $$ $$; blocks are neither planned nor
> parameterized.  And the planner needs to know what is to be returned.

The statements within a do block are absolutely planned and
parameterized.  There was some recent discussion with respect to not
planning statements except under certain conditions (in a loop
basically) to reduce memory consumption of long 'do' blocks.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Any reasons for 'DO' statement not returning result?

2016-08-11 Thread Merlin Moncure
On Mon, Aug 8, 2016 at 7:25 PM, Xtra Coder  wrote:
> Hi,
>
> I'm just curious about the reasons for the design of the 'DO' statement such
> that it is not able to return the result of a SELECT in its body.
>
> References:
> https://www.postgresql.org/docs/current/static/sql-do.html
>
> http://stackoverflow.com/questions/14652477/how-to-perform-a-select-query-in-a-do-block
>
> With some former experience with MS SQL Server, where a 'complex' script is
> executed easily and straightforwardly without any 'wrapping', like this
> dummy one ...
>
> DECLARE @a int;
> DECLARE @b int;
> ...
> select @a + @b as "a+b"
>
> ... every time I need to execute some one-time-throw-away complex code in
> PostgreSQL which returns a rowset I'm disappointed - this has to be wrapped
> into a normal 'temp' function which I have to delete all the time in the
> current session, thus making an anonymous 'DO' statement useless in 95% of my
> use-cases.
>
> So ... does someone know of good reasons for such an inconvenient design of
> the 'DO' statement?

IIRC past discussion concluded DO statements should be allowed to
return values.

What you (or at least I) really want, though, is stored procedures.  To
me, this means the following:

*) Ability to embed a collection of statements in the database under a name
*) Ability to invoke those statements via CALL , which does not
automatically create a transaction and a snapshot (unlike
functions/DO)

I used to think that we needed to pick a procedural language (for
example, pl/pgsql) to leverage the various programming niceties of the
database (such as variables and flow control).  Today I'm thinking it
ought to be vanilla SQL for starters, with some judicious SQL
extensions to be hashed out later.
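
In the meantime, a minimal sketch of the temp-function workaround
mentioned above (names are made up; anything created in the pg_temp
schema disappears when the session ends):

CREATE FUNCTION pg_temp.scratch(a int, b int)
RETURNS TABLE ("a+b" int)
LANGUAGE sql AS $$ SELECT a + b $$;

-- functions in pg_temp must be schema-qualified when called
SELECT * FROM pg_temp.scratch(1, 2);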

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pasting a lot of commands to psql

2016-07-07 Thread Merlin Moncure
On Thu, Jul 7, 2016 at 2:48 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Merlin Moncure wrote:
>> On Thu, Jul 7, 2016 at 1:18 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> 
>> wrote:
>> > Tom Lane wrote:
>> >
>> >> You might have better luck with "psql -n", or maybe not.
>> >
>> > I've wished sometimes for a "\set READLINE off" psql metacommand for
>> > this kind of thing.  It's pretty annoying when the text being pasted
>> > contains tabs and readline tries to do completion.
>>
>> Agreed.  I've looked at this problem extensively and concur that
>> readline is the culprit; I don't think there's any solution on our end
>> besides filing a bug against readline.  I also agree with the
>> upthread suggestion that the best workaround today is to \e into a
>> non-readline based editor (vim qualifies).  Having said that, at least
>> on linux/gnome, very long pastes can cause severe performance issues
>> as well.  So for large pastes I go with psql -f.
>
> Hmm, I was doing megabyte-long pastes (longest one over 5 MB) just a few
> weeks ago and was pleasantly surprised to discover that they worked just
> fine with no noticeable performance problem.  I was pasting skype logs
> directly from the Linux skype client window into an xterm running cat,
> with obviously no readline involved.

Might be an 'xterm vs Mate Terminal' problem.  Using raw xterm,
performance is great.  I like some of the creature comforts of the
Mate terminal, though.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pasting a lot of commands to psql

2016-07-07 Thread Merlin Moncure
On Thu, Jul 7, 2016 at 1:18 PM, Alvaro Herrera  wrote:
> Tom Lane wrote:
>
>> You might have better luck with "psql -n", or maybe not.
>
> I've wished sometimes for a "\set READLINE off" psql metacommand for
> this kind of thing.  It's pretty annoying when the text being pasted
> contains tabs and readline tries to do completion.

Agreed.  I've looked at this problem extensively and concur that
readline is the culprit; I don't think there's any solution on our end
besides filing a bug against readline.  I also agree with the
upthread suggestion that the best workaround today is to \e into a
non-readline based editor (vim qualifies).  Having said that, at least
on linux/gnome, very long pastes can cause severe performance issues
as well.  So for large pastes I go with psql -f.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

