Re: [GENERAL] a JOIN to a VIEW seems slow

2017-09-17 Thread Frank Millman

2017-09-14 15:09 GMT+02:00 Pavel Stehule :



  2017-09-14 14:59 GMT+02:00 Frank Millman :

Pavel Stehule wrote:

2017-09-14 10:14 GMT+02:00 Frank Millman :

  Hi all

  This is a follow-up to a recent question I posted regarding a slow query. 
I thought that the slowness was caused by the number of JOINs in the query, but 
with your assistance I have found the true reason. I said in the previous 
thread that the question had become academic, but now that I understand things 
better, it is no longer academic as it casts doubt on my whole approach.

  I have split my AR transaction table into three physical tables – 
ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some 
point, such as ar_tran_jnl.

  I then create a VIEW to view all transactions combined. The view is 
created like this -

  CREATE VIEW ar_trans AS
SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ... 
FROM ar_tran_inv WHERE posted = ‘1’
UNION ALL
SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ... 
FROM ar_tran_crn WHERE posted = ‘1’
UNION ALL
SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ... 
FROM ar_tran_rec WHERE posted = ‘1’

  I have another table called ‘ar_trans_due’, to keep track of outstanding 
transactions. All of the three transaction types generate entries into this 
table. To identify the source of the transaction, I have created columns in 
ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row into 
‘ar_tran_inv’, I invoke this -

INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES 
(‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction 
types. It is handled by a Python program, and it all happens within a 
transaction.

  When I view a row in ar_trans_due, I want to retrieve data from the 
source transaction, so I have this -

SELECT * FROM ar_trans_due a
LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = 
a.tran_row_id

  I understand that PostgreSQL must somehow follow a path from the view 
‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would execute 
the equivalent of SELECT * FROM ar_tran_inv WHERE row_id = a.tran_row_id AND 
posted = ‘1’.

  If this was the case, it would be an indexed read, and very fast. 
Instead, according to EXPLAIN, it performs a sequential scan of the 
‘ar_tran_inv’ table.

  It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it 
uses a Bitmap Heap Scan on those. I assume that is because the tables are 
currently empty.

  Is this analysis correct?

please, send EXPLAIN ANALYZE result :) 


I tried to reduce this to its simplest form.

Here is a SQL statement -

SELECT *
FROM ccc.ar_trans_due a
LEFT JOIN ccc.ar_trans b ON
b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
WHERE a.row_id = 1

ar_trans_due is a physical table, ar_trans is a view.

It takes about 28ms. Here is the explain - https://explain.depesz.com/s/8YY


> The PostgreSQL cannot to push join - in slow case, the UNIONS should be done 
> first - and it requires full scan ar_tran_inv - used filter (posted AND 
> (deleted_id = 0) is not too effective - maybe some composite or partial index 
> helps.
> 
> The fast query doesn't contains unions - so there are bigger space for 
> optimizer - ar_tran_inv is filtered effective - by primary key. 
> 
> So main problem is impossible to push information a.row_id = 1 to deep to 
> query.
> 

Sorry for banging on about this, but someone might be interested in the 
following timings.

The only solution I could find was to ‘denormalise’ (if that is a word) and 
create additional columns on ar_trans_due for cust_row_id and tran_date, to 
avoid using any joins.

Once I had done that, I could run my query two ways – 
  1. using the newly created columns
  2. as before, using a join to the view, which in turn retrieved data from the 
underlying tables.

This was a more complex query than the example above – details available on 
request.

Here are the timings for running the query on identical data sets using 
Postgresql, Sql Server, and Sqlite3 -

PostgreSQL -
Method 1 - 0.28 sec
Method 2 – 1607 sec, or 26 minutes

Sql Server -
Method 1 – 0.33 sec
Method 2 – 1.8 sec

Sqlite3 -
Method 1 – 0.15 sec
Method 2 – 1.0 sec

It seems that Sql Server and Sqlite3 are able to analyse the ‘join’, and 
execute an indexed read against the underlying physical tables.

Frank


[GENERAL] pg_rewind issue

2017-09-17 Thread James Sewell
Hi All,

pg_rewind is failing to bring an old master back into a workable state. I'm
wondering if I'm missing a step?

EDB 9.6.4 using pg_backrest for archive management.

High write rate throughout this process.

Steps to cause:


   - Stop old master
   - Promote old standby to new master
   - Force checkpoint on new master
   - Run pg_rewind (1) on old master
   - Start old master with correct recovery.conf


After all archives are pulled from the new master seeing this in the
standby logs:

2017-09-07 14:59:40 AEST [117796]: [3-1] user=,db=,client=  (0:0)LOG:
 entering standby mode
2017-09-07 14:59:41 AEST [117796]: [4-1] user=,db=,client=  (0:0)LOG:
 restored log file "0015.history" from archive
2017-09-07 14:59:42 AEST [117796]: [5-1] user=,db=,client=  (0:0)LOG:
 restored log file "00150C740044" from archive
2017-09-07 14:59:42 AEST [117796]: [6-1] user=,db=,client=  (0:0)LOG:
 unexpected pageaddr C6F/C800 in log segment 00150C740044,
offset 0
2017-09-07 14:59:42 AEST [118045]: [1-1] user=,db=,client=  (0:0)LOG:
 started streaming WAL from primary at C73/A900 on timeline 21
2017-09-07 14:59:42 AEST [118045]: [2-1] user=,db=,client=  (0:XX000)FATAL:
 could not receive data from WAL stream: ERROR:  requested WAL segment
00150C7300A9 has already been removed


When Iook at timeline 15 I see this:

20  C74/4500no recovery target specified+

As 00150C7300A9 > 00150C740044  I'm confused why
this WAL would be required? It's never been created on either server
(although the same suffix does exist in timeline 14).

Cheers,
James Sewell

(1) /usr/edb/as9.6/bin/pg_rewind --target-pgdata=/ppas/9.6/data/pg_data
--source-server="host =10.154.19.24 port=5432 user=postgres
dbname=postgres"

-- 

--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: [GENERAL] Selecting a daily puzzle record - which type of column to add?

2017-09-17 Thread David G. Johnston
On Sun, Sep 17, 2017 at 1:13 PM, Alexander Farber <
alexander.far...@gmail.com> wrote:

>
> I need to add a useful column, so that it would be easy to me to create a
> web script which would display today's and all past "daily puzzle" records
> - and wouldn't change the already published puzzles...
>
>
​Serial (i.e. integer/bigint with an sequence generator).  I would then add
another table that simply stores the "last manually reviewed id" as its
only record (or you could record manual reviews and take the max of that
field from the table).

David J.


Re: [GENERAL] looking for a globally unique row ID

2017-09-17 Thread Gmail
> 
> As you may have noticed, I've put significant effort to focus the
> discussion on my actual question: the "global index" (which btw I didn't
> know is called this way here - if I new, I'd probably could have google
> it instead). This was intentional. I like my schema design very much and
> I'm unwilling to part with it.
> 
> 
> no, it doesn't.
> 
> T1 is empty. It's just a head of inheritance tree.
> 
> There is no guarantee (index on T1 will have no entries). But naturally
> there are ways to "smartly" partition the ID space allocated to
> subtables of T1.
> 
> 
OK. Wow, that's sure not how I read the docs on inheritance, but I've never 
used the construct thinking it was largely syntactic sugar on master/detail 
based scheme designs.

But since you're wed irrevocably to your scheme design, I'll bow out of this 
discussion.  
All the best,
rjs




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] advisory locks namespace?

2017-09-17 Thread John R Pierce

On 9/17/2017 1:33 PM, Rob Nikander wrote:

Am I right if two applications use advisory locks in the same database, they 
need to know which lock numbers are used to by the other application, to avoid 
conflicts?



indeed.   it also depends if they want to honor each others locks.

--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] advisory locks namespace?

2017-09-17 Thread Rob Nikander
Hi,

Am I right if two applications use advisory locks in the same database, they 
need to know which lock numbers are used to by the other application, to avoid 
conflicts? 

Rob 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Selecting a daily puzzle record - which type of column to add?

2017-09-17 Thread Alexander Farber
Good evening,

In a word game I store all player moves in the table:

CREATE TYPE words_action AS ENUM (
'play',
'skip',
'swap',
'resign',
'ban',
'expire'
);

CREATE TABLE words_moves (
mid BIGSERIAL PRIMARY KEY,
action  words_action NOT NULL,
gid integer NOT NULL REFERENCES words_games ON DELETE CASCADE,
uid integer NOT NULL REFERENCES words_users ON DELETE CASCADE,
played  timestamptz NOT NULL,
tiles   jsonb,
score   integer CHECK(score >= 0)
);

I could run a cron job on all moves played each day and select the
"spectacular" ones by it, i.e. when a very big score has been achieved in
the move or all 7 tiles have been played...

Then I (as admin of the game) would manually review the daily mails sent by
that cronjob and select the few I have found interesting - for later
publishing them as "daily puzzle" in my day.

However I don't want to do the reviewing every day as that would be
tedious, but more like once per week and then select several such moves at
once (for the future dates).

My question (and thanks for reading my mail sofar) is: which column would
you add to the words_moves table for such a purpose?

If I add a boolean (is a daily puzzle move: true or false) - then it is
difficult to keep the order of the daily puzzles, I think.

If I add a timestamptz, then to which date to set it, when I do my manual
review once a week?

I need to add a useful column, so that it would be easy to me to create a
web script which would display today's and all past "daily puzzle" records
- and wouldn't change the already published puzzles...

If you have a good idea here, please share with me. If not, sorry for the
maybe offtopic question.

Thanks
Alex


Re: [GENERAL] Remove useless joins (VARCHAR vs TEXT)

2017-09-17 Thread Tom Lane
David Rowley  writes:
> On 17 September 2017 at 08:07, Kim Rose Carlsen  wrote:
>> It seems there are some difference in VARCHAR vs TEXT when postgres tries to
>> decide if a LEFT JOIN is useful or not.

> Yeah, it looks like the code to check for distinctness in the subquery
> fails to consider that the join condition may contain RelabelTypes
> instead of plain Vars.
> 
> The attached fixes.

Looks like a good fix to me (except for the copied-and-pasted,
not-quite-on-point comment ;-)).  Pushed.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] looking for a globally unique row ID

2017-09-17 Thread Rafal Pietrak


W dniu 17.09.2017 o 03:30, Gmail pisze:
> 
> 
>> On Sep 16, 2017, at 11:18 AM, Rafal Pietrak  wrote:
>>
>> Dear robjsarg...@gmail.com,
>>
>> W dniu 16.09.2017 o 17:19, Gmail pisze:
>>>
>>>
 On Sep 16, 2017, at 8:23 AM, Rafal Pietrak  wrote:

[-]
>> PK in those 12 "class-A" tables must be accessed separately. And those I
>> have. It goes like this:
>>
>> CREATE TABLE T1 (id int, b date);-- level 1
>> CREATE TABLE T2 (c text) INHERITS (T1);
>> CREATE TABLE T3 (d text) INHERITS (T1);
>> CREATE TABLE T4 (e text, tm date) INHERITS (T1);
>> CREATE TABLE T5 (f text) INHERITS (T1); -- level 2
>> CREATE TABLE T6 (ca text) INHERITS (T2);
>> CREATE TABLE T7 (db text, db2 text) INHERITS (T3);
>> CREATE TABLE T8 (ec text, db4 test) INHERITS (T4);
>> CREATE TABLE T9 (fd text) INHERITS (T5); -- level 3
>> CREATE TABLE T10 (db11 text) INHERITS (T7); -- level 4
>>
> - quick question: do you have more than one table inheriting from T2?

Pls pls pls. Don't. This leads nowhere.

What if I say "yes" ... but for the same question regarding T3 you would
get: "no".

This is just and indication how incorrect your "counting at home" is.

> 
> We all skin cats different ways, of course.
> Often with such things, especially with ORM tools, one puts a "type"  field 
> in the master table indicating which exact implementation if referenced for 
> each row. Helps in reporting counts as well.   I assume the columns "c text" 
> are just placeholders for discussion, not the actual structure.   

YES!

>> ... still counting? And I haven't yet touch any of the 12 leaf tables
>> I'm using.
>>
>>> 2 You don't have enough document instances to warrant partitioning
>>
>> I don't. My couple of thousands of documents is just nothing. I don't
>> have "enough documents" (that's why I say I can live with a performance
>> hit). true, true, true.
>>
>> But I do have enough document-content variety for that. I could collapse
>> some of the hierarchy at the expense of some columns getting NULL for
>> certain rows - but that's just nonesens. I'm not doing that.
> We pick our poisons, I guess.  I sure might be tempted to denormalize some of 
> those if it made life much easier/faster.  Hard to say from what's been 
> presented - still assuming we're seeing pseudo-tables.

I've been there ... and back. If I may, I'd like to avoid any discussion
of whether my schema is sound or it stinks.

As you may have noticed, I've put significant effort to focus the
discussion on my actual question: the "global index" (which btw I didn't
know is called this way here - if I new, I'd probably could have google
it instead). This was intentional. I like my schema design very much and
I'm unwilling to part with it.

>>
>>> 3 Your remaining problem is your workflow
>>
>> Sorry I don't understand this point.
> 
> I meant that your document tables are fine (as described above) and that you 
> were still having trouble with the persistence aspects of what happens to the 
> documents.

Hmmm... still not so clear. But I'll try:

1. I don't really have problems with "persistence aspects ... of what
happens". This I've implemented by copying the inheritance structure of
"class-A" tables into "class-B" tables. "persistence" is kept in order
by numerous FKs.

2. I do have a problem, that this results in my schema "being larger
then expected" ... but that's just a nuisance - the developer of new
feature (e.i myself) has more headaches then it should.

If that's what you asked for, so the answer is: "No I don't have a
problem with my workflow".

>>
>>> 4 You have an academic interest in multi-table indexing
>>>
>>
>> Yes. so what?
>>
> Nothing.  Just trying to see if I'm following your thread.
>> As nobody have tried to advise me to change my schema, from my point of
>> view the discussion goes just fine. And I've got some very interesting
>> answers. Is there a problem with that?
>>
>> What exactly are you trying to say?
>> -R
>>
> All your documents are represented in your "T1" table.  So your processing 
> can always refer to that table - which is excellent.  T1 guarantees unique 
> ids across all other T-tables.  Activity related records have no 

;7

no, it doesn't.

T1 is empty. It's just a head of inheritance tree.

There is no guarantee (index on T1 will have no entries). But naturally
there are ways to "smartly" partition the ID space allocated to
subtables of T1.

Only I was asking for database tools to GUARANTEE that. And across of
future UPDATEs too (not just INSERTS).

need to be under the same unique ID space (though personally I'm a fan
of UUID anyway).  I'm not seeing where you would benefit from the title
of this thread.

To some extend, you are right here.

And missing the "global unique ID" (as per subject of the posted
question) I'm currently using "a sort of" this king of workaround. And
in fact one of the responses in this thread showed me a better way to
implement that workaround - this is really, 

Re: [GENERAL] looking for a globally unique row ID

2017-09-17 Thread Rafal Pietrak


W dniu 16.09.2017 o 22:20, rob stone pisze:
> 
> 
> On Sat, 2017-09-16 at 19:18 +0200, Rafal Pietrak wrote:
>> Dear robjsarg...@gmail.com,
[-]
>>
>>
> Hello Rafal,

Hi,

> 
> 
> I've been trying to follow this discussion but now I'm totally
> confused. (Some people might say that this is my normal state.)
> 
> However, what do you mean by the following:-
> 
> 17 "process tables"?
> 
> multiplicated -- does this mean replicated?

To pinpoint the schema design I'm reducing the example complexity just
to  one level of hierarchy in "class-A", and just to one table in
"class-B". It goes like this:

CREATE TABLE T1 (id int, tm timestamp)
CREATE TABLE T2 (me int, him int) INHERITS (T1) -- level 1
CREATE TABLE T3 (me int, him int) INHERITS (T1)

ALTER TABLE T2 ADD CONSTRAINT t1_pk PRIMARY KEY(me,id)
ALTER TABLe T3 ACC CONSTRAINT t1_pk PRIMARY KEY(him,id)

the "logic" of processing requires here just one class-B table:
CREATE TABLE BP (id int references t1(id), info text)

but i have to create two tables instead:
CREATE TABLE BP2 (id int, me int, him int, info text, FOREIGN KEY
(id,me) references t2(id,me));
CREATE TABLE BP2 (id int, me int, him int, info text, FOREIGN KEY
(id,him) references t3(id,him));

That's the multiplication.

Every class-B table has to have the same inheritance structure as
class-A tables (frankly, not all but actually only the most, but that's
not really important).

> 
> any of the 12 leaf tables I'm using  -- what is a "leaf" table?
> 
> collapse some of the hierarchy at the expense of some columns getting
> NULL for certain rows  --  does this mean if you have two input fields
> (field A and field B) that if field A is not null and field B is null
> the data is inserted into one table and if it's the inverse you insert
> into an entirely different table?

I had this exact impression some time ago, so I've rewritten my schema
to "flat table". This didn't work. I've lost some of the constraints I
use to keep the data in order.

So I've rewritten the schema back to the inheritance hierarchy. And it
wasn't just a rollback, as the application lived with the new layout for
something over half a year and new things got implemented along the road.

Both rewrites costed me like 2 month *intense* work each. I've learned,
that current layout suits its usage best. The "features not covered
properly" are now reduced to just this single "global index problem".
And this problem is just "slower development" - I'm reluctant to put new
features, because the schema is so unnecessarily complex now. Well, be it.

> 
> 
> 
> IMHO, you need an UML diagram that not only sets out your workflow but
> will also provide the basis for your schema.
> Keep in mind that a hierarchy can be 'n' tables deep. The foreign keys
> point back upwards until you finally reach the parent.

No, really, pls don't.

I've spend A LOT of time figuring out (and numerous times taking the
wrong turns) of the optimal schema.

This is not helping.

> 
> You mention payments being made. Users make mistakes. They can post a
> payment to the wrong account and later it has to be reversed. These
> things can be modelled via your UML diagram.
> 

This kind of events just go to history table.

Meaning FK in payment table get updated, while the old value is
log-registered in history table ... that log-table does not have any FK
constraints, as it's ment to be just an audit trail.

Really. All this works just fine.

-R


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general